Links To Pdfs OpenClaw Skill - ClawHub
Do you want your AI agent to automate Links To Pdfs workflows? This free skill from ClawHub helps with PDF and document tasks without building custom tools from scratch.
What this skill does
Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.
Install
npx clawhub@latest install links-to-pdfs
Full SKILL.md
| name | description |
|---|---|
| scraper | Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs. |
docs-scraper
CLI tool that scrapes documents from various sources into local PDF files using browser automation.
Installation
npm install -g docs-scraper
Quick start
Scrape any document URL to PDF:
docs-scraper scrape https://example.com/document
Returns local path: ~/.docs-scraper/output/1706123456-abc123.pdf
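Because the command prints only the local path on success, the result can be captured directly in a script. A minimal sketch (the real CLI is stubbed here with a fixed example path, since the output format above is all this relies on):

```shell
# Stand-in for the real CLI: prints a fixed example output path.
docs_scraper() { echo "$HOME/.docs-scraper/output/1706123456-abc123.pdf"; }

# Capture the printed path so later steps can operate on the file.
pdf_path=$(docs_scraper)
echo "Saved to: $pdf_path"
```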
Basic scraping
Scrape with daemon (recommended, keeps browser warm):
docs-scraper scrape <url>
Scrape with named profile (for authenticated sites):
docs-scraper scrape <url> -p <profile-name>
Scrape with pre-filled data (e.g., email for DocSend):
docs-scraper scrape <url> -D email=user@example.com
Direct mode (single-shot, no daemon):
docs-scraper scrape <url> --no-daemon
Authentication workflow
When a document requires authentication (login, email verification, passcode):
1. Initial scrape returns a job ID:
docs-scraper scrape https://docsend.com/view/xxx
# Output: Scrape blocked
# Job ID: abc123
2. Retry with data:
docs-scraper update abc123 -D email=user@example.com
# or with password
docs-scraper update abc123 -D email=user@example.com -D password=1234
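This blocked-then-retry flow can be scripted by parsing the job ID out of the tool's output. A sketch, assuming the `Job ID: ...` output line shown above (the blocked-scrape output is simulated here rather than produced by a live scrape, and the email is a placeholder):

```shell
# Simulated output from a blocked scrape, matching the format shown above.
output="Scrape blocked
Job ID: abc123"

# Extract the ID after "Job ID: " and build the retry command.
job_id=$(printf '%s\n' "$output" | sed -n 's/^Job ID: //p')
echo "docs-scraper update $job_id -D email=user@example.com"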
Profile management
Profiles store session cookies for authenticated sites.
docs-scraper profiles list # List saved profiles
docs-scraper profiles clear # Clear all profiles
docs-scraper scrape <url> -p myprofile # Use a profile
Daemon management
The daemon keeps browser instances warm for faster scraping.
docs-scraper daemon status # Check status
docs-scraper daemon start # Start manually
docs-scraper daemon stop # Stop daemon
Note: Daemon auto-starts when running scrape commands.
Cleanup
PDFs are stored in ~/.docs-scraper/output/. The daemon automatically cleans up files older than 1 hour.
Manual cleanup:
docs-scraper cleanup # Delete all PDFs
docs-scraper cleanup --older-than 1h # Delete PDFs older than 1 hour
Job management
docs-scraper jobs list # List blocked jobs awaiting auth
Supported sources
- Direct PDF links - Downloads PDF directly
- Notion pages - Exports Notion page to PDF
- DocSend documents - Handles DocSend viewer
- LLM fallback - Uses Claude API for any other webpage
Scraper Reference
Each scraper accepts specific -D data fields. Use the appropriate fields based on the URL type.
DirectPdfScraper
Handles: URLs ending in .pdf
Data fields: None (downloads directly)
Example:
docs-scraper scrape https://example.com/document.pdf
DocsendScraper
Handles: docsend.com/view/*, docsend.com/v/*, and subdomains (e.g., org-a.docsend.com)
URL patterns:
- Documents: https://docsend.com/view/{id} or https://docsend.com/v/{id}
- Folders: https://docsend.com/view/s/{id}
- Subdomains: https://{subdomain}.docsend.com/view/{id}
Data fields:
| Field | Type | Description |
|---|---|---|
| email | email | Email address for document access |
| password | password | Passcode/password for protected documents |
| name | text | Your name (required for NDA-gated documents) |
Examples:
# Pre-fill email for DocSend
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com
# With password protection
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123
# With NDA name requirement
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name="John Doe"
# Retry blocked job
docs-scraper update abc123 -D email=user@example.com -D password=secret123
Notes:
- DocSend may require any combination of email, password, and name
- Folders are scraped as a table of contents PDF with document links
- The scraper auto-checks NDA checkboxes when name is provided
NotionScraper
Handles: notion.so/*, *.notion.site/*
Data fields:
| Field | Type | Description |
|---|---|---|
| email | email | Notion account email |
| password | password | Notion account password |
Examples:
# Public page (no auth needed)
docs-scraper scrape https://notion.so/Public-Page-abc123
# Private page with login
docs-scraper scrape https://notion.so/Private-Page-abc123 \
  -D email=user@example.com -D password=mypassword
# Custom domain
docs-scraper scrape https://docs.company.notion.site/Page-abc123
Notes:
- Public Notion pages don't require authentication
- Toggle blocks are automatically expanded before PDF generation
- Uses session profiles to persist login across scrapes
LlmFallbackScraper
Handles: Any URL not matched by other scrapers (automatic fallback)
Data fields: Dynamic - determined by Claude analyzing the page
The LLM scraper uses Claude to analyze the page HTML and detect:
- Login forms (extracts field names dynamically)
- Cookie banners (auto-dismisses)
- Expandable content (auto-expands)
- CAPTCHAs (reports as blocked)
- Paywalls (reports as blocked)
Common dynamic fields:
| Field | Type | Description |
|---|---|---|
| email | email | Login email (if detected) |
| password | password | Login password (if detected) |
| username | text | Username (if login uses username) |
Examples:
# Generic webpage (no auth)
docs-scraper scrape https://example.com/article
# Webpage requiring login
docs-scraper scrape https://members.example.com/article \
  -D email=user@example.com -D password=secret
# When blocked, check the job for required fields
docs-scraper jobs list
# Then retry with the fields the scraper detected
docs-scraper update abc123 -D username=myuser -D password=secret
Notes:
- Requires the ANTHROPIC_API_KEY environment variable
- Field names are extracted from the page's actual form fields
- Limited to 2 login attempts before failing
- CAPTCHAs require manual intervention
Data field summary
| Scraper | email | password | name | Other |
|---|---|---|---|---|
| DirectPdf | - | - | - | - |
| DocSend | ✓ | ✓ | ✓ | - |
| Notion | ✓ | ✓ | - | - |
| LLM Fallback | ✓* | ✓* | - | Dynamic* |
*Fields detected dynamically from page analysis
Environment setup (optional)
Only needed for LLM fallback scraper:
export ANTHROPIC_API_KEY=your_key
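Since only the LLM fallback scraper needs this key, a wrapper script can check for it up front instead of failing mid-scrape. A minimal sketch (the `key_ok` flag is a hypothetical name for illustration):

```shell
# Warn early if the key required by the LLM fallback scraper is missing.
if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
  key_ok=0
  echo "warning: ANTHROPIC_API_KEY not set; LLM fallback scraper will fail" >&2
else
  key_ok=1
fi
```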
Optional browser settings:
export BROWSER_HEADLESS=true # Set false for debugging
Common patterns
Archive a Notion page:
docs-scraper scrape https://notion.so/My-Page-abc123
Download protected DocSend:
docs-scraper scrape https://docsend.com/view/xxx
# If blocked:
docs-scraper update <job-id> -D email=user@example.com -D password=1234
Batch scraping with profiles:
docs-scraper scrape https://site.com/doc1 -p mysite
docs-scraper scrape https://site.com/doc2 -p mysite
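The two commands above generalize to any list of URLs sharing one profile. A sketch of a batch loop (the URLs and the `mysite` profile are placeholders), written as a dry run that prints each command rather than invoking the CLI directly:

```shell
# Placeholder URLs; one named profile reuses the same session across scrapes.
urls="https://site.com/doc1 https://site.com/doc2"

for url in $urls; do
  # Dry run: echo the command instead of invoking docs-scraper.
  echo "docs-scraper scrape $url -p mysite"
done
```

Replacing `echo` with the real invocation turns the dry run into an actual batch archive.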
Output
Success: Local file path (e.g., ~/.docs-scraper/output/1706123456-abc123.pdf)
Blocked: Job ID + required credential types
Troubleshooting
- Timeout: docs-scraper daemon stop && docs-scraper daemon start
- Auth fails: docs-scraper jobs list to check pending jobs
- Disk full: docs-scraper cleanup to remove old PDFs