Web Page to Single-Page PDF & HTML (Automation-Ready)
Pricing
from $10.00 / 1,000 results
Convert webpages to single-page PDFs and extract raw HTML via API. Captures full scroll height (no A4 splits). Built for automation with n8n, Make, and Zapier. Ideal for archiving, AI workflows, compliance, and bulk processing.
Rating
5.0 (1)
Developer
Gavin Campbell
Actor stats
- Bookmarked: 1
- Total users: 2
- Monthly active users: 1
- Last modified: 5 days ago
Web Page to Single-Page PDF Converter (Automation Ready)
Capture full-length webpages as single-page PDFs and extract raw HTML source code via API.
Designed for seamless integration with automation platforms like n8n, Make.com, and Zapier, this Apify Actor allows you to programmatically archive web content, generate visual reports, and feed clean data into your AI workflows.
Unlike standard converters that cut pages into A4 sheets, this tool captures the entire scrollable area of a webpage into one continuous PDF file, ensuring no data is cut off at page breaks.
🚀 Key Features
- Single-Page "Long" PDFs: Captures the full height of the webpage in a single continuous document. Perfect for newsletters, landing pages, and social media feeds.
- HTML Source Extraction: Option to save the exact `view-source:` HTML code alongside the visual PDF.
- Bulk Processing: Handle thousands of URLs in a single run.
- Anti-Blocking: Built-in support for Apify Proxy and stealth mode to bypass bot detection.
- Smart Waiting: Configurable `waitUntil` strategies (e.g., `networkidle0`) ensure dynamic JavaScript content loads completely before capture.
💡 Use Cases
- Compliance & Archiving: Automatically screenshot and save the HTML source of your legal pages, T&Cs, or partner sites for compliance auditing.
- Marketing Swipe Files: Build a visual database of competitor landing pages, emails, and ad creatives.
- AI Knowledge Base: Feed the raw HTML output into LLMs (like ChatGPT or Claude) via n8n to analyze page structure or content without parsing complex DOMs yourself.
- Invoicing & Receipts: Convert web-based invoice views into portable PDF files for accounting systems.
- Design QA: Automate visual regression testing by capturing full-page renders of your staging environment.
⚙️ Input Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `startUrls` | Array | `[]` | A list of URLs to convert. Accepts plain URL strings or objects of the form `{ "url": "..." }`. |
| `saveHtml` | Boolean | `true` | If enabled, saves the raw HTML source code (`.html`) to the Key-Value store. |
| `proxyConfiguration` | Object | Apify Proxy | Recommended to keep enabled to avoid IP bans. |
| `waitUntil` | String | `networkidle0` | When to take the snapshot. Use `networkidle0` for strict loading or `domcontentloaded` for speed. |
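The fields in the table can be assembled programmatically before a run is triggered. The sketch below is one possible way to do that in Python; the helper name `build_input` and the set `KNOWN_WAIT_UNTIL` are illustrative and not part of the Actor. It assumes the Actor accepts the `{"useApifyProxy": true}` proxy shape mentioned under Troubleshooting, and it only validates the two `waitUntil` values named in the table (the Actor may accept others).

```python
from typing import Any, Dict, Iterable, List, Union

# The two waitUntil values named in the input table; the Actor may accept more.
KNOWN_WAIT_UNTIL = {"networkidle0", "domcontentloaded"}

UrlEntry = Union[str, Dict[str, Any]]


def build_input(
    urls: Iterable[UrlEntry],
    save_html: bool = True,
    wait_until: str = "networkidle0",
    use_apify_proxy: bool = True,
) -> Dict[str, Any]:
    """Build an input object for the Actor (hypothetical helper)."""
    if wait_until not in KNOWN_WAIT_UNTIL:
        raise ValueError(f"Unexpected waitUntil value: {wait_until!r}")

    start_urls: List[Dict[str, Any]] = []
    for entry in urls:
        if isinstance(entry, str):
            # Normalize plain URL strings to the object format.
            start_urls.append({"url": entry})
        elif isinstance(entry, dict) and "url" in entry:
            start_urls.append(dict(entry))
        else:
            raise ValueError(f"Unsupported startUrls entry: {entry!r}")

    if not start_urls:
        raise ValueError("At least one URL is required")

    return {
        "startUrls": start_urls,
        "saveHtml": save_html,
        "waitUntil": wait_until,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
```

For example, `build_input(["https://example.com"], wait_until="domcontentloaded")` produces an input that trades strict loading for speed, as described in the `waitUntil` row.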
🔌 Automation Integrations
This Actor is built to be a backend microservice. Here is how to connect it to your favorite workflow automation tools.
1. n8n Integration
Goal: Trigger the actor from a workflow and download the resulting PDF.
- Add the "Apify" Node: In your n8n workflow, add the Apify node.
- Select Action: Choose Run Actor.
- Actor ID: Search for `web-to-pdf-converter` (or use the Actor ID from the Apify console).
- Input: Switch to JSON mode and map your URL:
{"startUrls": [{ "url": "{{$json.your_url_field}}" }],"saveHtml": true}
- Wait for Finish: Ensure the "Synchronous" option is checked (or use a separate "Wait" node and "Get Dataset Items" node for long runs).
- Retrieve Files: The output will contain a `pdfUrl`. Use an HTTP Request node to GET that URL and save the binary data.
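The same run-then-download flow can also be reproduced outside n8n, which may help when debugging a workflow. The sketch below uses only the Python standard library and the Apify API's synchronous `run-sync-get-dataset-items` endpoint. The Actor ID passed in is a placeholder you must replace with the real one from the Apify console, the helper names are illustrative, and whether the record URL needs a token depends on your storage settings (an assumption here, so the token is optional on download).

```python
import json
import shutil
import urllib.request
from typing import Any, Dict, List, Optional

API_BASE = "https://api.apify.com/v2"


def run_sync_url(actor_id: str) -> str:
    """Return the synchronous run endpoint for an Actor.

    The API expects "username~actor-name", so a "/" separator is converted.
    """
    return f"{API_BASE}/acts/{actor_id.replace('/', '~')}/run-sync-get-dataset-items"


def run_actor(actor_id: str, token: str, actor_input: Dict[str, Any],
              timeout: float = 300.0) -> List[Dict[str, Any]]:
    """Run the Actor, wait for it to finish, and return its dataset items."""
    request = urllib.request.Request(
        run_sync_url(actor_id),
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def download_file(url: str, destination: str, token: Optional[str] = None) -> None:
    """Stream a pdfUrl (or htmlUrl) to a local file."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


# Usage (not executed here; requires a valid token and the real Actor ID):
#   items = run_actor("<username>/<actor-name>", token,
#                     {"startUrls": [{"url": "https://example.com"}], "saveHtml": True})
#   for index, item in enumerate(items):
#       download_file(item["pdfUrl"], f"capture_{index}.pdf")
```

Synchronous endpoints only suit runs that finish quickly; for long or bulk runs, the start-then-poll pattern described in the "Wait for Finish" step is the safer choice.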
2. Make.com (Integromat) Integration
Goal: Save a webpage to Google Drive every time a new row is added to Google Sheets.
- Trigger: Google Sheets (Watch Rows).
- Action: Add the Apify module -> Run Actor.
- Settings:
  - Actor: Select this actor.
  - Body:
{"startUrls": [{ "url": "{{1.url}}" }],"saveHtml": true}
- Action: Add Apify module -> Get Dataset Items.
  - Dataset ID: Map the `defaultDatasetId` from the previous step.
- Action: Add HTTP module -> Get a file.
  - URL: Map the `pdfUrl` from the dataset items.
- Action: Google Drive -> Upload a File.
3. Zapier Integration
Goal: Email a PDF version of a webpage when a specific event occurs.
- Trigger: Any Zapier trigger (e.g., "New Trello Card").
- Action: Search for Apify.
- Event: Select Run Actor.
- Configure:
  - Actor: Paste the Actor ID.
  - Input Body:
{"startUrls": [{ "url": "https://example.com" }]}
- Action: Select Apify -> Get Dataset Items (to get the PDF link).
- Action: Gmail -> Send Email. Use the `pdfUrl` in the attachment field or body.
📦 Output Format
The actor stores results in two locations:
- Key-Value Store: The physical files.
  - `Page_Title_hash.pdf` (The visual render)
  - `Page_Title_hash_source.html` (The source code)
- Dataset: The JSON metadata used for linking.
Sample Dataset JSON:
{"url": "https://apify.com","title": "Apify: The Web Scraping and Automation Platform","pdfUrl": "https://api.apify.com/v2/key-value-stores/mYStoReId/records/Apify_hash.pdf","htmlUrl": "https://api.apify.com/v2/key-value-stores/mYStoReId/records/Apify_hash_source.html","timestamp": "2023-10-27T14:30:00.000Z"}
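Downstream code usually needs to turn a dataset item like the sample above into typed values and recover the Key-Value store ID and record key from the file URLs. The following is a minimal sketch of that parsing in Python. The names `CaptureRecord`, `parse_item`, and `store_id_and_key` are illustrative, and treating `htmlUrl` as optional (for runs with `saveHtml` disabled) is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class CaptureRecord:
    url: str
    title: str
    pdf_url: str
    html_url: Optional[str]  # assumed absent when saveHtml is false
    captured_at: datetime


def parse_item(item: Dict[str, Any]) -> CaptureRecord:
    """Convert one dataset item into a typed record."""
    # fromisoformat() on older Python versions rejects a trailing "Z",
    # so it is rewritten as an explicit UTC offset first.
    timestamp = item["timestamp"].replace("Z", "+00:00")
    return CaptureRecord(
        url=item["url"],
        title=item["title"],
        pdf_url=item["pdfUrl"],
        html_url=item.get("htmlUrl"),
        captured_at=datetime.fromisoformat(timestamp),
    )


def store_id_and_key(record_url: str) -> Tuple[str, str]:
    """Split a record URL into (store ID, record key).

    Expects the path layout shown in the sample:
    /v2/key-value-stores/<storeId>/records/<key>
    """
    parts = urlparse(record_url).path.strip("/").split("/")
    if len(parts) != 5 or parts[1] != "key-value-stores" or parts[3] != "records":
        raise ValueError(f"Not a key-value store record URL: {record_url!r}")
    return parts[2], parts[4]
```

The record key doubles as a ready-made file name when saving the PDF locally or uploading it to another storage service.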
🛠 Troubleshooting
- PDF is blank/white: Try changing `waitUntil` to `networkidle0`. This forces the crawler to wait until all network activity (images, scripts) has settled.
- Cookie Consent Popups: The actor attempts to hide scrollbars, but popups may obscure content. For complex sites, you may need an actor with custom "click" logic or use a pre-navigation hook (advanced usage).
- Access Denied: Ensure `proxyConfiguration` is set to `useApifyProxy: true` to avoid 403 errors.
Built with ❤️ using the Apify SDK and Puppeteer.