Download HTML from URLs
Pricing: from $2.99 / 1,000 results
Download HTML from URLs instantly. Scrape and archive raw page source for analysis, monitoring, or data pipelines. Supports automation, fast fetching, and reliable extraction. Built for developers, SEO specialists, and research workflows.
Developer: SimpleAPI (maintained by Community)
Download the complete HTML of any list of web pages: fast, reliable, and at scale. Paste your URLs (bulk supported), press Start, and get the full page source of every URL saved neatly into a dataset you can export as JSON, CSV, or Excel.
The actor uses a two-stage download engine. It first tries a fast plain-HTTP fetch. If that fails, it falls back automatically to rendering the page in a real browser, which handles pages that need JavaScript or that block simple HTTP clients. Blocked requests also trigger an automatic proxy escalation ladder (direct → datacenter → residential), so you can still get the HTML of many protected sites.
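The proxy escalation ladder can be illustrated with the minimal Python sketch below. It is not the actor's actual code. The `ProxyLadder` class, the `fetch_with_escalation` function, and the convention that a blocked request returns `None` are all hypothetical. Only two things come from this README: the tier order, and the behavior of keeping the stronger tier for the rest of the run. The browser-rendering fallback is not modeled here.

```python
from typing import Callable, Optional, Tuple

# Tier order as described in this README; all names below are hypothetical.
TIERS: Tuple[str, ...] = ("direct", "datacenter", "residential")


class ProxyLadder:
    """Tracks the current proxy tier. Once escalated, the stronger tier is kept (sticky)."""

    def __init__(self, tiers: Tuple[str, ...] = TIERS):
        self.tiers = tiers
        self.level = 0  # index into self.tiers; every run starts at "direct"

    @property
    def current(self) -> str:
        return self.tiers[self.level]

    def escalate(self) -> bool:
        """Move one tier up. Returns False if already at the strongest tier."""
        if self.level + 1 >= len(self.tiers):
            return False
        self.level += 1
        return True


def fetch_with_escalation(url: str, ladder: ProxyLadder,
                          fetch: Callable[[str, str], Optional[str]]) -> Optional[str]:
    """Call fetch(url, tier) from the ladder's current tier upward.

    Assumption: fetch returns the HTML, or None when the request was blocked.
    """
    while True:
        html = fetch(url, ladder.current)
        if html is not None:
            return html
        if not ladder.escalate():
            return None  # blocked even on the strongest tier


# Usage with a fake fetcher: this pretend site blocks direct connections only.
def fake_fetch(url: str, tier: str) -> Optional[str]:
    if tier == "direct":
        return None
    return f"<html>{url} via {tier}</html>"


ladder = ProxyLadder()
print(fetch_with_escalation("https://protected.example", ladder, fake_fetch))
# <html>https://protected.example via datacenter</html>
print(ladder.current)  # later URLs in the same run start at this tier
# datacenter
```

Keeping a single shared ladder per run is what makes the escalation sticky. After one block, the remaining URLs skip the tiers that already failed instead of paying for the failed attempts again.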
Why Choose Us?
- Fast by default: direct connections and parallel downloads, no wasted proxy traffic.
- Self-healing on blocks: automatically falls back to datacenter and then residential proxies, and keeps the stronger proxy for the rest of the run.
- Browser rendering fallback: pages that fail plain HTTP get rendered in a real headless Chromium browser.
- Live results: every page is saved to the dataset the moment it finishes, so even interrupted runs keep their data.
- Flexible input: URLs with or without https://, bulk paste, file upload, or Google Sheets.
Key Features
- Bulk URL input (requestListSources editor: paste lists, upload files, link sheets)
- Full page HTML (fullHtml) and extracted `<body>` HTML (html) per page
- Automatic retries with exponential backoff (configurable)
- Configurable concurrency, page timeout, and polite request delays
- Detailed per-URL debug info (#debug): status code, retries, error messages, proxy tier
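Here is a rough sketch of a retry schedule with exponential backoff, where the number of retries corresponds to maxRetries. The 1-second base delay, the 30-second cap, and the 10% jitter are assumptions for illustration only, because the README does not document the actor's actual values. `backoff_delays` is a hypothetical helper.

```python
import random
from typing import List, Optional


def backoff_delays(max_retries: int = 3, base: float = 1.0, cap: float = 30.0,
                   jitter: float = 0.1, rng: Optional[random.Random] = None) -> List[float]:
    """Seconds to wait before each retry: base * 2**attempt, capped, plus random jitter.

    base, cap and jitter are illustrative assumptions, not the actor's settings.
    """
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)             # 1, 2, 4, 8, ... up to cap
        delays.append(delay + rng.uniform(0, jitter * delay))  # add up to 10% extra
    return delays


# With the default maxRetries of 3, waits fall roughly in [1, 1.1], [2, 2.2], [4, 4.4] seconds.
for attempt, wait in enumerate(backoff_delays(), start=1):
    print(f"retry {attempt}: wait {wait:.2f}s")
```

The jitter spreads retries out over time. Without it, many URLs that failed together would all retry at the same moment.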
Input
{"startUrls": [{ "url": "https://apify.com" },{ "url": "example.com" }],"proxyConfiguration": { "useApifyProxy": false },"pageTimeoutSecs": 60,"maxRetries": 3,"maxConcurrency": 5,"requestDelaySecs": 0}
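The example input mixes a full URL with a bare domain (example.com). The sketch below shows one way to implement the documented default of adding https:// when the scheme is missing. The `normalize_url` helper and its regular expression are assumptions, not the actor's code. The check looks for an explicit `scheme://` prefix, which avoids misreading `host:port` as a scheme.

```python
import re

# An explicit "scheme://" prefix, e.g. "https://" or "http://" (hypothetical check).
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def normalize_url(raw: str) -> str:
    """Trim whitespace and prepend https:// when the URL has no explicit scheme."""
    url = raw.strip()
    if not _SCHEME.match(url):
        url = "https://" + url.lstrip("/")  # also turns "//host" into "https://host"
    return url


start_urls = [{"url": "https://apify.com"}, {"url": "example.com"}]
print([normalize_url(item["url"]) for item in start_urls])
# ['https://apify.com', 'https://example.com']
```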
| Field | Type | Default | Description |
|---|---|---|---|
| startUrls | array | (none) | Required. List of URLs to download. URLs without a scheme default to https://. |
| proxyConfiguration | object | no proxy | Proxy settings. By default requests go direct; when blocked, the actor automatically escalates to datacenter → residential proxies. |
| pageTimeoutSecs | integer | 60 | Maximum time, in seconds, to spend downloading one page. |
| maxRetries | integer | 3 | Number of additional attempts for a failing URL. |
| maxConcurrency | integer | 5 | Maximum number of pages downloaded in parallel. |
| requestDelaySecs | number | 0 | Optional polite delay, in seconds, before each request (random jitter is added). |
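maxConcurrency and requestDelaySecs work together like a bounded worker pool. The minimal asyncio sketch below shows how they might interact. It assumes an async `fetch` callable, and `download_all` is a hypothetical name. The README says only that jitter is added, so the "up to 50% extra" jitter is an assumption. Applying the delay inside the semaphore is also a design choice made for this sketch.

```python
import asyncio
import random
from typing import Awaitable, Callable, List


async def download_all(urls: List[str],
                       fetch: Callable[[str], Awaitable[str]],
                       max_concurrency: int = 5,
                       request_delay_secs: float = 0.0) -> List[str]:
    """Run fetch(url) for every URL, with at most max_concurrency in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> str:
        async with semaphore:  # blocks while max_concurrency fetches are running
            if request_delay_secs > 0:
                # Polite delay plus random jitter (assumed: up to 50% extra).
                await asyncio.sleep(request_delay_secs * (1 + random.uniform(0, 0.5)))
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return list(await asyncio.gather(*(worker(u) for u in urls)))
```

To use it, pass any coroutine function that downloads one URL, for example `asyncio.run(download_all(urls, my_fetch, max_concurrency=5))`, where `my_fetch` is your own (hypothetical) async downloader.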
Output
One dataset record per URL:
{"url": "https://apify.com","finishedAt": "2026-06-10T10:14:29.693Z","fullHtml": "<!DOCTYPE html><html>...</html>","html": "<body>...</body>"}
| Field | Description |
|---|---|
| url | The downloaded URL. |
| finishedAt | UTC timestamp when the page finished downloading. |
| fullHtml | Complete HTML source of the page. |
| html | Only the `<body>...</body>` portion of the page. |
| #debug | Hidden field with the status code, retry count, error messages, and the proxy tier used. |
| #error | Hidden boolean; true if no HTML could be retrieved. |
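After exporting the dataset as JSON, you can separate successful pages from failures. The sketch below is one way to do it. It assumes the hidden #error field is present in your export, which may depend on how hidden fields are exported, so it also treats an empty fullHtml as a failure. `split_results` is a hypothetical helper, and the sample records are made up for illustration.

```python
import json
from typing import Dict, List, Tuple


def split_results(items: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split dataset records into (downloaded, failed).

    A record counts as failed if #error is true or fullHtml is missing or empty.
    """
    ok, failed = [], []
    for item in items:
        if item.get("#error") or not item.get("fullHtml"):
            failed.append(item)
        else:
            ok.append(item)
    return ok, failed


# Made-up sample records in the documented output shape.
items = json.loads(
    '[{"url": "https://apify.com", "fullHtml": "<html></html>", "html": "<body></body>"},'
    ' {"url": "https://down.example", "fullHtml": "", "#error": true}]'
)
ok, failed = split_results(items)
print(len(ok), [i["url"] for i in failed])
# 1 ['https://down.example']
```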
How to Use (Apify Console)
- Log in at console.apify.com → Actors.
- Open Download HTML from URLs.
- Paste your URLs into the Website URLs field (bulk paste works!).
- Optionally tweak proxy, timeout, retries, and concurrency.
- Click Start and watch the live progress logs.
- Open the Output tab when the run completes.
- Export to JSON / CSV / XLSX with one click.
Use via API
curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/run-sync-get-dataset-items?token=$APIFY_TOKEN" -H "Content-Type: application/json" -d '{"startUrls":[{"url":"https://apify.com"}]}'
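The same API call can be made from Python with only the standard library. This is a sketch that mirrors the curl command: a POST to the run-sync-get-dataset-items endpoint with a JSON body and the token as a query parameter. `build_run_request` and `run_actor` are hypothetical helper names. `<ACTOR_ID>` is still a placeholder you must fill in, and the 300-second client timeout is an arbitrary choice.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List


def build_run_request(actor_id: str, token: str, urls: List[str]) -> urllib.request.Request:
    """Build the same POST request as the curl example (not sent yet)."""
    endpoint = (
        f"https://api.apify.com/v2/acts/{urllib.parse.quote(actor_id, safe='')}"
        f"/run-sync-get-dataset-items?token={urllib.parse.quote(token, safe='')}"
    )
    body = json.dumps({"startUrls": [{"url": u} for u in urls]}).encode("utf-8")
    return urllib.request.Request(
        endpoint, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def run_actor(actor_id: str, token: str, urls: List[str], timeout: float = 300) -> List[Dict]:
    """Send the request and return the dataset items parsed from the JSON response."""
    with urllib.request.urlopen(build_run_request(actor_id, token, urls), timeout=timeout) as resp:
        return json.load(resp)


# Example (needs a real Actor ID and token, so it is left commented out):
# items = run_actor("<ACTOR_ID>", os.environ["APIFY_TOKEN"], ["https://apify.com"])
```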
Best Use Cases
- Archiving article or product pages
- Feeding HTML into your own parsers / LLM pipelines
- Monitoring page content and structure changes
- Bulk snapshotting of competitor or partner sites
- Pre-fetching pages for downstream AI extraction Actors
Pricing
This actor uses the pay-per-event model with one simple event:
| Event | Charged when |
|---|---|
| page-downloaded | A page's HTML is successfully downloaded and saved to the dataset. |
Failed URLs are never charged. When your spending limit is reached, the run stops gracefully and keeps everything collected so far.
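Here is a quick worked estimate. It assumes the advertised starting rate of $2.99 per 1,000 page-downloaded events and counts only that event; your actual rate may differ. For example, if 2,500 URLs are submitted and 2,400 succeed, only the 2,400 are charged: 2.99 × 2400 / 1000 = $7.176. The helper names are hypothetical. `Decimal` keeps the money arithmetic exact.

```python
from decimal import Decimal

# Advertised starting price per 1,000 results (an assumption; your rate may differ).
PRICE_PER_1000 = Decimal("2.99")


def estimate_cost(successful_pages: int, price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Only successful downloads (page-downloaded events) are charged; failures cost nothing."""
    return price_per_1000 * successful_pages / 1000


def max_pages_for_budget(limit_usd: Decimal, price_per_1000: Decimal = PRICE_PER_1000) -> int:
    """Whole pages that fit within a spending limit at this rate."""
    return int(limit_usd * 1000 // price_per_1000)


print(estimate_cost(2400))                  # 2,500 submitted, 2,400 succeeded
# 7.176
print(max_pages_for_budget(Decimal("10")))  # pages a $10 limit covers
# 3344
```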
Frequently Asked Questions
Do I need a proxy? Usually not. The actor starts with direct connections. If a site blocks the request, it escalates to datacenter and then residential proxies automatically.
Does it render JavaScript? Yes, when needed. Pages that fail the fast HTTP download are automatically rendered in a real headless browser.
Can I paste URLs without https://?
Yes. example.com is automatically converted to https://example.com.
What happens to URLs that fail completely?
They're still saved to the dataset with an empty fullHtml and full error details in #debug, so you always know exactly what happened, and you're not charged for them.
Legal
This actor downloads only publicly available web pages. You are responsible for complying with the target websites' terms of service and applicable laws (GDPR, CCPA, etc.) when using the downloaded data.
Support and Feedback
Found a bug or need a feature? Open an issue on the actor's Issues tab in Apify Console; we respond quickly!