Download HTML from URLs
Pricing
$5.00/month + usage
🧠 HTML Downloader
This Apify actor takes a list of URLs and downloads the full HTML content of each page. You can configure proxy settings and, optionally, a CSS selector to wait for on each page before its HTML is captured.
✅ Use Cases
📄 Download HTML content from multiple websites
🕷️ Archive web pages for offline analysis
📊 Extract raw HTML for custom parsing
🔍 Monitor website changes over time
📥 Input Configuration
You can customize the actor using the following input fields:
{
  "requestListSources": [{ "url": "https://apify.com" }],
  "proxyConfiguration": { "useApifyProxy": true },
  "handlePageTimeoutSecs": 60,
  "maxRequestRetries": 1,
  "useChrome": false
}
🧾 Fields Explained
| Field | Type | Description |
| --- | --- | --- |
| requestListSources | array | Required. Array of URLs to download. Each item can have optional userData with waitForSelector |
| proxyConfiguration | object | Proxy settings: choose no proxy, Apify Proxy, or custom proxy URLs |
| handlePageTimeoutSecs | integer | Optional. Maximum time to spend processing one page (default: 60) |
| maxRequestRetries | integer | Optional. How many retries before giving up (default: 1) |
| useChrome | boolean | Optional. Use real Chrome browser instead of Chromium (default: false) |
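As a rough illustration of how these fields fit together, the sketch below assembles an input object from a list of URLs. The `buildInput` helper and its `options` argument are hypothetical and not part of the actor. Only the field names and defaults come from the table above.

```javascript
// Hypothetical helper: builds an input object in the shape documented above.
// Only the field names and defaults come from this README.
function buildInput(urls, options = {}) {
  if (!Array.isArray(urls) || urls.length === 0) {
    throw new Error('requestListSources is required: pass at least one URL');
  }
  const requestListSources = urls.map((entry) => {
    // Accept either a plain URL string or { url, waitForSelector }.
    const { url, waitForSelector } = typeof entry === 'string' ? { url: entry } : entry;
    new URL(url); // throws a TypeError if the URL is malformed
    return waitForSelector ? { url, userData: { waitForSelector } } : { url };
  });
  return {
    requestListSources,
    proxyConfiguration: options.proxyConfiguration ?? { useApifyProxy: true },
    handlePageTimeoutSecs: options.handlePageTimeoutSecs ?? 60, // documented default
    maxRequestRetries: options.maxRequestRetries ?? 1, // documented default
    useChrome: options.useChrome ?? false, // documented default
  };
}

const input = buildInput([
  'https://apify.com',
  { url: 'https://example.com', waitForSelector: '.content-loaded' },
]);
console.log(JSON.stringify(input, null, 2));
```

The resulting JSON can be pasted into the actor's input editor. Validating URLs up front avoids spending a run on entries that would fail immediately.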
📤 Output
The actor stores one record per URL in a dataset. Each record includes the original URL, the final URL (after redirects), the page title, and the full HTML content.
🧩 Sample Output
[
  {
    "url": "https://apify.com",
    "loadedUrl": "https://apify.com/",
    "title": "Apify - Web Scraping & Data Extraction | Apify",
    "html": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n..."
  }
]
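One way to work with these records after a run is to reduce each one to a short summary, for example to spot redirects or unusually small pages. The sketch below assumes the record shape shown in the sample output. `summarizeRecord` is a hypothetical helper and the two records are made-up sample data.

```javascript
// Illustrative only: the record shape follows the sample output above.
function summarizeRecord(record) {
  // Normalizing through URL makes "https://apify.com" and "https://apify.com/" compare equal.
  const normalize = (u) => new URL(u).href;
  return {
    url: record.url,
    redirected: normalize(record.url) !== normalize(record.loadedUrl),
    title: record.title,
    htmlBytes: Buffer.byteLength(record.html, 'utf8'),
  };
}

// Made-up sample records standing in for real dataset items.
const records = [
  { url: 'https://apify.com', loadedUrl: 'https://apify.com/', title: 'Apify', html: '<!DOCTYPE html><html></html>' },
  { url: 'http://example.com', loadedUrl: 'https://example.com/', title: 'Example', html: '<html></html>' },
];
for (const record of records) {
  console.log(summarizeRecord(record));
}
```

Comparing normalized URLs means a trailing slash added by the browser is not reported as a redirect, while a change of scheme or host is.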
🔒 Proxy Configuration
This actor supports three proxy modes:
No proxy
Apify Proxy (optionally restricted to a proxy group such as RESIDENTIAL)
Custom proxy URLs
Default proxy settings:
{"useApifyProxy": true}
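The three proxy modes could map to `proxyConfiguration` objects roughly as sketched below. The `useApifyProxy` and `apifyProxyGroups` fields appear in this README. The `proxyUrls` field is an assumption based on the usual Apify proxy input format and should be checked against this actor's input schema. `proxyConfigFor` and the proxy address are hypothetical.

```javascript
// Hypothetical helper mapping a mode name to a proxyConfiguration object.
// `proxyUrls` is an assumed field name; verify it against the actor's input schema.
function proxyConfigFor(mode, customUrls = []) {
  switch (mode) {
    case 'none':
      return { useApifyProxy: false };
    case 'apify':
      return { useApifyProxy: true };
    case 'residential':
      return { useApifyProxy: true, apifyProxyGroups: ['RESIDENTIAL'] };
    case 'custom':
      if (customUrls.length === 0) {
        throw new Error('custom mode needs at least one proxy URL');
      }
      return { useApifyProxy: false, proxyUrls: customUrls };
    default:
      throw new Error(`Unknown proxy mode: ${mode}`);
  }
}

console.log(proxyConfigFor('residential'));
// Placeholder proxy address, not a real server.
console.log(proxyConfigFor('custom', ['http://user:pass@proxy.example.com:8000']));
```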
🚀 How to Use
1. Open the actor in Apify Console
2. Click "Try actor" or create a new task
3. Add URLs to the requestListSources array
4. Configure proxy settings if needed
5. Run the actor
6. Export the results from the run's dataset in JSON, CSV, or XML format
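The last step can also be done outside the Console. As a minimal sketch, the helper below builds a download link for a run's dataset using the Apify API's dataset items endpoint and its `format` query parameter. `datasetExportUrl` is a hypothetical helper and `YOUR_DATASET_ID` is a placeholder. Datasets that are not public also need an API token, which is left out here.

```javascript
// Formats named in this README; the Apify API may accept others as well.
const EXPORT_FORMATS = ['json', 'csv', 'xml'];

// Hypothetical helper: builds a dataset export URL for the given format.
function datasetExportUrl(datasetId, format = 'json') {
  if (!EXPORT_FORMATS.includes(format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  const url = new URL(`https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`);
  url.searchParams.set('format', format);
  return url.href;
}

// YOUR_DATASET_ID is a placeholder for the dataset ID of a finished run.
console.log(datasetExportUrl('YOUR_DATASET_ID', 'csv'));
```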
⚙️ Advanced Input Example
{
  "requestListSources": [
    {
      "url": "https://example.com",
      "userData": { "waitForSelector": ".content-loaded" }
    },
    { "url": "https://another-site.com" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "handlePageTimeoutSecs": 120,
  "maxRequestRetries": 3,
  "useChrome": true
}
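To show what the `waitForSelector` entry in `userData` is for, here is a sketch of how a page handler might use it before capturing the HTML. This is not the actor's actual source code. `capturePage` and `fakePage` are hypothetical, and `page` is assumed to behave like a Puppeteer `Page` (`waitForSelector`, `url`, `title`, `content`).

```javascript
// Sketch of a page handler; NOT the actor's actual implementation.
async function capturePage(page, request, timeoutSecs = 60) {
  const selector = request.userData?.waitForSelector;
  if (selector) {
    // Wait until the element exists, or reject once the timeout elapses.
    await page.waitForSelector(selector, { timeout: timeoutSecs * 1000 });
  }
  return {
    url: request.url,
    loadedUrl: page.url(),
    title: await page.title(),
    html: await page.content(),
  };
}

// Stand-in for a real Puppeteer page so the sketch runs without a browser.
const fakePage = {
  waited: [],
  async waitForSelector(selector) { this.waited.push(selector); },
  url: () => 'https://example.com/',
  title: async () => 'Example Domain',
  content: async () => '<html><body class="content-loaded"></body></html>',
};

capturePage(fakePage, {
  url: 'https://example.com',
  userData: { waitForSelector: '.content-loaded' },
}).then((record) => console.log(record));
```

Requests without `userData` skip the wait entirely, which matches the second URL in the example input above.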
🛠️ Tech Stack
🧩 Apify SDK — for actor and data handling
🕷️ Crawlee — for robust crawling and scraping
🌐 Puppeteer — for browser automation and rendering dynamic content
⚙️ Node.js — fast, scalable backend environment
