Download HTML from URLs

Pricing

$5.00/month + usage

This actor takes a list of URLs and downloads the HTML of each page.

Rating

5.0

(4)

Developer

ScrapeAI

Maintained by Community

Actor stats

Bookmarked: 0

Total users: 3

Monthly active users: 3

Last modified: 10 days ago

🧠 HTML Downloader

This Apify actor takes a list of URLs and downloads the full HTML of each page. You can configure proxy settings and, optionally, have the actor wait for a CSS selector to appear before it captures a page.

✅ Use Cases

📄 Download HTML content from multiple websites

🕷️ Archive web pages for offline analysis

📊 Extract raw HTML for custom parsing

🔍 Monitor website changes over time

📥 Input Configuration

You can customize the actor using the following input fields:

{
    "requestListSources": [
        {
            "url": "https://apify.com"
        }
    ],
    "proxyConfiguration": {
        "useApifyProxy": true
    },
    "handlePageTimeoutSecs": 60,
    "maxRequestRetries": 1,
    "useChrome": false
}

🧾 Fields Explained

| Field | Type | Description |
| --- | --- | --- |
| requestListSources | array | Required. Array of URLs to download. Each item can have optional userData with waitForSelector. |
| proxyConfiguration | object | Proxy settings: no proxy, Apify Proxy, or custom proxy URLs. |
| handlePageTimeoutSecs | integer | Optional. Maximum time, in seconds, to spend processing one page (default: 60). |
| maxRequestRetries | integer | Optional. Number of retries before giving up on a URL (default: 1). |
| useChrome | boolean | Optional. Use the real Chrome browser instead of Chromium (default: false). |
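The field descriptions above can be turned into a small pre-flight check before a run is started. The following is a minimal Python sketch and not part of the actor: `apply_defaults` is a hypothetical helper that fills in the documented defaults and rejects obviously wrong types. Treating an empty `requestListSources` array as an error is an assumption made here, not something the actor documents.

```python
from typing import Any

# Defaults copied from the field descriptions; the helper itself is hypothetical.
DEFAULTS: dict[str, Any] = {
    "handlePageTimeoutSecs": 60,
    "maxRequestRetries": 1,
    "useChrome": False,
}


def apply_defaults(actor_input: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of actor_input with the documented defaults filled in."""
    sources = actor_input.get("requestListSources")
    # Assumption: an empty list is treated as a mistake.
    if not isinstance(sources, list) or not sources:
        raise ValueError("requestListSources must be a non-empty array")
    for source in sources:
        if not isinstance(source, dict) or not isinstance(source.get("url"), str):
            raise ValueError("each source needs a string 'url'")

    merged = {**DEFAULTS, **actor_input}
    # bool is a subclass of int in Python, so exclude it explicitly.
    for key in ("handlePageTimeoutSecs", "maxRequestRetries"):
        value = merged[key]
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{key} must be an integer")
    if not isinstance(merged["useChrome"], bool):
        raise ValueError("useChrome must be a boolean")
    return merged
```

Calling `apply_defaults({"requestListSources": [{"url": "https://apify.com"}]})` yields an input with all three optional fields set to their documented defaults.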

📤 Output

The actor returns a dataset containing HTML content for each URL. Each record includes the original URL, final URL (after redirects), page title, and full HTML content.

🧩 Sample Output

[
    {
        "url": "https://apify.com",
        "loadedUrl": "https://apify.com/",
        "title": "Apify - Web Scraping & Data Extraction | Apify",
        "html": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n..."
    }
]
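For the change-monitoring use case, records in this shape can be reduced to one digest per URL and compared between runs. This is a minimal sketch under the assumption that every record carries the `url` and `html` fields shown in the sample; `fingerprint` and `changed_urls` are hypothetical helper names, not part of the actor.

```python
import hashlib


def fingerprint(records: list[dict]) -> dict[str, str]:
    """Map each requested URL to the SHA-256 hex digest of its HTML."""
    return {
        record["url"]: hashlib.sha256(record["html"].encode("utf-8")).hexdigest()
        for record in records
    }


def changed_urls(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return URLs that are new or whose digest differs from the previous run."""
    return sorted(
        url for url, digest in current.items() if previous.get(url) != digest
    )
```

Pages that embed per-request tokens or timestamps will likely produce a different digest on every run, so in practice the HTML may need to be normalized before hashing.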

🔒 Proxy Configuration

This actor supports three proxy modes:

No proxy

Apify Proxy (for example, with the RESIDENTIAL group for residential IPs)

Custom proxy URLs

Default proxy settings (Apify Proxy enabled):

{
    "useApifyProxy": true
}
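The three proxy options listed above could be expressed as `proxyConfiguration` values along the following lines. This is a sketch, not the actor's own code: `proxy_configuration` is a hypothetical helper, and the `proxyUrls` key for custom proxies is an assumption based on the usual Apify proxy input format, since this page only shows `useApifyProxy` and `apifyProxyGroups`.

```python
from typing import Optional


def proxy_configuration(
    mode: str,
    groups: Optional[list[str]] = None,
    proxy_urls: Optional[list[str]] = None,
) -> dict:
    """Build a proxyConfiguration object for one of three modes."""
    if mode == "none":
        return {"useApifyProxy": False}
    if mode == "apify":
        config: dict = {"useApifyProxy": True}
        if groups:
            # For example ["RESIDENTIAL"] for residential IPs.
            config["apifyProxyGroups"] = list(groups)
        return config
    if mode == "custom":
        if not proxy_urls:
            raise ValueError("custom mode needs at least one proxy URL")
        # Assumption: custom proxies are passed under the "proxyUrls" key.
        return {"useApifyProxy": False, "proxyUrls": list(proxy_urls)}
    raise ValueError(f"unknown proxy mode: {mode}")
```

For instance, `proxy_configuration("apify", groups=["RESIDENTIAL"])` produces a proxy block with Apify Proxy enabled and the residential group selected.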

🚀 How to Use

Open the actor in Apify Console

Click "Try actor" or create a new task

Add URLs to the requestListSources array

Configure proxy settings if needed

Run the actor

Download HTML content in JSON, CSV, or XML format
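The same steps can probably be scripted instead of clicked through in the Console. The sketch below assumes a client object that behaves like `ApifyClient` from the `apify-client` Python package (its `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods). `ACTOR_ID` is a placeholder and `download_html` is a hypothetical helper; neither appears on this page.

```python
from typing import Any, Iterable

# Placeholder: copy the real actor ID from Apify Console.
ACTOR_ID = "<username>/<actor-name>"


def download_html(client: Any, urls: Iterable[str]) -> list[dict]:
    """Run the actor for the given URLs and return its dataset records.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run_input = {
        "requestListSources": [{"url": url} for url in urls],
        "proxyConfiguration": {"useApifyProxy": True},
    }
    # Start the run and wait for it to finish.
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    # Read the records from the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

With the package installed, a call might look like `download_html(ApifyClient("<your API token>"), ["https://apify.com"])`, after which each returned record can be written to disk or parsed further.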

⚙️ Advanced Input Example

{
    "requestListSources": [
        {
            "url": "https://example.com",
            "userData": {
                "waitForSelector": ".content-loaded"
            }
        },
        {
            "url": "https://another-site.com"
        }
    ],
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    },
    "handlePageTimeoutSecs": 120,
    "maxRequestRetries": 3,
    "useChrome": true
}
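When many URLs need different selectors, the `requestListSources` array in the example above can be generated rather than written by hand. A minimal sketch, assuming a plain mapping from URL to an optional selector; `build_sources` is a hypothetical helper name.

```python
from typing import Optional


def build_sources(selectors: dict[str, Optional[str]]) -> list[dict]:
    """Turn {url: selector-or-None} into requestListSources entries."""
    sources = []
    for url, selector in selectors.items():
        source: dict = {"url": url}
        if selector:
            # Only pages that need it get a waitForSelector hint.
            source["userData"] = {"waitForSelector": selector}
        sources.append(source)
    return sources
```

Passing `{"https://example.com": ".content-loaded", "https://another-site.com": None}` reproduces the two entries from the advanced example.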

🛠️ Tech Stack

🧩 Apify SDK — for actor and data handling

🕷️ Crawlee — for robust crawling and scraping

🌐 Puppeteer — for browser automation and rendering dynamic content

⚙️ Node.js — fast, scalable backend environment