Pricing

$7.00/month + usage

Download HTML from URLs

This script uses an Apify Actor to fetch the complete HTML source of a website. The user provides a URL, the page is loaded with JavaScript execution, and the full HTML is printed in the terminal, saved to an HTML file, …

Developer

Data Pilot

Last modified

20 days ago

🔥 Features

Full HTML Capture – Downloads complete HTML source code from any URL, including JavaScript-rendered content.
Browser Automation – Uses Playwright for headless browser navigation to access fully loaded HTML pages.
Screenshot Generation – Captures page previews as PNG images for visual HTML analysis.
Dynamic Content Handling – Waits for page loads and JavaScript execution to ensure complete HTML extraction.
Secure Storage – Saves HTML files and screenshots in Apify's Key-Value Store for easy access.
Single URL Processing – Processes one URL per run for precise HTML results.
Error Handling – Robust logging for issues during HTML downloading.
Dataset Integration – Automatically uploads HTML metadata to your Apify dataset for easy export and analysis.

⚙️ How It Works

Download HTML from URLs takes a target URL as input and uses Playwright to launch a headless browser, navigate to the URL, and wait for the page to fully load. It then captures the full HTML source and a screenshot, saves both files to Apify's Key-Value Store, and pushes metadata to the dataset. Because the page is rendered in a real browser, the captured HTML includes content that JavaScript adds after the initial load.

Key Processing Steps:

URL Validation – Parse and validate the provided URL
Browser Launch – Initialize headless Playwright browser
Page Navigation – Navigate to the target URL
Content Loading – Wait for full page load and JavaScript execution
HTML Capture – Extract complete HTML source code
Screenshot – Capture page preview as PNG image
Secure Storage – Save files to Key-Value Store
Data Export – Push metadata to dataset
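
The first six steps above can be sketched in Python as follows. This is a minimal illustration, not the Actor's actual source: the function names `validate_url` and `capture_page`, the 60-second timeout, and the `networkidle` wait strategy are assumptions chosen for the example. It uses Playwright's synchronous API and assumes Playwright and a Chromium build are installed.

```python
from urllib.parse import urlparse


def validate_url(raw: str) -> str:
    """Return the trimmed URL, or raise ValueError if it is not an absolute http(s) URL."""
    url = raw.strip()
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Not a valid http(s) URL: {raw!r}")
    return url


def capture_page(url: str, timeout_ms: int = 60_000) -> dict:
    """Load the page in headless Chromium and return its title, HTML, and a PNG screenshot."""
    # Imported lazily so validate_url() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Assumption: wait until the network is idle so late JavaScript has run.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return {
                "title": page.title(),
                "html": page.content(),  # serialized DOM after rendering
                "screenshot_png": page.screenshot(full_page=True),  # PNG bytes
            }
        finally:
            browser.close()
```

Calling `capture_page(validate_url("https://www.example.com"))` would return the rendered HTML and the screenshot bytes, which the remaining two steps would then write to the Key-Value Store and the dataset.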

Key benefits for HTML analysis:

Access full HTML source for web scraping and research.
Capture dynamic HTML content that requires JavaScript.
Build HTML archives for content preservation.
Analyze page structure and metadata.
Extract rendered HTML content accurately.

📥 Input

The downloader accepts the following input parameters:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | Yes | The URL to download HTML from (e.g., `"https://www.example.com"`). |

Example input JSON:

{
  "url": "https://www.example.com"
}

📤 Output

The downloader outputs HTML metadata in JSON format. Each record includes:

| Field | Type | Description |
|-------|------|-------------|
| `url` | string | The original URL provided. |
| `title` | string | Title of the web page. |
| `html_content` | string | Full HTML source code of the page. |
| `html_file_url` | string | URL to access FULL_PAGE_SOURCE.html from Key-Value Store. |
| `screenshot_url` | string | URL to access PAGE_PREVIEW.png screenshot from Key-Value Store. |
| `page_size` | integer | Size of HTML file in bytes. |
| `captured_at` | string | ISO timestamp of the HTML capture. |
| `status` | string | Status of capture (`"Success"` or `"Failed"`). |
| `error` | string | Error message; present only when `status` is `"Failed"`. |

Example output for successful HTML download:

{
  "url": "https://www.example.com",
  "title": "Example Page Title",
  "html_content": "<!DOCTYPE html><html><head><title>Example Page Title</title>...</head><body>...</body></html>",
  "html_file_url": "https://api.apify.com/v2/key-value-stores/store_id/records/FULL_PAGE_SOURCE.html",
  "screenshot_url": "https://api.apify.com/v2/key-value-stores/store_id/records/PAGE_PREVIEW.png",
  "page_size": 45823,
  "captured_at": "2025-02-14T12:00:00Z",
  "status": "Success"
}

Example error response:

{
  "url": "https://www.example.com/invalid",
  "status": "Failed",
  "error": "Page not accessible or timeout occurred",
  "captured_at": "2025-02-14T12:00:00Z"
}
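
The two record shapes above could be assembled along these lines. This is a sketch under stated assumptions rather than the Actor's own code: the helper names `success_record` and `failure_record` are hypothetical, and `page_size` is assumed to count the UTF-8 encoded bytes of the HTML. The record URL pattern is taken from the example output.

```python
from datetime import datetime, timezone
from typing import Optional

# URL pattern as it appears in the example output above.
KV_RECORD_URL = "https://api.apify.com/v2/key-value-stores/{store_id}/records/{key}"


def utc_timestamp() -> str:
    """Current UTC time in the ISO format used by the examples."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def success_record(url: str, title: str, html: str, store_id: str,
                   captured_at: Optional[str] = None) -> dict:
    """Build the dataset record for a successful capture."""
    return {
        "url": url,
        "title": title,
        "html_content": html,
        "html_file_url": KV_RECORD_URL.format(store_id=store_id, key="FULL_PAGE_SOURCE.html"),
        "screenshot_url": KV_RECORD_URL.format(store_id=store_id, key="PAGE_PREVIEW.png"),
        # Assumption: size is measured on the UTF-8 encoded HTML.
        "page_size": len(html.encode("utf-8")),
        "captured_at": captured_at or utc_timestamp(),
        "status": "Success",
    }


def failure_record(url: str, error: str, captured_at: Optional[str] = None) -> dict:
    """Build the dataset record for a failed capture."""
    return {
        "url": url,
        "status": "Failed",
        "error": error,
        "captured_at": captured_at or utc_timestamp(),
    }
```

A consumer can branch on `status` and read `error` only from failed records, since successful records do not carry that field.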

🧰 Technical Stack

Browser Automation: Playwright – Fast, headless browser automation
Page Interaction: DOM manipulation and rendering
Screenshot Capture: PNG image generation
Storage: Apify Key-Value Store for secure file storage
Platform: Apify Actor – serverless, scalable, integrated with Dataset and Key-Value Store
Deployment: One-click run on Apify Console or via REST API

🎯 Use Cases

Web Scraping Research – Extract HTML source for web scraping projects and analysis.
Web Archiving – Archive web pages for historical preservation and reference.
Content Migration – Capture HTML for migrating content to other platforms.
Competitor Analysis – Analyze competitor website structures and layouts.
SEO Analysis – Examine HTML structure for SEO optimization.
Web Page Backup – Create backups of important web pages.
Legal Compliance – Archive web pages for legal or compliance purposes.
Academic Research – Collect HTML data for research studies.
Website Testing – Test website HTML before and after changes.
Content Analysis – Analyze page content and structure.
Dynamic Content Capture – Extract JavaScript-rendered content.
Meta Data Extraction – Extract titles, descriptions, and metadata.
Accessibility Analysis – Analyze HTML for accessibility compliance.
Performance Research – Study website code and performance metrics.

🚀 Quick Start

Open in Apify Console – visit the Actor page and click Try for free.
Enter a URL – provide the URL of the web page you want to download HTML from.
Click Start – the Actor will launch a browser, load the page, and capture the HTML.
View Results – check the dataset for HTML metadata and storage links.
Download HTML – open the `html_file_url` link in the results to get the full HTML file.
View Screenshot – open the `screenshot_url` link to see the page preview.
Export – download the results as JSON, CSV, or Excel.

You can also call this Actor programmatically via the Apify SDK or REST API, which suits automated web scraping and archiving pipelines.
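
As one possible way to make such a call, the sketch below uses only the Python standard library and Apify's synchronous run endpoint, which starts a run and returns its dataset items in one request. The Actor ID passed in is a placeholder (this page does not state the Actor's ID), and the helper names `build_run_request` and `run_actor` are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, page_url: str) -> urllib.request.Request:
    """Prepare a POST that runs the Actor and returns its dataset items."""
    # The API expects "username~actor-name" in the path rather than "username/actor-name".
    endpoint = f"{API_BASE}/acts/{actor_id.replace('/', '~')}/run-sync-get-dataset-items"
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def run_actor(actor_id: str, token: str, page_url: str, timeout_s: float = 300.0) -> list:
    """Send the request and return the parsed list of dataset records."""
    request = build_run_request(actor_id, token, page_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

For example, `run_actor("<username>/<actor-name>", "<YOUR_API_TOKEN>", "https://www.example.com")` would be expected to return a list with one record in the output format described above. For runs that may take longer than a single HTTP request allows, starting the run asynchronously and polling for its result is the safer pattern.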

💎 Why This Tool?

| Feature | Benefit |
|---------|---------|
| ✅ Full HTML capture | Get complete page source including dynamic content. |
| ✅ Browser automation | Handle JavaScript-rendered pages accurately. |
| ✅ Screenshots included | Get visual preview of captured pages. |
| ✅ Secure storage | Files stored in cloud-based Key-Value Store. |
| ✅ Metadata extraction | Get page titles and structure information. |
| ✅ Error handling | Robust fallback mechanisms for reliability. |
| ✅ Fast processing | Get results quickly with optimized browser settings. |
| ✅ Apify ecosystem | Seamless integration with other Actors, triggers, and webhooks. |

📦 Changelog

Initial release of Download HTML from URLs
Full HTML source code capture from any URL
Browser automation with Playwright for dynamic content
Screenshot generation for visual preview
Secure Key-Value Store storage for HTML and images
Complete metadata extraction (titles, page info)
Error handling with detailed error messages
Automatic dataset integration
Full Apify Actor integration

🧑‍💻 Support & Feedback

Issues & Ideas: Open a ticket on the Apify Actor issue tracker
Documentation: Visit Apify Docs for comprehensive platform guides
Community: Join the Apify community forum for discussions and support
Bug Reports: Submit detailed bug reports through the issue tracker
Feature Requests: Suggest new features to improve the tool

💰 Pricing

Free for basic usage on Apify platform
Paid plans available for higher limits and priority support
Compute credits consumed for browser automation and processing
Storage credits consumed for HTML and screenshot storage

Disclaimer: Download HTML from URLs is provided as-is for research and archiving purposes. Users are responsible for ensuring their usage complies with website policies and applicable laws. Browser automation may consume significant compute resources.

🎉 Get Started Today

Begin downloading HTML from web pages now!

Use Download HTML from URLs for:

🔍 Web Scraping Research
📚 Content Archiving
🕷️ Web Analysis
📊 Competitive Research
🔐 Legal Compliance

Perfect for:

Web Developers
Researchers
Data Analysts
SEO Specialists
Content Archivists

Status: Active Development
Support: 24/7 Customer Support Available
Platform: Apify

For comprehensive web analysis and content management, explore our full suite of tools:

All-in-One Media Downloader
Ultimate Video Info Fetcher
Google Search Results Scraper
Pinterest Pin Video Downloader
Web Content Analysis Tools
