Download HTML from URLs

Pricing

$7.00/month + usage

This script works with an Apify Actor to fetch the complete HTML source of any website. The user provides a URL, the page is loaded with JavaScript execution, and the full HTML is printed in the terminal, saved to an HTML file, …

Developer

Data Pilot

Maintained by Community

🚀 Download HTML from URLs is an Apify Actor that downloads the HTML content of a URL using browser automation. It captures the full HTML source of a web page, including dynamically loaded content, and saves a screenshot for visual reference. It is suited to web scraping research, web page archiving, and HTML structure analysis.

Because the page is rendered in a browser, the Actor captures HTML that a plain HTTP request may not return. Each run records the full page source and the page title.

🔥 Features

  • Full HTML Capture – Downloads complete HTML source code from any URL, including JavaScript-rendered content.
  • Browser Automation – Uses Playwright for headless browser navigation to access fully loaded HTML pages.
  • Screenshot Generation – Captures page previews as PNG images for visual HTML analysis.
  • Dynamic Content Handling – Waits for page loads and JavaScript execution to ensure complete HTML extraction.
  • Secure Storage – Saves HTML files and screenshots in Apify's Key-Value Store for easy access.
  • Single URL Processing – Processes one URL per run for precise HTML results.
  • Error Handling – Robust logging for issues during HTML downloading.
  • Dataset Integration – Automatically uploads HTML metadata to your Apify dataset for easy export and analysis.

⚙️ How It Works

Download HTML from URLs takes a target URL as input, uses Playwright to launch a headless browser, navigates to the URL, and waits for the page to finish loading. It then captures the full HTML source and a screenshot, saves both files to Apify's Key-Value Store, and pushes metadata to the dataset. The result is the rendered HTML rather than only the initial server response.

Key Processing Steps:

  1. URL Validation – Parse and validate the provided URL
  2. Browser Launch – Initialize headless Playwright browser
  3. Page Navigation – Navigate to the target URL
  4. Content Loading – Wait for full page load and JavaScript execution
  5. HTML Capture – Extract complete HTML source code
  6. Screenshot – Capture page preview as PNG image
  7. Secure Storage – Save files to Key-Value Store
  8. Data Export – Push metadata to dataset
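The browser steps above (launch, navigate, wait, capture HTML, screenshot) could look roughly like the following sketch. It uses Playwright's synchronous Python API; the `Capture` class, the `capture_page` function, the timeout value, and the choice of `"networkidle"` as the readiness signal are assumptions made for illustration, since the Actor's actual source and wait strategy are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    """Result of loading one page in a headless browser (hypothetical type)."""
    url: str
    title: str
    html: str
    screenshot_png: bytes


def capture_page(url: str, timeout_ms: int = 60_000) -> Capture:
    """Load `url` in headless Chromium and return the rendered HTML and a PNG preview."""
    # Imported lazily so this module can be imported without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Assumption: wait until the network is idle so that
            # JavaScript-rendered content has had a chance to load.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return Capture(
                url=url,
                title=page.title(),
                html=page.content(),  # serialized DOM after rendering
                screenshot_png=page.screenshot(full_page=True),
            )
        finally:
            browser.close()
```

Running `capture_page("https://www.example.com")` requires the `playwright` package and a Chromium build installed with `playwright install chromium`.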

Key benefits for HTML analysis:

  • Access full HTML source for web scraping and research.
  • Capture dynamic HTML content that requires JavaScript.
  • Build HTML archives for content preservation.
  • Analyze page structure and metadata.
  • Extract rendered HTML content accurately.

📥 Input

The downloader accepts the following input parameters:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| url | string | required | The URL to download HTML from (e.g., "https://www.example.com"). |

Example input JSON:

{
  "url": "https://www.example.com"
}
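Before a run starts, the input can be checked on the caller's side. The sketch below is one possible check, assuming that only absolute `http` and `https` URLs are acceptable; the function name `validate_input` and the exact rules are illustrative and may differ from the Actor's own validation.

```python
from urllib.parse import urlparse


def validate_input(actor_input: dict) -> str:
    """Return the trimmed URL from the input dict, or raise ValueError."""
    url = actor_input.get("url")
    if not isinstance(url, str) or not url.strip():
        raise ValueError('Input must contain a non-empty string field "url".')
    url = url.strip()
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs with a host are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an absolute http(s) URL: {url!r}")
    return url
```

For example, `validate_input({"url": "https://www.example.com"})` returns the URL unchanged, while an input such as `{"url": "www.example.com"}` raises `ValueError` because it has no scheme.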

📤 Output

The downloader outputs HTML metadata in JSON format. Each record includes:

| Field | Type | Description |
|-------|------|-------------|
| url | string | The original URL provided. |
| title | string | Title of the web page. |
| html_content | string | Full HTML source code of the page. |
| html_file_url | string | URL to access FULL_PAGE_SOURCE.html from Key-Value Store. |
| screenshot_url | string | URL to access PAGE_PREVIEW.png screenshot from Key-Value Store. |
| page_size | integer | Size of HTML file in bytes. |
| captured_at | string | ISO timestamp of the HTML capture. |
| status | string | Status of capture ("Success" or "Failed"). |

Example output for successful HTML download:

{
  "url": "https://www.example.com",
  "title": "Example Page Title",
  "html_content": "<!DOCTYPE html><html><head><title>Example Page Title</title>...</head><body>...</body></html>",
  "html_file_url": "https://api.apify.com/v2/key-value-stores/store_id/records/FULL_PAGE_SOURCE.html",
  "screenshot_url": "https://api.apify.com/v2/key-value-stores/store_id/records/PAGE_PREVIEW.png",
  "page_size": 45823,
  "captured_at": "2025-02-14T12:00:00Z",
  "status": "Success"
}
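To make the fields concrete, here is a sketch of how a success record with this shape might be assembled. It assumes that `page_size` is the UTF-8 byte length of the HTML and that `captured_at` is a UTC timestamp; neither detail is confirmed by the listing, and `build_success_record` is a hypothetical helper, not part of the Actor.

```python
from datetime import datetime, timezone

# Record URL pattern as it appears in the example output above.
KV_RECORD_URL = "https://api.apify.com/v2/key-value-stores/{store_id}/records/{key}"


def build_success_record(url, title, html, store_id, now=None):
    """Assemble a metadata record; `now` must be timezone-aware if given."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "url": url,
        "title": title,
        "html_content": html,
        "html_file_url": KV_RECORD_URL.format(store_id=store_id, key="FULL_PAGE_SOURCE.html"),
        "screenshot_url": KV_RECORD_URL.format(store_id=store_id, key="PAGE_PREVIEW.png"),
        # Assumption: size is measured on the UTF-8 encoded HTML.
        "page_size": len(html.encode("utf-8")),
        "captured_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "status": "Success",
    }
```

Note that `page_size` counts bytes, not characters, so a page containing non-ASCII text has a size larger than `len(html)`.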

Example error response:

{
  "url": "https://www.example.com/invalid",
  "status": "Failed",
  "error": "Page not accessible or timeout occurred",
  "captured_at": "2025-02-14T12:00:00Z"
}
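Since a run can produce either record shape, downstream code should branch on `status` before reading `html_content`. The following sketch shows one way a consumer might do that; `save_successful` and the `page_<index>.html` file naming are illustrative choices, not something the Actor provides.

```python
from pathlib import Path


def save_successful(records, out_dir):
    """Write html_content of each successful record to disk.

    Returns (saved_paths, failed_records).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    for index, record in enumerate(records):
        if record.get("status") == "Success" and "html_content" in record:
            path = out / f"page_{index}.html"
            path.write_text(record["html_content"], encoding="utf-8")
            saved.append(path)
        else:
            # Failed records carry an "error" message instead of HTML.
            failed.append(record)
    return saved, failed
```

The returned `failed` list can then be logged or retried using each record's `url` and `error` fields.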

🧰 Technical Stack

  • Browser Automation: Playwright – Fast, headless browser automation
  • Page Interaction: DOM manipulation and rendering
  • Screenshot Capture: PNG image generation
  • Storage: Apify Key-Value Store for secure file storage
  • Platform: Apify Actor – serverless, scalable, integrated with Dataset and Key‑Value Store
  • Deployment: One‑click run on Apify Console or via REST API
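As an illustration of how the storage pieces of this stack fit together, the sketch below writes the two files to a Key-Value Store and then pushes the metadata record to the dataset. The `store_capture` function is hypothetical. It takes an `actor` argument so that it works with any object exposing async `set_value` and `push_data` methods, which is the interface the Apify SDK for Python offers on its `Actor` class; whether the Actor is actually written this way is an assumption.

```python
async def store_capture(actor, html: str, screenshot_png: bytes, record: dict) -> None:
    """Persist the HTML and screenshot, then push the metadata record.

    `actor` is any object with async `set_value(key, value, content_type=...)`
    and `push_data(data)` methods.
    """
    # Record keys match the file names used in the output URLs.
    await actor.set_value("FULL_PAGE_SOURCE.html", html, content_type="text/html")
    await actor.set_value("PAGE_PREVIEW.png", screenshot_png, content_type="image/png")
    # Metadata goes to the dataset only after both files are stored.
    await actor.push_data(record)
```

Inside an Actor built with the Apify SDK for Python, this could be called as `await store_capture(Actor, html, png, record)` within an `async with Actor:` block. Passing the object in also makes the function easy to exercise with a stub in tests.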

🎯 Use Cases

  • Web Scraping Research – Extract HTML source for web scraping projects and analysis.
  • Web Archiving – Archive web pages for historical preservation and reference.
  • Content Migration – Capture HTML for migrating content to other platforms.
  • Competitor Analysis – Analyze competitor website structures and layouts.
  • SEO Analysis – Examine HTML structure for SEO optimization.
  • Web Page Backup – Create backups of important web pages.
  • Legal Compliance – Archive web pages for legal or compliance purposes.
  • Academic Research – Collect HTML data for research studies.
  • Website Testing – Test website HTML before and after changes.
  • Content Analysis – Analyze page content and structure.
  • Dynamic Content Capture – Extract JavaScript-rendered content.
  • Meta Data Extraction – Extract titles, descriptions, and metadata.
  • Accessibility Analysis – Analyze HTML for accessibility compliance.
  • Performance Research – Study website code and performance metrics.

🚀 Quick Start

  1. Open in Apify Console – visit the Actor page and click Try for free.
  2. Enter a URL – provide the URL of the web page you want to download HTML from.
  3. Click Start – the Actor launches a browser, loads the page, and captures the HTML.
  4. View Results – check the dataset for HTML metadata and storage links.
  5. Download HTML – access the full HTML file from the provided URL.
  6. View Screenshot – see the page preview screenshot.
  7. Export – download the results as JSON, CSV, or Excel.

You can also call this Actor programmatically via Apify SDK or REST API – ideal for automated web scraping and archiving pipelines.
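A REST call could be sketched as follows, using only the Python standard library and the Apify API's `run-sync-get-dataset-items` endpoint, which runs an Actor and returns its dataset items in one request. The Actor ID is not given in this listing, so `actor_id` is a placeholder you must supply (in the `username~actor-name` form), and the helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, url: str) -> urllib.request.Request:
    """Build a POST request that runs the Actor with {"url": url} as input."""
    endpoint = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def run_actor(actor_id: str, token: str, url: str, timeout: float = 300):
    """Send the request and return the dataset items as parsed JSON."""
    request = build_run_request(actor_id, token, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The synchronous endpoint is convenient for a single page. For slow pages, starting the run asynchronously and polling for the result may be the safer choice.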


💎 Why This Tool?

| Feature | Benefit |
|---------|---------|
| ✅ Full HTML capture | Get complete page source including dynamic content. |
| ✅ Browser automation | Handle JavaScript-rendered pages accurately. |
| ✅ Screenshots included | Get visual preview of captured pages. |
| ✅ Secure storage | Files stored in cloud-based Key-Value Store. |
| ✅ Metadata extraction | Get page titles and structure information. |
| ✅ Error handling | Robust fallback mechanisms for reliability. |
| ✅ Fast processing | Get results quickly with optimized browser settings. |
| ✅ Apify ecosystem | Seamless integration with other Actors, triggers, and webhooks. |

📦 Changelog

  • Initial release of Download HTML from URLs
  • Full HTML source code capture from any URL
  • Browser automation with Playwright for dynamic content
  • Screenshot generation for visual preview
  • Secure Key-Value Store storage for HTML and images
  • Complete metadata extraction (titles, page info)
  • Error handling with detailed error messages
  • Automatic dataset integration
  • Full Apify Actor integration

🧑‍💻 Support & Feedback

  • Issues & Ideas: Open a ticket on the Apify Actor issue tracker
  • Documentation: Visit Apify Docs for comprehensive platform guides
  • Community: Join the Apify community forum for discussions and support
  • Bug Reports: Submit detailed bug reports through the issue tracker
  • Feature Requests: Suggest new features to improve the tool

💰 Pricing

  • Free for basic usage on Apify platform
  • Paid plans available for higher limits and priority support
  • Compute credits consumed for browser automation and processing
  • Storage credits consumed for HTML and screenshot storage

Disclaimer: Download HTML from URLs is provided as-is for research and archiving purposes. Users are responsible for ensuring their usage complies with website policies and applicable laws. Browser automation may consume significant compute resources.


🎉 Get Started Today

Begin downloading HTML from web pages now!

Use Download HTML from URLs for:

  • 🔍 Web Scraping Research
  • 📚 Content Archiving
  • 🕷️ Web Analysis
  • 📊 Competitive Research
  • 🔐 Legal Compliance

Perfect for:

  • Web Developers
  • Researchers
  • Data Analysts
  • SEO Specialists
  • Content Archivists

Status: Active Development
Support: 24/7 Customer Support Available
Platform: Apify


For comprehensive web analysis and content management, explore our full suite of tools:

  • All-in-One Media Downloader
  • Ultimate Video Info Fetcher
  • Google Search Results Scraper
  • Pinterest Pin Video Downloader
  • Web Content Analysis Tools