Download HTML from URLs
Pricing
$7.00/month + usage
This Apify Actor fetches the complete HTML source of any website. The user provides a URL, the page is loaded with JavaScript execution, and the full HTML is printed in the terminal and saved to an HTML file.
Developer: Data Pilot
Download HTML from URLs is an Apify Actor that downloads the HTML content of a URL using browser automation. It captures the full HTML source of a web page, including dynamically loaded content, and takes a screenshot for visual reference. Typical uses include web scraping research, web page archiving, and HTML structure analysis.
Because the page is rendered in a real browser, the Actor captures HTML that a simple HTTP request would not return. Alongside the full page source it records the page title, which makes the output usable for HTML analysis and web content extraction.
Features
- Full HTML Capture: Downloads complete HTML source code from any URL, including JavaScript-rendered content.
- Browser Automation: Uses Playwright for headless browser navigation to access fully loaded HTML pages.
- Screenshot Generation: Captures page previews as PNG images for visual HTML analysis.
- Dynamic Content Handling: Waits for page loads and JavaScript execution to ensure complete HTML extraction.
- Secure Storage: Saves HTML files and screenshots in Apify's Key-Value Store for easy access.
- Single URL Processing: Processes one URL per run for precise HTML results.
- Error Handling: Robust logging for issues during HTML downloading.
- Dataset Integration: Automatically uploads HTML metadata to your Apify dataset for easy export and analysis.
How It Works
Download HTML from URLs takes a target URL as input, uses Playwright to launch a headless browser, navigates to the URL, and waits for the page to fully load. It then captures the full HTML source and takes a screenshot, saves the HTML and image files to Apify's Key-Value Store, and pushes metadata to the dataset. Rendering the page in a browser first is what allows JavaScript-generated content to appear in the captured HTML.
Key Processing Steps:
- URL Validation: Parse and validate the provided URL
- Browser Launch: Initialize headless Playwright browser
- Page Navigation: Navigate to the target URL
- Content Loading: Wait for full page load and JavaScript execution
- HTML Capture: Extract complete HTML source code
- Screenshot: Capture page preview as PNG image
- Secure Storage: Save files to Key-Value Store
- Data Export: Push metadata to dataset
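The first six steps above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual source: the names `validate_url`, `capture_page`, and `Capture` are hypothetical, and the `networkidle` wait strategy and 30-second timeout are assumptions. It uses Playwright's synchronous Python API and assumes Playwright and a Chromium build are installed.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Capture:
    """Hypothetical container for what one run collects."""
    url: str
    title: str
    html: str
    screenshot_png: bytes


def validate_url(url: str) -> str:
    """Return the trimmed URL, or raise ValueError if it is not an absolute http(s) URL."""
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an absolute http(s) URL: {url!r}")
    return cleaned


def capture_page(url: str, timeout_ms: int = 30_000) -> Capture:
    """Load the page in headless Chromium and return its rendered HTML, title, and a PNG."""
    # Imported lazily so validate_url stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    target = validate_url(url)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Assumption: waiting for network idle approximates "fully loaded".
            page.goto(target, wait_until="networkidle", timeout=timeout_ms)
            return Capture(
                url=target,
                title=page.title(),
                html=page.content(),  # serialized DOM after JavaScript has run
                screenshot_png=page.screenshot(full_page=True),
            )
        finally:
            browser.close()
```

Calling `capture_page("https://www.example.com")` would return the rendered HTML and screenshot bytes in memory. Saving them to the Key-Value Store and pushing metadata to the dataset are left out of this sketch.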
Key benefits for HTML analysis:
- Access full HTML source for web scraping and research.
- Capture dynamic HTML content that requires JavaScript.
- Build HTML archives for content preservation.
- Analyze page structure and metadata.
- Extract rendered HTML content accurately.
Input
The downloader accepts the following input parameter:
| Field | Type | Default | Description |
|---|---|---|---|
| url | string | required | The URL to download HTML from (e.g., "https://www.example.com"). |
Example input JSON:
{"url": "https://www.example.com"}
Output
The downloader outputs HTML metadata in JSON format. Each record includes:
| Field | Type | Description |
|---|---|---|
| url | string | The original URL provided. |
| title | string | Title of the web page. |
| html_content | string | Full HTML source code of the page. |
| html_file_url | string | URL to access FULL_PAGE_SOURCE.html from Key-Value Store. |
| screenshot_url | string | URL to access PAGE_PREVIEW.png screenshot from Key-Value Store. |
| page_size | integer | Size of HTML file in bytes. |
| captured_at | string | ISO timestamp of the HTML capture. |
| status | string | Status of capture ("Success" or "Failed"). |
Example output for successful HTML download:
{"url": "https://www.example.com","title": "Example Page Title","html_content": "<!DOCTYPE html><html><head><title>Example Page Title</title>...</head><body>...</body></html>","html_file_url": "https://api.apify.com/v2/key-value-stores/store_id/records/FULL_PAGE_SOURCE.html","screenshot_url": "https://api.apify.com/v2/key-value-stores/store_id/records/PAGE_PREVIEW.png","page_size": 45823,"captured_at": "2025-02-14T12:00:00Z","status": "Success"}
Example error response:
{"url": "https://www.example.com/invalid","status": "Failed","error": "Page not accessible or timeout occurred","captured_at": "2025-02-14T12:00:00Z"}
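To make the record shape concrete, here is a small sketch of how records matching the two examples above could be assembled. The helper names (`record_url`, `success_record`, `failure_record`) are hypothetical. Computing `page_size` as the UTF-8 byte length of the HTML is an assumption, since the listing only says "size of HTML file in bytes".

```python
from datetime import datetime, timezone
from typing import Optional

KVS_BASE = "https://api.apify.com/v2/key-value-stores"


def record_url(store_id: str, key: str) -> str:
    """Build a record URL in the same form as the example output."""
    return f"{KVS_BASE}/{store_id}/records/{key}"


def iso_utc(moment: Optional[datetime] = None) -> str:
    """Format a timestamp as ISO 8601 in UTC with a trailing 'Z'."""
    moment = moment or datetime.now(timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def success_record(url: str, title: str, html: str, store_id: str,
                   captured_at: Optional[datetime] = None) -> dict:
    """Assemble a dataset record for a successful capture."""
    return {
        "url": url,
        "title": title,
        "html_content": html,
        "html_file_url": record_url(store_id, "FULL_PAGE_SOURCE.html"),
        "screenshot_url": record_url(store_id, "PAGE_PREVIEW.png"),
        # Assumption: size is the UTF-8 encoded length of the HTML.
        "page_size": len(html.encode("utf-8")),
        "captured_at": iso_utc(captured_at),
        "status": "Success",
    }


def failure_record(url: str, error: str,
                   captured_at: Optional[datetime] = None) -> dict:
    """Assemble a dataset record for a failed capture."""
    return {
        "url": url,
        "status": "Failed",
        "error": error,
        "captured_at": iso_utc(captured_at),
    }
```

A consumer can branch on `status` and read `error` only when the value is "Failed", since failed records carry no `html_content` or storage links.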
Technical Stack
- Browser Automation: Playwright (fast, headless browser automation)
- Page Interaction: DOM manipulation and rendering
- Screenshot Capture: PNG image generation
- Storage: Apify Key-Value Store for secure file storage
- Platform: Apify Actor (serverless, scalable, integrated with Dataset and Key-Value Store)
- Deployment: One-click run on Apify Console or via REST API
Use Cases
- Web Scraping Research: Extract HTML source for web scraping projects and analysis.
- Web Archiving: Archive web pages for historical preservation and reference.
- Content Migration: Capture HTML for migrating content to other platforms.
- Competitor Analysis: Analyze competitor website structures and layouts.
- SEO Analysis: Examine HTML structure for SEO optimization.
- Web Page Backup: Create backups of important web pages.
- Legal Compliance: Archive web pages for legal or compliance purposes.
- Academic Research: Collect HTML data for research studies.
- Website Testing: Test website HTML before and after changes.
- Content Analysis: Analyze page content and structure.
- Dynamic Content Capture: Extract JavaScript-rendered content.
- Meta Data Extraction: Extract titles, descriptions, and metadata.
- Accessibility Analysis: Analyze HTML for accessibility compliance.
- Performance Research: Study website code and performance metrics.
Quick Start
- Open in Apify Console: visit the Actor page and click Try for free.
- Enter a URL: provide the URL of the web page you want to download HTML from.
- Click Start: the Actor launches a browser, loads the page, and captures the HTML.
- View Results: check the dataset for HTML metadata and storage links.
- Download HTML: access the full HTML file from the provided URL.
- View Screenshot: open the page preview screenshot.
- Export: download the results as JSON, CSV, or Excel.
You can also call this Actor programmatically via the Apify SDK or REST API, which suits automated web scraping and archiving pipelines.
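As an illustration of a programmatic call, the sketch below builds a request against the Apify REST API's `run-sync-get-dataset-items` endpoint using only the Python standard library. The function names are hypothetical, and the Actor ID (in `username~actor-name` form) and API token are placeholders you would replace with your own values, since this listing does not state them.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, url: str) -> urllib.request.Request:
    """Prepare a POST that runs the Actor synchronously with {"url": ...} as input."""
    endpoint = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def run_actor(actor_id: str, token: str, url: str, timeout: float = 300.0) -> list:
    """Send the request and return the dataset items produced by the run."""
    request = build_run_request(actor_id, token, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With real credentials, `run_actor("username~actor-name", "<YOUR_API_TOKEN>", "https://www.example.com")` would be expected to return a list containing one record in the output format described above. A browser-based run can take a while, so a generous timeout is a reasonable default.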
Why This Tool?
| Feature | Benefit |
|---|---|
| Full HTML capture | Get complete page source including dynamic content. |
| Browser automation | Handle JavaScript-rendered pages accurately. |
| Screenshots included | Get visual preview of captured pages. |
| Secure storage | Files stored in cloud-based Key-Value Store. |
| Metadata extraction | Get page titles and structure information. |
| Error handling | Robust fallback mechanisms for reliability. |
| Fast processing | Get results quickly with optimized browser settings. |
| Apify ecosystem | Seamless integration with other Actors, triggers, and webhooks. |
Changelog
- Initial release of Download HTML from URLs
- Full HTML source code capture from any URL
- Browser automation with Playwright for dynamic content
- Screenshot generation for visual preview
- Secure Key-Value Store storage for HTML and images
- Complete metadata extraction (titles, page info)
- Error handling with detailed error messages
- Automatic dataset integration
- Full Apify Actor integration
Support & Feedback
- Issues & Ideas: Open a ticket on the Apify Actor issue tracker
- Documentation: Visit Apify Docs for comprehensive platform guides
- Community: Join the Apify community forum for discussions and support
- Bug Reports: Submit detailed bug reports through the issue tracker
- Feature Requests: Suggest new features to improve the tool
Pricing
- Free for basic usage on Apify platform
- Paid plans available for higher limits and priority support
- Compute credits consumed for browser automation and processing
- Storage credits consumed for HTML and screenshot storage
Disclaimer: Download HTML from URLs is provided as-is for research and archiving purposes. Users are responsible for ensuring their usage complies with website policies and applicable laws. Browser automation may consume significant compute resources.
Get Started Today
Begin downloading HTML from web pages now!
Use Download HTML from URLs for:
- Web Scraping Research
- Content Archiving
- Web Analysis
- Competitive Research
- Legal Compliance
Perfect for:
- Web Developers
- Researchers
- Data Analysts
- SEO Specialists
- Content Archivists
Status: Active Development
Support: 24/7 Customer Support Available
Platform: Apify
Related Tools
For comprehensive web analysis and content management, explore our full suite of tools:
- All-in-One Media Downloader
- Ultimate Video Info Fetcher
- Google Search Results Scraper
- Pinterest Pin Video Downloader
- Web Content Analysis Tools