🚀 Professional Universal HTML & Media Extractor

Pricing

$7.00/month + usage

This script uses Playwright with an Apify Actor to fetch the complete HTML source of any website. The user provides a URL, the page is loaded with JavaScript execution, the full HTML is printed in the terminal, saved to an HTML file,

Developer

Data Pilot

Maintained by Community

Last modified: 2 days ago

💎 Why Use This Scraper?

Standard scrapers often fail to capture content that is loaded dynamically via JavaScript. This actor solves that by:

Executing JavaScript: It waits for the page to fully "hydrate" before capturing data.
Bypassing Restrictions: Integrated with Residential Proxies to minimize CAPTCHAs and IP bans.
Visual Verification: Automatically takes a screenshot so you can see exactly what the bot saw.

🛠️ Detailed Features

1. Advanced Browser Stealth

Launches Chromium with the AutomationControlled Blink feature disabled and a custom User-Agent string, which reduces the chance that a site identifies the browser as automated.
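A minimal sketch of how such launch settings might look with Playwright for Python. The Actor's actual launch code is not shown on this page, so the helper names (`build_launch_options`, `build_context_options`, `open_page`) and the User-Agent value are illustrative assumptions. `--disable-blink-features=AutomationControlled` is a standard Chromium command-line switch.

```python
# Hypothetical helpers: the Actor's real launch code is not shown on this page.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)  # example value only


def build_launch_options(headless: bool = True) -> dict:
    """Keyword arguments for Playwright's chromium.launch()."""
    return {
        "headless": headless,
        # Keeps navigator.webdriver from reporting true in Chromium.
        "args": ["--disable-blink-features=AutomationControlled"],
    }


def build_context_options(user_agent: str = DEFAULT_USER_AGENT) -> dict:
    """Keyword arguments for Playwright's browser.new_context()."""
    return {"user_agent": user_agent}


async def open_page(url: str) -> str:
    """Load a page with the options above and return its rendered HTML."""
    # Imported lazily so the option builders work without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(**build_launch_options())
        context = await browser.new_context(**build_context_options())
        page = await context.new_page()
        await page.goto(url)
        html = await page.content()
        await browser.close()
        return html
```

Keeping the options in plain dictionaries makes them easy to inspect or override without starting a browser.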

2. Residential Proxy Integration

Configured to use Apify's residential proxy pool by default, which improves success rates on sites with strict anti-bot protection.
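As a rough illustration, a proxy URL can be split into the `server`, `username`, and `password` keys that Playwright's `proxy` launch option accepts. How this Actor obtains its proxy URL is not documented here. In the Apify Python SDK it would typically come from `Actor.create_proxy_configuration(groups=["RESIDENTIAL"])` and its `new_url()` method, but that is an assumption about the implementation. The helper name and the example URL below are hypothetical.

```python
from urllib.parse import urlsplit


def to_playwright_proxy(proxy_url: str) -> dict:
    """Split a proxy URL into the dict shape Playwright's `proxy` option expects."""
    parts = urlsplit(proxy_url)
    if not parts.hostname or not parts.port:
        raise ValueError(f"Proxy URL needs a host and port: {proxy_url!r}")
    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    # Credentials are optional; only include them when present in the URL.
    if parts.username:
        settings["username"] = parts.username
    if parts.password:
        settings["password"] = parts.password
    return settings


# Placeholder URL for illustration, not a real endpoint.
example = to_playwright_proxy("http://groups-RESIDENTIAL:secret@proxy.example.com:8000")
```

The resulting dictionary could then be passed as `proxy=example` to `chromium.launch()`.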

3. Multiple Output Formats

Dataset: Structured JSON/CSV/Excel data including Title, URL, and Timestamp.
Key-Value Store (HTML): The full raw HTML source saved as a viewable .html file.
Key-Value Store (Image): A high-resolution .png screenshot of the page.

Output Fields

Field	Type	Description
`status`	String	"Success" or "Failed"
`site_info.title`	String	Page title
`site_info.original_url`	String	Input URL
`site_info.final_url`	String	Final URL (after redirects)
`site_info.scraped_at`	String	Timestamp of scraping
`site_info.html_length`	Integer	HTML content length in characters
`full_html`	String	Complete HTML source code (if enabled)
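The fields above could be assembled into a dataset record along these lines. This is a sketch under assumptions: the table's dotted names are read as a nested `site_info` object, the timestamp is written as ISO 8601 in UTC, and `build_record` and `include_html` are hypothetical names. The Actor's real output may differ in these details.

```python
from datetime import datetime, timezone


def build_record(original_url: str, final_url: str, title: str,
                 html: str, include_html: bool = True) -> dict:
    """Assemble one successful result using the field names from the table."""
    record = {
        "status": "Success",
        "site_info": {
            "title": title,
            "original_url": original_url,
            "final_url": final_url,
            # Timestamp format is an assumption (ISO 8601, UTC).
            "scraped_at": datetime.now(timezone.utc).isoformat(),
            "html_length": len(html),  # length in characters, not bytes
        },
    }
    if include_html:
        record["full_html"] = html
    return record
```

Leaving `full_html` out when it is not needed keeps dataset rows small, since the raw HTML is also stored in the key-value store.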

🛠️ How It Works

Input Processing - Parses and validates URLs
Browser Launch - Starts Chromium browser via Playwright
Page Navigation - Visits each URL with timeout handling
Content Extraction - Captures page title and HTML
Data Storage - Pushes structured data to Apify dataset
Error Handling - Logs failures and continues with next URL
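The steps above can be sketched as a simple sequential loop. This is not the Actor's source code: `scrape_all`, `fetch_page`, and `push_record` are hypothetical names, the fetch and storage steps are passed in as callables so the control flow can be read on its own, and the `error` field on failed records is an illustrative addition that is not listed under Output Fields.

```python
import asyncio


async def scrape_all(urls, fetch_page, push_record) -> int:
    """Visit URLs in order; a failure is recorded and the loop moves on.

    fetch_page(url) -> awaitable returning a record dict (may raise).
    push_record(record) -> awaitable that stores the record.
    Returns the number of URLs that failed.
    """
    failed = 0
    for url in urls:
        try:
            record = await fetch_page(url)
        except Exception as exc:  # timeouts, navigation errors, etc.
            failed += 1
            record = {
                "status": "Failed",
                "site_info": {"original_url": url},
                "error": str(exc),  # hypothetical field for illustration
            }
        await push_record(record)
    return failed
```

In a real Actor, `push_record` would likely be `Actor.push_data` from the Apify SDK and `fetch_page` would wrap the Playwright navigation.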

⚙️ Technical Details

Browser: Chromium (via Playwright)
Memory: Minimum 512MB recommended
Language: Python 3.11+
Dependencies:
- apify - Apify SDK
- playwright - Browser automation
- aiohttp - Async HTTP client

📝 Use Cases

Content Analysis - Extract and analyze website content
SEO Auditing - Check page titles and meta information
Website Monitoring - Track changes in website content
Data Migration - Backup website HTML
Research - Collect data from multiple websites
Competitive Analysis - Compare competitor websites

💡 Tips for Best Results

Batch Processing: Process multiple URLs in one run
Wait Time: Adjust based on website loading speed
Proxy Usage: Enable for blocked or geo-restricted sites
HTML Size: Large pages increase dataset size, especially when the full HTML is included in each record
Rate Limiting: Add delays between requests for large batches
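For the rate-limiting tip, one simple approach is to pause between requests with a small random jitter. The sketch below assumes a hypothetical `handler` coroutine that processes a single URL. The delay values are placeholders, not settings exposed by this Actor.

```python
import asyncio
import random


async def run_with_delay(urls, handler, base_delay: float = 1.0, jitter: float = 0.5):
    """Process URLs one at a time, sleeping between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Random jitter avoids a perfectly regular request pattern.
            await asyncio.sleep(base_delay + random.uniform(0, jitter))
        results.append(await handler(url))
    return results
```

No delay is applied before the first URL, so a single-URL run is unaffected.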

⚠️ Legal & Ethical Use

Respect website Terms of Service
Check robots.txt before scraping
Don't overload servers with requests
Comply with data protection laws (GDPR, etc.)
Use responsibly and ethically
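A robots.txt check can be done with Python's standard library before a URL is queued. The sketch below takes the robots.txt body as a string, so it assumes the file has already been downloaded. `is_allowed` is a hypothetical helper and not part of this Actor.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /private/\n"
blocked = is_allowed(rules, "https://example.com/private/page")
permitted = is_allowed(rules, "https://example.com/public")
```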

📥 Input Configuration

The Actor accepts the following JSON input:

Field	Type	Description
`url`	String	The specific website link you want to scrape.
`urls`	Array	(Optional) A list of multiple URLs to process in sequence.

Example Input:

{
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}
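Since both `url` and `urls` are accepted, the two fields might be merged into a single list before processing. The following is a sketch with a hypothetical helper, `collect_urls`. The validation rules (http/https only, duplicates dropped) are assumptions, not documented behavior of this Actor.

```python
from urllib.parse import urlsplit


def collect_urls(actor_input: dict) -> list[str]:
    """Merge `url` and `urls` into one ordered list of valid, unique URLs."""
    candidates = []
    if actor_input.get("url"):
        candidates.append(actor_input["url"])
    candidates.extend(actor_input.get("urls") or [])

    valid = []
    for raw in candidates:
        url = str(raw).strip()
        parts = urlsplit(url)
        # Keep only absolute http(s) URLs and skip repeats.
        if parts.scheme in ("http", "https") and parts.netloc and url not in valid:
            valid.append(url)
    return valid
```

With the example input above, `collect_urls` would return a one-element list containing the YouTube URL.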
