Pricing

from $20.00 / 1,000 results

Website Recovery Actor

Recover and reverse-engineer website files from a live or deployed site. This actor downloads the complete website, including HTML, CSS, JavaScript, images, fonts, and all other assets, then rewrites URLs so that everything works locally.

Developer

Cody Churchwell

Last modified

3 months ago

Features

Full JS Rendering: Uses Puppeteer to capture JavaScript-rendered content (React, Vue, Next.js, etc.)
Complete Asset Download: Downloads all CSS, JS, images, fonts, videos, and other media
CDN Support: Optionally downloads assets from external CDNs (Google Fonts, Cloudflare, jsDelivr)
URL Rewriting: Converts absolute URLs to relative paths so the site works locally
Structure Preservation: Maintains original URL path structure for easy navigation
CSS Asset Extraction: Parses CSS files to find and download additional assets (fonts, background images)
Inline Style Extraction: Optionally extracts inline <style> tags to separate files
Concurrent Downloads: Processes multiple pages and assets in parallel for speed
Proxy Support: Access geo-restricted or blocked sites using Apify Proxy
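
As an illustration of the CSS Asset Extraction feature, the sketch below finds `url(...)` references in a stylesheet and resolves them against the stylesheet's own URL. It is a simplified regex-based approximation written for this README, not the actor's actual code; the function name `extract_css_assets` is hypothetical.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional quotes. A simplified pattern, not a full CSS parser.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def extract_css_assets(css_text: str, css_url: str) -> list[str]:
    """Return absolute URLs of assets referenced from a stylesheet (hypothetical helper)."""
    found: list[str] = []
    for match in CSS_URL_RE.finditer(css_text):
        ref = match.group(2).strip()
        # Skip empty references and inline data: URIs, which need no download.
        if not ref or ref.startswith("data:"):
            continue
        # Relative references are resolved against the stylesheet, not the page.
        absolute = urljoin(css_url, ref)
        if absolute not in found:
            found.append(absolute)
    return found
```

Resolving against the stylesheet URL matters because a reference such as `../fonts/a.woff2` is relative to the CSS file's location rather than to the page that links it.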

How It Works

Crawling: The actor visits your start URL and follows links to discover all pages on the same domain
Rendering: Each page is fully rendered using a headless Chrome browser to capture dynamic content
Asset Extraction: All linked assets (CSS, JS, images, fonts) are identified and queued for download
Download: Assets are downloaded in parallel batches
URL Rewriting: All URLs in HTML and CSS files are rewritten to use relative local paths
Output: Files are saved to Apify's Key-Value Store with a manifest file listing everything
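
The URL rewriting step can be pictured with a minimal sketch: each URL is mapped to a file path that mirrors its URL path, and links are then expressed relative to the page that contains them. The helper names `local_path` and `relative_link`, and the choice to place each host in its own top-level folder, are assumptions made for illustration; the actor's real layout may differ.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url: str) -> str:
    """Map a URL to a file path that mirrors the URL's path structure (assumed layout)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Directory-style URLs get an index.html so they can be opened from disk.
    if path.endswith("/"):
        path += "index.html"
    return posixpath.join(parts.netloc, path.lstrip("/"))


def relative_link(page_url: str, asset_url: str) -> str:
    """Return the asset's saved path relative to the saved location of the page."""
    page_dir = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(local_path(asset_url), start=page_dir)
```

For example, under these assumptions a page saved from `https://example.com/blog/post/` would reference `https://example.com/css/site.css` as `../../css/site.css`. Extensionless page URLs such as `/about` are not handled specially in this sketch.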

Input Configuration

Parameter	Type	Default	Description
`startUrl`	string	(required)	The URL of the website to recover
`maxDepth`	integer	5	How deep to follow links (0 = only start page)
`maxPages`	integer	100	Maximum number of pages to crawl
`downloadAssets`	boolean	true	Download CSS, JS, images, fonts, etc.
`rewriteUrls`	boolean	true	Convert URLs to relative paths
`includeExternalAssets`	boolean	true	Download assets from CDNs
`maxConcurrency`	integer	5	Number of pages processed in parallel
`downloadTimeout`	integer	30000	Asset download timeout (ms)
`userAgent`	string	Chrome UA	Browser user agent
`waitForSelector`	string	-	CSS selector to wait for (JS sites)
`extractInlineStyles`	boolean	false	Extract inline CSS to files
`preserveStructure`	boolean	true	Maintain URL path structure
`proxyConfiguration`	object	-	Proxy settings
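
To make the defaults concrete, here is a small client-side sketch that builds an input object from the table above and rejects misspelled parameter names. The actor applies its own defaults when a field is omitted, so this helper (the hypothetical `build_input`) is only a convenience for preparing and checking input before a run.

```python
import json

# Defaults copied from the parameter table above.
DEFAULTS = {
    "maxDepth": 5,
    "maxPages": 100,
    "downloadAssets": True,
    "rewriteUrls": True,
    "includeExternalAssets": True,
    "maxConcurrency": 5,
    "downloadTimeout": 30000,
    "extractInlineStyles": False,
    "preserveStructure": True,
}
# Parameters whose default is not a literal value in the table.
OPTIONAL_NO_LITERAL_DEFAULT = {"userAgent", "waitForSelector", "proxyConfiguration"}


def build_input(start_url: str, **overrides) -> dict:
    """Merge overrides onto the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_NO_LITERAL_DEFAULT
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    if not start_url.startswith(("http://", "https://")):
        raise ValueError("startUrl must be an absolute http(s) URL")
    return {"startUrl": start_url, **DEFAULTS, **overrides}


# Example: a single-page run, serialized as the JSON the actor input expects.
single_page_json = json.dumps(build_input("https://example.com", maxDepth=0, maxPages=1), indent=4)
```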

Output

The actor outputs all files to the Key-Value Store:

HTML pages with rewritten URLs
CSS files with rewritten asset paths
JavaScript files
Images (PNG, JPG, GIF, WebP, SVG, ICO)
Fonts (WOFF, WOFF2, TTF, OTF)
Videos and audio files
__MANIFEST__ - JSON file listing all downloaded files with their original URLs
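
The manifest can be used to look up where a given original URL ended up. Its exact schema is not documented here, so the sketch below assumes a hypothetical shape: a top-level `files` list whose entries carry `key` and `originalUrl` fields. Inspect your own `__MANIFEST__` record and adjust the field names accordingly.

```python
import json


def index_manifest(manifest_text: str) -> dict[str, str]:
    """Map each original URL to the key its file was stored under.

    ASSUMPTION: the manifest looks like
    {"files": [{"key": "...", "originalUrl": "..."}, ...]};
    these field names are hypothetical and not confirmed by the actor's docs.
    """
    manifest = json.loads(manifest_text)
    return {entry["originalUrl"]: entry["key"] for entry in manifest["files"]}
```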

Downloading Your Files

After the run completes:

Go to the Storage tab
Click on Key-Value Store
Click Export to download all files as a ZIP
Extract the ZIP and you have your recovered website!
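
Once the ZIP is extracted, opening pages through a local web server tends to be more reliable than opening files directly, since some scripts and fonts do not load from `file://` URLs. The sketch below uses only Python's standard library; the function name `serve_directory` and the folder name `recovered-site` are placeholders, not something the actor provides.

```python
import functools
import http.server
import threading


def serve_directory(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve an extracted site on 127.0.0.1 in a background thread."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Usage (placeholder folder name):
#   server = serve_directory("recovered-site")
#   ... browse http://127.0.0.1:8000/ ...
#   server.shutdown()
```

For a quick one-off check, running `python -m http.server` inside the extracted folder achieves the same result.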

Example Usage

Basic Recovery

{
    "startUrl": "https://your-site.netlify.app"
}

Full Site with All Assets

{
    "startUrl": "https://example.com",
    "maxDepth": 10,
    "maxPages": 500,
    "downloadAssets": true,
    "includeExternalAssets": true,
    "rewriteUrls": true
}

JavaScript-Heavy Site (React/Vue/Next.js)

{
    "startUrl": "https://my-react-app.vercel.app",
    "waitForSelector": "#root",
    "maxDepth": 5,
    "maxPages": 100
}

Single Page Only

{
    "startUrl": "https://example.com/specific-page",
    "maxDepth": 0,
    "maxPages": 1
}
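
The effect of `maxDepth` and `maxPages` in the examples above can be pictured as a bounded breadth-first crawl. The sketch below runs over an in-memory link map instead of fetching real pages. The function `plan_crawl`, the exact-host same-domain rule, and the breadth-first order are all assumptions for illustration; the actor's real crawl order and domain matching are not documented here.

```python
from collections import deque
from urllib.parse import urlsplit


def plan_crawl(start_url: str, links: dict[str, list[str]],
               max_depth: int = 5, max_pages: int = 100) -> list[str]:
    """Return the pages a depth- and count-limited same-host crawl would visit.

    `links` maps each URL to its outgoing links and stands in for real page fetches.
    """
    host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        # Pages at the depth limit are saved but their links are not followed.
        if depth >= max_depth:
            continue
        for link in links.get(url, []):
            # ASSUMPTION: "same domain" means an identical host name.
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With `max_depth=0` only the start URL is visited, which matches the Single Page Only configuration.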

Tips for Best Results

JS-rendered sites: Set waitForSelector so the page is captured only after that element has appeared
Large sites: Increase maxPages (default 100) and maxConcurrency (default 5)
Slow sites: Increase downloadTimeout
Blocked sites: Enable proxy configuration
CDN-heavy sites: Keep includeExternalAssets enabled

Limitations

Cannot recover server-side logic (APIs, databases, authentication)
Some minified/bundled JS may be difficult to understand
Dynamic content that requires user interaction won't be captured
Login-protected content requires additional setup

Technical Details

Built with Apify SDK 3.x and Crawlee
Uses Puppeteer for browser automation
Uses Cheerio for HTML parsing
Respects robots.txt (can be disabled)

Cost Estimate

Typical usage costs approximately:

$1-3 per 100 pages with full asset download
Larger sites with many assets may cost more
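
As a rough worked example of the figures above, the following sketch scales the stated $1-3 per 100 pages linearly to a given page count. Linear scaling is an assumption: actual cost depends on the number and size of assets, so treat the result as a ballpark range only. The function name `estimate_cost` is hypothetical.

```python
def estimate_cost(pages: int, low_per_100: float = 1.0,
                  high_per_100: float = 3.0) -> tuple[float, float]:
    """Return a (low, high) cost range in USD, assuming cost scales linearly with pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return (round(pages / 100 * low_per_100, 2), round(pages / 100 * high_per_100, 2))


# A 500-page site falls roughly between $5 and $15 under this assumption.
low, high = estimate_cost(500)
```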

License

ISC License

Support

For issues or feature requests, please open an issue on the repository.
