Website Recovery Actor

Pricing

from $20.00 / 1,000 results

Rating

0.0 (0)

Developer

Cody Churchwell

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 1
  • Last modified: 21 days ago

Recover and reverse-engineer website files from a live or deployed site. This actor downloads the complete website including HTML, CSS, JavaScript, images, fonts, and all other assets, then rewrites URLs so everything works locally.

Perfect for:

  • Recovering lost source files from Netlify, Vercel, GitHub Pages, or any deployed site
  • Creating offline copies of websites
  • Migrating sites to new hosting
  • Archiving web content
  • Analyzing site structure and assets

Features

  • Full JS Rendering: Uses Puppeteer to capture JavaScript-rendered content (React, Vue, Next.js, etc.)
  • Complete Asset Download: Downloads all CSS, JS, images, fonts, videos, and other media
  • CDN Support: Optionally downloads assets from external CDNs (Google Fonts, Cloudflare, jsDelivr)
  • URL Rewriting: Converts absolute URLs to relative paths so the site works locally
  • Structure Preservation: Maintains original URL path structure for easy navigation
  • CSS Asset Extraction: Parses CSS files to find and download additional assets (fonts, background images)
  • Inline Style Extraction: Optionally extracts inline <style> tags to separate files
  • Concurrent Downloads: Processes multiple pages and assets in parallel for speed
  • Proxy Support: Access geo-restricted or blocked sites using Apify Proxy
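The CSS Asset Extraction feature can be pictured with a small sketch. This is not the actor's code; it is a minimal Python illustration that finds `url(...)` references in a stylesheet and resolves them against the stylesheet's own URL. The function name `extract_css_assets` and the regular expression are assumptions made for this example.

```python
import re
from urllib.parse import urljoin

# Matches url(...) references in CSS, with optional single or double quotes.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def extract_css_assets(css_text: str, css_url: str) -> list[str]:
    """Return absolute URLs of assets referenced via url(...) in a stylesheet.

    Hypothetical helper for illustration only.
    """
    found: list[str] = []
    for match in CSS_URL_RE.finditer(css_text):
        ref = match.group(2).strip()
        # Skip empty references and inline data URIs, which need no download.
        if not ref or ref.startswith("data:"):
            continue
        # Relative references resolve against the stylesheet, not the page.
        absolute = urljoin(css_url, ref)
        if absolute not in found:
            found.append(absolute)
    return found
```

The sketch only covers `url(...)` references; `@import "file.css"` rules written without `url()` would need separate handling.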

How It Works

  1. Crawling: The actor visits your start URL and follows links to discover pages on the same domain, up to the maxDepth and maxPages limits
  2. Rendering: Each page is fully rendered using a headless Chrome browser to capture dynamic content
  3. Asset Extraction: All linked assets (CSS, JS, images, fonts) are identified and queued for download
  4. Download: Assets are downloaded in parallel batches
  5. URL Rewriting: All URLs in HTML and CSS files are rewritten to use relative local paths
  6. Output: Files are saved to Apify's Key-Value Store with a manifest file listing everything
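The URL rewriting step above can be sketched as follows. The on-disk layout used here (one folder per host, `index.html` for directory-style URLs) is an assumption for illustration, as are the function names `local_path_for` and `relative_link`; the actor's real layout may differ.

```python
import posixpath
from urllib.parse import urlsplit


def local_path_for(url: str) -> str:
    """Map an absolute URL to a path inside the recovered site folder.

    Assumed layout: <host>/<url path>, query strings ignored.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    # Directory-style URLs get an index.html so they can be opened from disk.
    if path.endswith("/"):
        path += "index.html"
    return parts.netloc + path


def relative_link(page_url: str, asset_url: str) -> str:
    """Return the relative path from a saved page to a saved asset."""
    page_dir = posixpath.dirname(local_path_for(page_url))
    return posixpath.relpath(local_path_for(asset_url), start=page_dir)
```

For example, a page saved from `https://example.com/blog/post/` that references `https://example.com/css/site.css` would link to `../../css/site.css` under this layout.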

Input Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| startUrl | string | (required) | The URL of the website to recover |
| maxDepth | integer | 5 | How deep to follow links (0 = only start page) |
| maxPages | integer | 100 | Maximum number of pages to crawl |
| downloadAssets | boolean | true | Download CSS, JS, images, fonts, etc. |
| rewriteUrls | boolean | true | Convert URLs to relative paths |
| includeExternalAssets | boolean | true | Download assets from CDNs |
| maxConcurrency | integer | 5 | Parallel page processing |
| downloadTimeout | integer | 30000 | Asset download timeout (ms) |
| userAgent | string | Chrome UA | Browser user agent |
| waitForSelector | string | - | CSS selector to wait for (JS sites) |
| extractInlineStyles | boolean | false | Extract inline CSS to files |
| preserveStructure | boolean | true | Maintain URL path structure |
| proxyConfiguration | object | - | Proxy settings |
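When preparing input in a script, it can help to start from the documented defaults and override only what changes. The sketch below is a hypothetical client-side helper (`build_input` is not part of the actor or of any Apify library); the actor applies its own defaults, so this only makes the effective settings explicit and catches misspelled parameter names early.

```python
import json

# Defaults copied from the parameter list above.
DEFAULTS = {
    "maxDepth": 5,
    "maxPages": 100,
    "downloadAssets": True,
    "rewriteUrls": True,
    "includeExternalAssets": True,
    "maxConcurrency": 5,
    "downloadTimeout": 30000,
    "extractInlineStyles": False,
    "preserveStructure": True,
}
# Parameters with no fixed default value in the list above.
OPTIONAL = {"userAgent", "waitForSelector", "proxyConfiguration"}


def build_input(start_url: str, **overrides) -> dict:
    """Merge overrides into the documented defaults (hypothetical helper)."""
    if not start_url.startswith(("http://", "https://")):
        raise ValueError("startUrl must be an http(s) URL")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"startUrl": start_url, **DEFAULTS, **overrides}


def to_json(actor_input: dict) -> str:
    """Serialize the input so it can be pasted into the actor's JSON input."""
    return json.dumps(actor_input, indent=2)
```

Calling `build_input("https://example.com", maxDepth=0, maxPages=1)` yields a single-page configuration with every other setting at its documented default.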

Output

The actor outputs all files to the Key-Value Store:

  • HTML pages with rewritten URLs
  • CSS files with rewritten asset paths
  • JavaScript files
  • Images (PNG, JPG, GIF, WebP, SVG, ICO)
  • Fonts (WOFF, WOFF2, TTF, OTF)
  • Videos and audio files
  • __MANIFEST__ - JSON file listing all downloaded files with their original URLs

Downloading Your Files

After the run completes:

  1. Go to the Storage tab
  2. Click on Key-Value Store
  3. Click Export to download all files as a ZIP
  4. Extract the ZIP and you have your recovered website!
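To preview the extracted files, a local HTTP server is often more reliable than opening pages directly from disk, since some browsers restrict scripts loaded from `file://` URLs. The following is a minimal sketch using only the Python standard library; the function name `make_preview_server` and the folder name `recovered-site` are placeholders chosen for this example.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_preview_server(site_dir: str, port: int = 8000) -> ThreadingHTTPServer:
    """Create (but do not start) a local HTTP server rooted at site_dir."""
    handler = partial(SimpleHTTPRequestHandler, directory=site_dir)
    # Bind to localhost only so the preview is not exposed on the network.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   server = make_preview_server("recovered-site")
#   server.serve_forever()
```

With the server running, the site would be reachable at `http://127.0.0.1:8000/` in a browser.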

Example Usage

Basic Recovery

{
  "startUrl": "https://your-site.netlify.app"
}

Full Site with All Assets

{
  "startUrl": "https://example.com",
  "maxDepth": 10,
  "maxPages": 500,
  "downloadAssets": true,
  "includeExternalAssets": true,
  "rewriteUrls": true
}

JavaScript-Heavy Site (React/Vue/Next.js)

{
  "startUrl": "https://my-react-app.vercel.app",
  "waitForSelector": "#root",
  "maxDepth": 5,
  "maxPages": 100
}

Single Page Only

{
  "startUrl": "https://example.com/specific-page",
  "maxDepth": 0,
  "maxPages": 1
}
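
The examples above differ mainly in maxDepth and maxPages. How the two limits interact can be illustrated with a small breadth-first sketch over an in-memory link map. This is an assumption about typical crawler behavior, not the actor's implementation; `plan_crawl` and the `links` dictionary are hypothetical.

```python
from collections import deque
from urllib.parse import urlsplit


def plan_crawl(start_url, links, max_depth=5, max_pages=100):
    """Breadth-first, same-host crawl order limited by depth and page count.

    `links` is a hypothetical in-memory map of page URL -> outgoing URLs,
    standing in for the links a real browser would discover.
    """
    host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            # Only follow links that stay on the start URL's host.
            if urlsplit(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order
```

With `max_depth=0` only the start page is returned, matching the "0 = only start page" rule; with a larger depth, `max_pages` caps the total regardless of how many links remain queued.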

Tips for Best Results

  1. JS-rendered sites: Use waitForSelector to ensure content loads before capture
  2. Large sites: Increase maxPages and maxConcurrency
  3. Slow sites: Increase downloadTimeout
  4. Blocked sites: Enable proxy configuration
  5. CDN-heavy sites: Keep includeExternalAssets enabled

Limitations

  • Cannot recover server-side logic (APIs, databases, authentication)
  • Some minified/bundled JS may be difficult to understand
  • Dynamic content that requires user interaction won't be captured
  • Login-protected content requires additional setup

Technical Details

  • Built with Apify SDK 3.x and Crawlee
  • Uses Puppeteer for browser automation
  • Cheerio for HTML parsing
  • Respects robots.txt (can be disabled)

Cost Estimate

Typical usage costs approximately:

  • $1-3 per 100 pages with full asset download
  • Larger sites with many assets may cost more
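
For a rough budget, the per-100-pages range above can be scaled to a page count. The sketch assumes cost grows linearly with pages, which is a simplification: as noted, asset-heavy sites may cost more. The function name `estimate_cost_usd` is made up for this example.

```python
def estimate_cost_usd(
    pages: int, low_per_100: float = 1.0, high_per_100: float = 3.0
) -> tuple[float, float]:
    """Scale the quoted $1-3 per 100 pages range linearly to `pages`."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    blocks = pages / 100
    return (blocks * low_per_100, blocks * high_per_100)
```

Under that assumption, a 500-page site would fall in the range of roughly $5 to $15.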

License

ISC License

Support

For issues or feature requests, please open an issue on the repository.