Website Recovery Actor
Pricing
from $20.00 / 1,000 results
Recover and reverse-engineer website files from a live or deployed site. This actor downloads the site's HTML pages and their assets (CSS, JavaScript, images, fonts, and other media), then rewrites URLs so everything works locally.
Developer

Cody Churchwell
Last modified
21 days ago
Perfect for:
- Recovering lost source files from Netlify, Vercel, GitHub Pages, or any deployed site
- Creating offline copies of websites
- Migrating sites to new hosting
- Archiving web content
- Analyzing site structure and assets
Features
- Full JS Rendering: Uses Puppeteer to capture JavaScript-rendered content (React, Vue, Next.js, etc.)
- Complete Asset Download: Downloads all CSS, JS, images, fonts, videos, and other media
- CDN Support: Optionally downloads assets from external CDNs (Google Fonts, Cloudflare, jsDelivr)
- URL Rewriting: Converts absolute URLs to relative paths so the site works locally
- Structure Preservation: Maintains original URL path structure for easy navigation
- CSS Asset Extraction: Parses CSS files to find and download additional assets (fonts, background images)
- Inline Style Extraction: Optionally extracts inline `<style>` tags to separate files
- Concurrent Downloads: Processes multiple pages and assets in parallel for speed
- Proxy Support: Access geo-restricted or blocked sites using Apify Proxy
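As a rough illustration of the CSS Asset Extraction feature, the sketch below scans a stylesheet for `url(...)` and `@import` references and resolves each one against the stylesheet's own URL, which is how browsers resolve relative paths in CSS. It is a minimal, hypothetical TypeScript sketch (`extractCssAssetUrls` is not part of the actor) and assumes a regex is good enough; a real implementation would more likely use a CSS parser to cope with comments and escaped characters.

```typescript
// Hypothetical helper (not the actor's real code): collect asset URLs referenced
// by a stylesheet. Relative references in CSS resolve against the CSS file's URL.
function extractCssAssetUrls(css: string, cssFileUrl: string): string[] {
  const found: string[] = [];
  const patterns = [
    /url\(\s*(['"]?)([^'")]+)\1\s*\)/g, // url(a.png), url("a.png"), url('a.png')
    /@import\s+(['"])([^'"]+)\1/g, // @import "base.css"; (the url() form is matched above)
  ];
  for (const pattern of patterns) {
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(css)) !== null) {
      const ref = (match[2] || "").trim();
      // data: URIs are embedded in the CSS; "#id" points at an element, not a file.
      if (ref.startsWith("data:") || ref.startsWith("#")) continue;
      try {
        const absolute = new URL(ref, cssFileUrl).href;
        if (found.indexOf(absolute) === -1) found.push(absolute); // de-duplicate
      } catch {
        // Skip references that do not form a valid URL.
      }
    }
  }
  return found;
}
```

For a stylesheet at `https://example.com/static/css/main.css`, `url(../img/bg.png)` resolves to `https://example.com/static/img/bg.png`, and `url(fonts/a.woff2)` to `https://example.com/static/css/fonts/a.woff2`.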
How It Works
- Crawling: The actor visits your start URL and follows links to discover pages on the same domain, up to the `maxDepth` and `maxPages` limits
- Rendering: Each page is fully rendered using a headless Chrome browser to capture dynamic content
- Asset Extraction: All linked assets (CSS, JS, images, fonts) are identified and queued for download
- Download: Assets are downloaded in parallel batches
- URL Rewriting: All URLs in HTML and CSS files are rewritten to use relative local paths
- Output: Files are saved to Apify's Key-Value Store, along with a manifest file (`__MANIFEST__`) listing every downloaded file
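The crawling limits can be sketched as a breadth-first traversal that stays on the start URL's host and stops at `maxDepth` and `maxPages`. This is a hypothetical TypeScript sketch, not the actor's code: the actor uses Crawlee and Puppeteer, and `getLinks` here stands in for rendering a page and reading its links. Treating "same domain" as "same hostname" is an assumption (a `www.` subdomain would count as a different host).

```typescript
// Hypothetical sketch of the crawl limits. `getLinks` stands in for
// "render the page and read its links"; it is not an actor or Crawlee API.
type GetLinks = (pageUrl: string) => string[];

function planCrawl(startUrl: string, getLinks: GetLinks, maxDepth: number, maxPages: number): string[] {
  const start = new URL(startUrl);
  start.hash = "";
  const host = start.hostname; // assumption: "same domain" means the same hostname
  const visited: string[] = [];
  const seen = new Set<string>([start.href]);
  const queue: Array<{ url: string; depth: number }> = [{ url: start.href, depth: 0 }];

  // Breadth-first: every page at depth d is visited before any page at depth d + 1.
  while (queue.length > 0 && visited.length < maxPages) {
    const { url, depth } = queue.shift()!;
    visited.push(url);
    if (depth >= maxDepth) continue; // maxDepth = 0 means only the start page
    for (const href of getLinks(url)) {
      let next: URL;
      try {
        next = new URL(href, url); // resolve relative links against the current page
      } catch {
        continue;
      }
      if (next.hostname !== host) continue; // stay on the start URL's host
      next.hash = ""; // "/page#section" is the same page as "/page"
      if (seen.has(next.href)) continue;
      seen.add(next.href);
      queue.push({ url: next.href, depth: depth + 1 });
    }
  }
  return visited;
}
```

Breadth-first order means that when `maxPages` cuts the crawl short, the pages closest to the start URL are the ones kept.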
Input Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrl | string | (required) | The URL of the website to recover |
| maxDepth | integer | 5 | How many links deep to follow from the start page (0 = start page only) |
| maxPages | integer | 100 | Maximum number of pages to crawl |
| downloadAssets | boolean | true | Download CSS, JS, images, fonts, etc. |
| rewriteUrls | boolean | true | Convert URLs to relative local paths |
| includeExternalAssets | boolean | true | Download assets hosted on external CDNs |
| maxConcurrency | integer | 5 | Maximum number of pages processed in parallel |
| downloadTimeout | integer | 30000 | Asset download timeout, in milliseconds |
| userAgent | string | Chrome UA | Browser user agent string |
| waitForSelector | string | (none) | CSS selector to wait for before capturing a page (JS-rendered sites) |
| extractInlineStyles | boolean | false | Extract inline `<style>` CSS to separate files |
| preserveStructure | boolean | true | Mirror the original URL path structure in the saved files |
| proxyConfiguration | object | (none) | Proxy settings (e.g. Apify Proxy) |
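The same parameters can be written as a typed input object with the table's defaults filled in. This is an illustrative TypeScript sketch only: `ActorInput`, `DEFAULTS`, and `applyDefaults` are hypothetical names, the `userAgent` default is left unset because the exact Chrome string is not documented, and the shape of `proxyConfiguration` is assumed.

```typescript
// Hypothetical typed view of the input table; names mirror the table,
// but ActorInput, DEFAULTS and applyDefaults are illustrative only.
interface ActorInput {
  startUrl: string;
  maxDepth?: number;
  maxPages?: number;
  downloadAssets?: boolean;
  rewriteUrls?: boolean;
  includeExternalAssets?: boolean;
  maxConcurrency?: number;
  downloadTimeout?: number; // milliseconds
  userAgent?: string; // default is a Chrome user agent (exact string not documented)
  waitForSelector?: string;
  extractInlineStyles?: boolean;
  preserveStructure?: boolean;
  proxyConfiguration?: Record<string, unknown>; // shape assumed
}

// Defaults copied from the table's "Default" column.
const DEFAULTS = {
  maxDepth: 5,
  maxPages: 100,
  downloadAssets: true,
  rewriteUrls: true,
  includeExternalAssets: true,
  maxConcurrency: 5,
  downloadTimeout: 30000,
  extractInlineStyles: false,
  preserveStructure: true,
};

function applyDefaults(input: ActorInput) {
  const url = new URL(input.startUrl); // throws if startUrl is missing or malformed
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`startUrl must be http(s), got ${url.protocol}`);
  }
  const config = { ...DEFAULTS, ...input }; // explicit input values win over defaults
  if (config.maxDepth < 0 || config.maxPages < 1 || config.maxConcurrency < 1) {
    throw new Error("maxDepth must be >= 0; maxPages and maxConcurrency must be >= 1");
  }
  return config;
}
```

With this sketch, the Basic Recovery input below ends up with `maxDepth: 5`, `maxPages: 100`, and `downloadTimeout: 30000`.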
Output
The actor outputs all files to the Key-Value Store:
- HTML pages with rewritten URLs
- CSS files with rewritten asset paths
- JavaScript files
- Images (PNG, JPG, GIF, WebP, SVG, ICO)
- Fonts (WOFF, WOFF2, TTF, OTF)
- Videos and audio files
- `__MANIFEST__`: JSON file listing all downloaded files with their original URLs
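Structure preservation and URL rewriting fit together roughly like this: each URL maps to a local file path that mirrors its URL path, and every reference in a saved file is rewritten as a path relative to that file. The TypeScript below is a hypothetical sketch (`toLocalPath` and `relativeHref` are illustrative names); its conventions (directory-style and extensionless pages saved as `index.html`, CDN assets under an `_external/<host>/` folder, query strings dropped) are assumptions, not documented behavior of the actor.

```typescript
// Hypothetical path conventions (assumptions, not documented by the actor):
// - "/" and "/docs/" are saved as "index.html" and "docs/index.html"
// - extensionless pages like "/about" are saved as "about/index.html"
// - assets from other hosts (CDNs) go under "_external/<host>/"
// Query strings and fragments are ignored, so "app.css?v=2" maps to "app.css".
function toLocalPath(url: string, siteHost: string): string {
  const u = new URL(url);
  let path = u.pathname;
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  if (lastSegment === "") path += "index.html";
  else if (lastSegment.indexOf(".") === -1) path += "/index.html";
  path = path.replace(/^\/+/, "");
  return u.hostname === siteHost ? path : `_external/${u.hostname}/${path}`;
}

// Relative link from one saved file to another, e.g. from "blog/index.html"
// to "static/app.css" this returns "../static/app.css".
function relativeHref(fromFile: string, toFile: string): string {
  const fromDirs = fromFile.split("/").slice(0, -1); // directories containing fromFile
  const toParts = toFile.split("/");
  // Count leading directories shared by both paths (never the target's file name).
  let common = 0;
  while (
    common < fromDirs.length &&
    common < toParts.length - 1 &&
    fromDirs[common] === toParts[common]
  ) {
    common++;
  }
  return "../".repeat(fromDirs.length - common) + toParts.slice(common).join("/");
}
```

For example, a page at `https://example.com/blog/post-1` would be saved as `blog/post-1/index.html`, and its link to `https://example.com/static/app.css` rewritten to `../../static/app.css`. Dropping query strings means two URLs that differ only by `?v=` would collide, which a real implementation would need to handle.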
Downloading Your Files
After the run completes:
- Go to the Storage tab
- Click on Key-Value Store
- Click Export to download all files as a ZIP
- Extract the ZIP and you have your recovered website!
Example Usage
Basic Recovery
```json
{
  "startUrl": "https://your-site.netlify.app"
}
```
Full Site with All Assets
```json
{
  "startUrl": "https://example.com",
  "maxDepth": 10,
  "maxPages": 500,
  "downloadAssets": true,
  "includeExternalAssets": true,
  "rewriteUrls": true
}
```
JavaScript-Heavy Site (React/Vue/Next.js)
```json
{
  "startUrl": "https://my-react-app.vercel.app",
  "waitForSelector": "#root",
  "maxDepth": 5,
  "maxPages": 100
}
```
Single Page Only
```json
{
  "startUrl": "https://example.com/specific-page",
  "maxDepth": 0,
  "maxPages": 1
}
```
Tips for Best Results
- For JS-rendered sites: Use `waitForSelector` to ensure content loads before capture
- Large sites: Increase `maxPages` and `maxConcurrency`
- Slow sites: Increase `downloadTimeout`
- Blocked sites: Enable proxy configuration
- CDN-heavy sites: Keep `includeExternalAssets` enabled
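To see what `maxConcurrency` and `downloadTimeout` control, here is a small promise pool in TypeScript: at most `limit` jobs run at once, and any job slower than the timeout is recorded as an error instead of blocking the run. The helpers `withTimeout` and `runWithLimit` are hypothetical; in the actor, page-level concurrency is presumably handled by Crawlee.

```typescript
// Hypothetical helpers illustrating maxConcurrency and downloadTimeout.
// Note: on timeout the underlying work is not cancelled here; a real downloader
// would also abort the request (for example with an AbortController).
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    task.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Runs jobs with at most `limit` in flight; results (or errors) keep input order.
async function runWithLimit<T>(
  jobs: Array<() => Promise<T>>,
  limit: number,
  timeoutMs: number,
): Promise<Array<T | Error>> {
  const results: Array<T | Error> = new Array(jobs.length);
  let next = 0;
  // Each worker pulls the next unstarted job until none are left.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      try {
        results[index] = await withTimeout(jobs[index](), timeoutMs);
      } catch (err) {
        results[index] = err instanceof Error ? err : new Error(String(err));
      }
    }
  }
  const workerCount = Math.max(1, Math.min(limit, jobs.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Raising the limit speeds up large crawls until the target server or the run's resources become the bottleneck, while raising the timeout gives slow servers more time at the cost of letting stalled downloads occupy a slot longer.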
Limitations
- Cannot recover server-side logic (APIs, databases, authentication)
- Minified or bundled JavaScript is downloaded as served, so it may be difficult to read or modify
- Dynamic content that requires user interaction won't be captured
- Login-protected content requires additional setup
Technical Details
- Built with Apify SDK 3.x and Crawlee
- Uses Puppeteer for browser automation
- Cheerio for HTML parsing
- Respects `robots.txt` (can be disabled)
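Honoring `robots.txt` can be pictured as a check of each URL's path against the `Disallow` rules of the `User-agent: *` group. This is a simplified, hypothetical TypeScript sketch (`isAllowedByRobots` is illustrative, not the actor's code): it ignores `Allow` lines, wildcards, and agent-specific groups, all of which RFC 9309 covers, so a real crawler would use a complete parser.

```typescript
// Hypothetical, simplified robots.txt check: returns true if `url` is allowed by
// the "User-agent: *" group. Only prefix "Disallow" rules are honored; Allow lines,
// wildcards ("*", "$") and agent-specific groups are ignored.
function isAllowedByRobots(robotsTxt: string, url: string): boolean {
  const path = new URL(url).pathname;
  let inStarGroup = false;
  let lastWasAgent = false;
  const disallowed: string[] = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      inStarGroup = (lastWasAgent && inStarGroup) || value === "*";
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    // An empty Disallow value means "allow everything", so it adds no rule.
    if (inStarGroup && field === "disallow" && value !== "") disallowed.push(value);
  }
  return !disallowed.some((prefix) => path.startsWith(prefix));
}
```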
Cost Estimate
Typical usage costs approximately:
- $1-3 per 100 pages with full asset download
- Larger sites with many assets may cost more
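As a worked example of the estimate above, cost scales linearly with the number of pages. The function below only applies the stated $1-3 per 100 pages range; `estimateCostUsd` is a hypothetical helper, and actual charges follow the pay-per-result pricing ($20.00 / 1,000 results), which depends on how results are counted.

```typescript
// Hypothetical helper: rough range using only the "$1-3 per 100 pages" figure.
// Not a billing formula; actual charges follow the pay-per-result pricing.
function estimateCostUsd(pages: number): { low: number; high: number } {
  const hundredsOfPages = pages / 100;
  return { low: 1 * hundredsOfPages, high: 3 * hundredsOfPages };
}
```

For the Full Site example (`maxPages: 500`), a crawl that reaches all 500 pages would come to roughly $5 to $15.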
License
ISC License
Support
For issues or feature requests, please open an issue on the repository.