Wayback Machine Scraper — Extract Historical Website Snapshots
Pricing: Pay per usage
Retrieve archived versions of any webpage from the Wayback Machine. Track how sites changed over time, recover deleted content, monitor competitor history. Extract snapshots by date range. Perfect for SEO audits, competitive intelligence, and digital forensics. No API key needed.
Developer: Alex
Wayback Machine Scraper — Website History & Archived Snapshots
Retrieve historical snapshots of any website from the Internet Archive's Wayback Machine. Find all archived versions with timestamps, HTTP status codes, MIME types, and direct archive URLs. Filter by date range to focus on specific time periods.
Features
- Snapshot Discovery — find all archived versions of any URL in the Wayback Machine
- Date Range Filtering — narrow results to a specific time period (from/to dates)
- Archive URLs — direct links to each archived snapshot for instant access
- Status Codes — HTTP status code for each snapshot (200, 301, 404, etc.)
- MIME Types & Sizes — content type and byte size of each archived page
- Deduplication — collapse parameter removes near-duplicate snapshots (configurable)
- Up to 1000 Snapshots — retrieve extensive history per URL with configurable limits
Output Example
{"url": "google.com","timestamp": "20260101120000","dateISO": "2026-01-01T12:00:00Z","statusCode": 200,"mimeType": "text/html","size": 15234,"digest": "ABC123DEF456","archiveUrl": "https://web.archive.org/web/20260101120000/google.com","inputUrl": "google.com","scrapedAt": "2026-03-18T10:00:00.000Z"}
Use Cases
- Competitive Analysis — track how competitor websites evolved over time (design, messaging, pricing)
- Brand Monitoring — verify historical claims and content changes on any website
- SEO Research — analyze how page structure and content changed relative to ranking shifts
- Legal & Compliance — document website content at specific dates for evidence purposes
- Content Recovery — find and recover deleted pages, blog posts, or product listings
- Market Research — study how industry landing pages and value propositions evolved
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | array | [] | URLs to look up (e.g., "google.com", "apple.com") |
| maxSnapshotsPerUrl | integer | 20 | Max snapshots per URL (1-1000) |
| fromDate | string | "" | Start date filter (YYYY-MM-DD) |
| toDate | string | "" | End date filter (YYYY-MM-DD) |
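Putting the parameters together, an input matching the table above might look like this (values are illustrative):

```json
{
  "urls": ["google.com", "apple.com"],
  "maxSnapshotsPerUrl": 50,
  "fromDate": "2020-01-01",
  "toDate": "2020-12-31"
}
```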
How It Works
The scraper queries the Wayback Machine's CDX Server API, which indexes all archived snapshots in the Internet Archive. It retrieves snapshot metadata including timestamps, status codes, and content digests, then constructs direct archive URLs for each result. The collapse parameter deduplicates snapshots taken within the same time window.
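The query described above can be sketched against the public CDX Server API directly. This is a simplified illustration of the same flow, not the actor's actual implementation; the function names are hypothetical, and the `collapse=timestamp:8` setting (one snapshot per day) is one example of the deduplication window:

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def build_cdx_url(url: str, from_date: str = "", to_date: str = "", limit: int = 20) -> str:
    """Build a CDX Server API query URL for snapshots of `url`."""
    params = {
        "url": url,
        "output": "json",
        "limit": str(limit),
        "collapse": "timestamp:8",  # keep at most one snapshot per day
        "fl": "timestamp,statuscode,mimetype,length,digest",
    }
    if from_date:
        params["from"] = from_date.replace("-", "")  # CDX expects YYYYMMDD
    if to_date:
        params["to"] = to_date.replace("-", "")
    return f"{CDX_ENDPOINT}?{urllib.parse.urlencode(params)}"

def fetch_snapshots(url: str, **kwargs) -> list[dict]:
    """Fetch CDX rows and attach a direct archive URL to each snapshot."""
    with urllib.request.urlopen(build_cdx_url(url, **kwargs)) as resp:
        rows = json.load(resp)
    if not rows:
        return []
    header, data = rows[0], rows[1:]  # first row is the field-name header
    return [
        {**dict(zip(header, row)),
         "archiveUrl": f"https://web.archive.org/web/{row[0]}/{url}"}
        for row in data
    ]
```

The archive URL is constructed by prefixing `https://web.archive.org/web/<timestamp>/` to the original URL, which is how each snapshot in the output gets its direct link.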