Wayback Machine Scraper — Extract Historical Website Snapshots

Pricing

Pay per usage

Retrieve archived versions of any webpage from the Wayback Machine. Track how sites changed over time, recover deleted content, monitor competitor history. Extract snapshots by date range. Perfect for SEO audits, competitive intelligence, and digital forensics. No API key needed.


Developer

Alex

Maintained by Community


Wayback Machine Scraper — Website History & Archived Snapshots

Retrieve historical snapshots of any website from the Internet Archive's Wayback Machine. Find all archived versions with timestamps, HTTP status codes, MIME types, and direct archive URLs. Filter by date range to focus on specific time periods.

Features

  • Snapshot Discovery — find all archived versions of any URL in the Wayback Machine
  • Date Range Filtering — narrow results to a specific time period (from/to dates)
  • Archive URLs — direct links to each archived snapshot for instant access
  • Status Codes — HTTP status code for each snapshot (200, 301, 404, etc.)
  • MIME Types & Sizes — content type and byte size of each archived page
  • Deduplication — collapse parameter removes near-duplicate snapshots (configurable)
  • Up to 1000 Snapshots — retrieve extensive history per URL with configurable limits

Output Example

```json
{
  "url": "google.com",
  "timestamp": "20260101120000",
  "dateISO": "2026-01-01T12:00:00Z",
  "statusCode": 200,
  "mimeType": "text/html",
  "size": 15234,
  "digest": "ABC123DEF456",
  "archiveUrl": "https://web.archive.org/web/20260101120000/google.com",
  "inputUrl": "google.com",
  "scrapedAt": "2026-03-18T10:00:00.000Z"
}
```

Use Cases

  • Competitive Analysis — track how competitor websites evolved over time (design, messaging, pricing)
  • Brand Monitoring — verify historical claims and content changes on any website
  • SEO Research — analyze how page structure and content changed relative to ranking shifts
  • Legal & Compliance — document website content at specific dates for evidence purposes
  • Content Recovery — find and recover deleted pages, blog posts, or product listings
  • Market Research — study how industry landing pages and value propositions evolved

Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `urls` | array | `[]` | URLs to look up (e.g., `"google.com"`, `"apple.com"`) |
| `maxSnapshotsPerUrl` | integer | `20` | Max snapshots per URL (1–1000) |
| `fromDate` | string | `""` | Start date filter (YYYY-MM-DD) |
| `toDate` | string | `""` | End date filter (YYYY-MM-DD) |
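Putting those parameters together, a typical input object might look like this (the specific URLs and dates here are illustrative, not defaults):

```json
{
  "urls": ["google.com", "apple.com"],
  "maxSnapshotsPerUrl": 50,
  "fromDate": "2020-01-01",
  "toDate": "2023-12-31"
}
```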

How It Works

The scraper queries the Wayback Machine's CDX Server API, which indexes all archived snapshots in the Internet Archive. It retrieves snapshot metadata including timestamps, status codes, and content digests, then constructs direct archive URLs for each result. The collapse parameter deduplicates snapshots taken within the same time window.
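The flow above can be sketched against the public CDX Server API. This is a minimal illustration, not the Actor's actual implementation: the `collapse=timestamp:10` value (deduplicating snapshots within the same hour) and the field mapping are assumptions based on the documented CDX response layout, where the first JSON row is a column header.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def build_cdx_query(url: str, limit: int = 20,
                    from_date: str = "", to_date: str = "") -> str:
    """Build a CDX Server query URL for one target URL."""
    params = {
        "url": url,
        "output": "json",           # rows of [field, ...] arrays; row 0 is the header
        "limit": limit,
        "collapse": "timestamp:10", # assumed window: dedupe snapshots in the same hour
    }
    if from_date:
        params["from"] = from_date.replace("-", "")  # CDX expects YYYYMMDD
    if to_date:
        params["to"] = to_date.replace("-", "")
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

def parse_cdx_rows(rows: list, input_url: str) -> list:
    """Map CDX response rows to snapshot records with direct archive URLs."""
    header, *data = rows  # first row names the columns (urlkey, timestamp, ...)
    records = []
    for row in data:
        rec = dict(zip(header, row))
        records.append({
            "url": rec["original"],
            "timestamp": rec["timestamp"],
            "statusCode": rec["statuscode"],
            "mimeType": rec["mimetype"],
            "size": int(rec["length"]),
            "digest": rec["digest"],
            # Direct snapshot link: /web/<timestamp>/<original URL>
            "archiveUrl": f"https://web.archive.org/web/{rec['timestamp']}/{rec['original']}",
            "inputUrl": input_url,
        })
    return records
```

Fetching `build_cdx_query(...)` with any HTTP client and feeding the decoded JSON to `parse_cdx_rows` yields records in roughly the output shape shown earlier (fields like `dateISO` and `scrapedAt` would be derived at scrape time).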