Wayback Machine Snapshots Scraper — Internet Archive History

Pricing

from $1.00 / 1,000 archived snapshots returned


List every Internet Archive snapshot of a URL, page, or whole domain. Timestamp, snapshot URL, status code, mime type, content length. No login.


Rating: 0.0 (0)

Developer: Andrew (Maintained by Community)

Actor stats: 0 bookmarked · 1 total user · 0 monthly active users · last modified 5 days ago

List every Internet Archive snapshot of a URL, page, or whole domain — with timestamp, snapshot URL, status code, mime type, and content length. No login.

What you get

  • Every archived capture of a URL since the page first hit the Wayback Machine
  • Direct snapshot URLs (https://web.archive.org/web/{timestamp}/{url}) — paste straight into a browser
  • HTTP status code, MIME type, and byte size for each capture
  • Content digest, so you can dedupe identical captures and only see when the page actually changed
  • Date-range, status-code, and MIME-type filters
  • Match modes: exact URL, prefix, hostname, or whole domain (covers subdomains)
  • Cursor-based pagination — fetch unlimited captures across multiple runs
  • Direct export to JSON, CSV, Excel, or Google Sheets
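These options map naturally onto the Internet Archive's public CDX API (the same API that provides the resume cursor used for pagination). A minimal sketch of how such a query could be assembled — parameter names follow the public CDX server documentation; this is an illustration, not the actor's actual internals:

```python
from urllib.parse import urlencode

# Public endpoint of the Internet Archive CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def build_cdx_query(url, match_type="exact", date_from=None, date_to=None,
                    status=None, mime=None, collapse_digest=True, limit=1000):
    """Build a CDX query URL listing snapshots of `url`.

    Dates arrive as YYYY-MM-DD (as in the inputs above) and are
    converted to the YYYYMMDD form the CDX server expects.
    """
    params = [
        ("url", url),
        ("matchType", match_type),  # exact | prefix | host | domain
        ("output", "json"),
        ("limit", str(limit)),
    ]
    if date_from:
        params.append(("from", date_from.replace("-", "")))
    if date_to:
        params.append(("to", date_to.replace("-", "")))
    if status:
        params.append(("filter", f"statuscode:{status}"))
    if mime:
        params.append(("filter", f"mimetype:{mime}"))
    if collapse_digest:
        params.append(("collapse", "digest"))
    return f"{CDX_ENDPOINT}?{urlencode(params)}"
```

Pasting the resulting URL into a browser returns the raw capture list the actor post-processes into dataset rows.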

Use cases

  • SEO and competitive intel — track when a competitor changed their pricing, copy, or layout
  • OSINT — recover deleted or modified pages, track changes over time
  • Broken-link recovery — find the most recent working snapshot of a 404'd page
  • Content audit — list every URL ever archived for a domain (subdomains included)
  • Compliance and legal — produce a timeline of what a site looked like on a given date
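For the broken-link recovery case, picking the newest working capture out of the actor's output is a short pass over the dataset rows. A sketch using the field names from the Output format section (the sample data is made up for illustration):

```python
def latest_working_snapshot(rows):
    """Return the most recent capture archived with HTTP 200, or None.

    Wayback timestamps (YYYYMMDDhhmmss) sort chronologically as
    strings, so plain max() on the timestamp field is enough.
    """
    ok = [r for r in rows if r["statusCode"] == 200]
    return max(ok, key=lambda r: r["timestamp"]) if ok else None
```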

How to use

  1. Enter a URL (e.g. example.com, https://example.com/page)
  2. Choose a Match Type:
    • Exact — only this URL
    • Prefix — this URL and everything below
    • Host — every URL on this hostname
    • Domain — every URL across the whole domain and its subdomains
  3. Optionally filter by Date from / Date to (YYYY-MM-DD), HTTP status code (e.g. 200), or MIME type (e.g. text/html)
  4. Toggle Collapse duplicate captures to dedupe by content digest (recommended)
  5. Set Max snapshots (default 1000; 0 for unlimited)
  6. Run the actor — one snapshot per row in the Dataset tab
  7. To fetch more snapshots, open the Key-value store tab → copy the NEXT_PAGE_ID value → paste it into Page ID on your next run
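Put together, a run input for steps 1–5 might look like the object below. The field keys here are illustrative guesses; check the actor's input schema in the Apify Console for the exact names:

```python
# Illustrative run input; actual field keys come from the actor's input schema.
run_input = {
    "url": "https://example.com/page",
    "matchType": "prefix",       # exact | prefix | host | domain
    "dateFrom": "2023-01-01",    # optional, YYYY-MM-DD
    "dateTo": "2023-12-31",      # optional, YYYY-MM-DD
    "statusCode": "200",         # optional HTTP status filter
    "mimeType": "text/html",     # optional MIME type filter
    "collapseDuplicates": True,  # dedupe by content digest
    "maxSnapshots": 1000,        # 0 = unlimited
}
```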

Output format

One snapshot per dataset row — perfect for direct CSV, Excel, or Google Sheets export:

{
  "timestamp": "20231215120000",
  "archivedAt": "2023-12-15T12:00:00.000Z",
  "originalUrl": "http://example.com/",
  "snapshotUrl": "https://web.archive.org/web/20231215120000/http://example.com/",
  "statusCode": 200,
  "mimeType": "text/html",
  "contentLength": 1234,
  "digest": "ABC123XYZ"
}
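The archivedAt and snapshotUrl fields are mechanical derivations from timestamp and originalUrl. A sketch of that mapping, assuming the standard 14-digit YYYYMMDDhhmmss Wayback timestamp:

```python
from datetime import datetime, timezone

def derive_fields(timestamp, original_url):
    """Derive archivedAt and snapshotUrl from a 14-digit Wayback timestamp."""
    dt = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return {
        # ISO 8601 with millisecond precision and a Z suffix, as in the row above.
        "archivedAt": dt.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "snapshotUrl": f"https://web.archive.org/web/{timestamp}/{original_url}",
    }
```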

Pagination

Big sites can have hundreds of thousands of snapshots. The actor saves a resume cursor (the Internet Archive's CDX resume key) to the default Key-value store under NEXT_PAGE_ID.

  1. Open the Key-value store tab on the run page
  2. Copy the value of NEXT_PAGE_ID
  3. Start a new run and paste it into Page ID

When NEXT_PAGE_ID is null, all snapshots have been fetched.
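The cursor itself comes from the CDX server: when queried with showResumeKey=true and JSON output, the server appends an empty row and then a final one-element row holding the resume key. A sketch of extracting it — the response shape follows the public CDX API documentation, and the actor's own parsing may differ:

```python
def extract_resume_key(cdx_json):
    """Pull the resume key (if any) out of a CDX JSON response.

    With showResumeKey=true the response ends with an empty row
    followed by a one-element row containing the key. No key means
    pagination is finished (NEXT_PAGE_ID would be null).
    """
    if len(cdx_json) >= 2 and cdx_json[-2] == []:
        return cdx_json[-1][0]
    return None
```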

Input options

Field               | Type    | Description
URL                 | string  | URL or domain to look up (required)
Match Type          | enum    | Exact / Prefix / Host / Domain
Date from           | string  | YYYY-MM-DD UTC — optional
Date to             | string  | YYYY-MM-DD UTC — optional
HTTP status code    | string  | Filter to one HTTP status, e.g. 200
MIME type           | string  | Filter by content type, e.g. text/html
Collapse duplicates | boolean | Dedupe by content digest — default on
Max snapshots       | integer | Cap per run — default 1000, 0 for unlimited
Page ID             | string  | NEXT_PAGE_ID from the previous run, to resume pagination
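If you run with Collapse duplicates off and want to dedupe afterwards, the digest field in each row makes it straightforward. A minimal sketch over exported rows:

```python
def dedupe_by_digest(rows):
    """Keep only the first capture of each distinct content digest.

    Rows are assumed to be in chronological order, so each surviving
    row marks the moment the page's content actually changed.
    """
    seen = set()
    unique = []
    for row in rows:
        if row["digest"] not in seen:
            seen.add(row["digest"])
            unique.append(row)
    return unique
```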