Wayback Machine Snapshots Scraper — Internet Archive History
Pricing
from $1.00 / 1,000 archived snapshots returned
List every Internet Archive snapshot of a URL, page, or whole domain — with timestamp, snapshot URL, status code, mime type, and content length. No login.
What you get
- Every archived capture of a URL since the page first hit the Wayback Machine
- Direct snapshot URLs (https://web.archive.org/web/{timestamp}/{url}) that paste straight into a browser (see the sketch after this list)
- HTTP status code, MIME type, and byte size for each capture
- Content digest, so you can dedupe identical captures and only see when the page actually changed
- Date-range, status-code, and MIME-type filters
- Match modes: exact URL, prefix, hostname, or whole domain (covers subdomains)
- Cursor-based pagination — fetch unlimited captures across multiple runs
- Direct export to JSON, CSV, Excel, or Google Sheets
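For reference, a snapshot URL is just the 14-digit capture timestamp spliced between `web.archive.org/web/` and the original URL. A minimal sketch (the helper names here are illustrative, not part of the actor):

```python
from datetime import datetime, timezone

def snapshot_url(timestamp: str, original_url: str) -> str:
    """Build a browsable Wayback Machine URL from a capture timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"

def capture_time(timestamp: str) -> datetime:
    """Parse the 14-digit YYYYMMDDhhmmss capture timestamp (UTC)."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

print(snapshot_url("20231215120000", "http://example.com/"))
# https://web.archive.org/web/20231215120000/http://example.com/
print(capture_time("20231215120000").isoformat())
# 2023-12-15T12:00:00+00:00
```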
Use cases
- SEO and competitive intel — track when a competitor changed their pricing, copy, or layout
- OSINT — recover deleted or modified pages, track changes over time
- Broken-link recovery — find the most recent working snapshot of a 404'd page
- Content audit — list every URL ever archived for a domain (subdomains included)
- Compliance and legal — produce a timeline of what a site looked like on a given date
How to use
- Enter a URL (e.g. example.com or https://example.com/page)
- Choose a Match Type:
- Exact — only this URL
- Prefix — this URL and everything below
- Host — every URL on this hostname
- Domain — every URL across the whole domain and its subdomains
- Optionally filter by Date from / Date to (YYYY-MM-DD), HTTP status code (e.g. 200), or MIME type (e.g. text/html)
- Toggle Collapse duplicate captures to dedupe by content digest (recommended)
- Set Max snapshots (default 1000; 0 for unlimited)
- Run the actor — one snapshot per row in the Dataset tab
- To fetch more snapshots, open the Key-value store tab → copy the NEXT_PAGE_ID value → paste it into Page ID on your next run (a scripted version of these steps follows this list)
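If you'd rather run the actor from code than from the console, the same inputs map onto the Apify API client. A sketch using the Python client; note that `<ACTOR_ID>` and the input keys (`url`, `matchType`, and so on) are assumptions based on the field labels above, so check the actor's Input tab for the exact JSON field names:

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Input keys are assumed from the field labels in this README;
# verify them against the actor's Input schema before running.
run_input = {
    "url": "example.com",
    "matchType": "domain",       # exact / prefix / host / domain
    "collapseDuplicates": True,
    "maxSnapshots": 1000,
}

# "<ACTOR_ID>" stands for this actor's ID or its "username/actor-name" slug.
run = client.actor("<ACTOR_ID>").call(run_input=run_input)

# One snapshot per dataset row, as described under "Output format".
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["archivedAt"], item["snapshotUrl"])
```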
Output format
One snapshot per dataset row — perfect for direct CSV, Excel, or Google Sheets export:
```json
{
  "timestamp": "20231215120000",
  "archivedAt": "2023-12-15T12:00:00.000Z",
  "originalUrl": "http://example.com/",
  "snapshotUrl": "https://web.archive.org/web/20231215120000/http://example.com/",
  "statusCode": 200,
  "mimeType": "text/html",
  "contentLength": 1234,
  "digest": "ABC123XYZ"
}
```
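Even with Collapse duplicate captures off, the digest field lets you reconstruct when a page actually changed. A small sketch over already-fetched rows (the `rows` variable is assumed to hold the dataset items from a run):

```python
def change_points(rows: list[dict]) -> list[dict]:
    """Keep only captures whose content digest differs from the previous
    capture, i.e. the moments the page's content actually changed."""
    changed = []
    last_digest = None
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        if row["digest"] != last_digest:
            changed.append(row)
            last_digest = row["digest"]
    return changed
```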
Pagination
Big sites can have hundreds of thousands of snapshots. The actor saves a resume cursor (the Internet Archive's CDX resume key) to the default Key-value store under NEXT_PAGE_ID.
- Open the Key-value store tab on the run page
- Copy the value of NEXT_PAGE_ID
- Start a new run and paste it into Page ID
When NEXT_PAGE_ID is null, all snapshots have been fetched.
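Scripted, the whole resume loop looks roughly like this. As in the earlier sketch, `<ACTOR_ID>` and the `pageId` input key are assumptions; NEXT_PAGE_ID is the record name the actor actually writes:

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")
run_input = {"url": "example.com", "matchType": "domain"}  # assumed input keys

while True:
    run = client.actor("<ACTOR_ID>").call(run_input=run_input)
    # ... consume client.dataset(run["defaultDatasetId"]) here ...

    record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("NEXT_PAGE_ID")
    cursor = record["value"] if record else None
    if not cursor:                 # null => all snapshots have been fetched
        break
    run_input["pageId"] = cursor   # resume where the previous run stopped
```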
Input options
| Field | Type | Description |
|---|---|---|
| URL | string | URL or domain to look up (required) |
| Match Type | enum | Exact / Prefix / Host / Domain |
| Date from | string | YYYY-MM-DD UTC — optional |
| Date to | string | YYYY-MM-DD UTC — optional |
| HTTP status code | string | Filter to one HTTP status, e.g. 200 |
| MIME type | string | Filter by content type, e.g. text/html |
| Collapse duplicates | boolean | Dedupe by content digest — default on |
| Max snapshots | integer | Cap per run — default 1000, 0 for unlimited |
| Page ID | string | NEXT_PAGE_ID from the previous run, to resume pagination |
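Putting the table together, a fully filtered run input might look like the following. Again, the field names are assumed from the labels above and the actor's Input tab is authoritative:

```python
run_input = {
    "url": "https://example.com/pricing",
    "matchType": "exact",        # exact / prefix / host / domain
    "dateFrom": "2023-01-01",    # YYYY-MM-DD, UTC
    "dateTo": "2023-12-31",
    "statusCode": "200",         # only successful captures
    "mimeType": "text/html",     # skip images, scripts, etc.
    "collapseDuplicates": True,
    "maxSnapshots": 0,           # 0 = unlimited
}
```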