Wayback Machine Snapshots Scraper — Internet Archive History

Pricing

from $1.00 / 1,000 archived snapshots returned


List every Internet Archive snapshot of a URL, page, or whole domain. Timestamp, snapshot URL, status code, mime type, content length. No login.


Rating: 0.0 (0)

Developer: Andrew (Maintained by Community)

Actor stats: 0 bookmarked · 1 total user · 0 monthly active users · last modified 5 days ago

List every Internet Archive snapshot of a URL, page, or whole domain — with timestamp, snapshot URL, status code, mime type, and content length. No login.

What you get

  • Every archived capture of a URL since the page first hit the Wayback Machine
  • Direct snapshot URLs (https://web.archive.org/web/{timestamp}/{url}) — paste straight into a browser
  • HTTP status code, MIME type, and byte size for each capture
  • Content digest, so you can dedupe identical captures and only see when the page actually changed
  • Date-range, status-code, and MIME-type filters
  • Match modes: exact URL, prefix, hostname, or whole domain (covers subdomains)
  • Cursor-based pagination — fetch unlimited captures across multiple runs
  • Direct export to JSON, CSV, Excel, or Google Sheets
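These options map naturally onto the Internet Archive's public CDX API (the same API that provides the resume cursor used for pagination). A minimal sketch of how such a query could be assembled — parameter names follow the public CDX server documentation; this is an illustration, not the actor's actual internals:

```python
from urllib.parse import urlencode

# Public endpoint of the Internet Archive CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def build_cdx_query(url, match_type="exact", date_from=None, date_to=None,
                    status=None, mime=None, collapse_digest=True, limit=1000):
    """Build a CDX query URL listing snapshots of `url`.

    Dates arrive as YYYY-MM-DD (as in the inputs above) and are
    converted to the YYYYMMDD form the CDX server expects.
    """
    params = [
        ("url", url),
        ("matchType", match_type),  # exact | prefix | host | domain
        ("output", "json"),
        ("limit", str(limit)),
    ]
    if date_from:
        params.append(("from", date_from.replace("-", "")))
    if date_to:
        params.append(("to", date_to.replace("-", "")))
    if status:
        params.append(("filter", f"statuscode:{status}"))
    if mime:
        params.append(("filter", f"mimetype:{mime}"))
    if collapse_digest:
        params.append(("collapse", "digest"))
    return f"{CDX_ENDPOINT}?{urlencode(params)}"
```

Pasting the resulting URL into a browser returns the raw capture list the actor post-processes into dataset rows.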

Use cases

  • SEO and competitive intel — track when a competitor changed their pricing, copy, or layout
  • OSINT — recover deleted or modified pages, track changes over time
  • Broken-link recovery — find the most recent working snapshot of a 404'd page
  • Content audit — list every URL ever archived for a domain (subdomains included)
  • Compliance and legal — produce a timeline of what a site looked like on a given date
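For the broken-link recovery case, picking the newest working capture out of the actor's output is a short pass over the dataset rows. A sketch using the field names from the Output format section (the sample data is made up for illustration):

```python
def latest_working_snapshot(rows):
    """Return the most recent capture archived with HTTP 200, or None.

    Wayback timestamps (YYYYMMDDhhmmss) sort chronologically as
    strings, so plain max() on the timestamp field is enough.
    """
    ok = [r for r in rows if r["statusCode"] == 200]
    return max(ok, key=lambda r: r["timestamp"]) if ok else None
```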

How to use

  1. Enter a URL (e.g. example.com, https://example.com/page)
  2. Choose a Match Type:
    • Exact — only this URL
    • Prefix — this URL and everything below
    • Host — every URL on this hostname
    • Domain — every URL across the whole domain and its subdomains
  3. Optionally filter by Date from / Date to (YYYY-MM-DD), HTTP status code (e.g. 200), or MIME type (e.g. text/html)
  4. Toggle Collapse duplicate captures to dedupe by content digest (recommended)
  5. Set Max snapshots (default 1000; 0 for unlimited)
  6. Run the actor — one snapshot per row in the Dataset tab
  7. To fetch more snapshots, open the Key-value store tab → copy the NEXT_PAGE_ID value → paste it into Page ID on your next run
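Put together, a run input for steps 1–5 might look like the object below. The field keys here are illustrative guesses; check the actor's input schema in the Apify Console for the exact names:

```python
# Illustrative run input; actual field keys come from the actor's input schema.
run_input = {
    "url": "https://example.com/page",
    "matchType": "prefix",       # exact | prefix | host | domain
    "dateFrom": "2023-01-01",    # optional, YYYY-MM-DD
    "dateTo": "2023-12-31",      # optional, YYYY-MM-DD
    "statusCode": "200",         # optional HTTP status filter
    "mimeType": "text/html",     # optional MIME type filter
    "collapseDuplicates": True,  # dedupe by content digest
    "maxSnapshots": 1000,        # 0 = unlimited
}
```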

Output format

One snapshot per dataset row — perfect for direct CSV, Excel, or Google Sheets export:

{
  "timestamp": "20231215120000",
  "archivedAt": "2023-12-15T12:00:00.000Z",
  "originalUrl": "http://example.com/",
  "snapshotUrl": "https://web.archive.org/web/20231215120000/http://example.com/",
  "statusCode": 200,
  "mimeType": "text/html",
  "contentLength": 1234,
  "digest": "ABC123XYZ"
}
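The archivedAt and snapshotUrl fields are mechanical derivations from timestamp and originalUrl. A sketch of that mapping, assuming the standard 14-digit YYYYMMDDhhmmss Wayback timestamp:

```python
from datetime import datetime, timezone

def derive_fields(timestamp, original_url):
    """Derive archivedAt and snapshotUrl from a 14-digit Wayback timestamp."""
    dt = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return {
        # ISO 8601 with millisecond precision and a Z suffix, as in the row above.
        "archivedAt": dt.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "snapshotUrl": f"https://web.archive.org/web/{timestamp}/{original_url}",
    }
```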

Pagination

Big sites can have hundreds of thousands of snapshots. The actor saves a resume cursor (the Internet Archive's CDX resume key) to the default Key-value store under NEXT_PAGE_ID.

  1. Open the Key-value store tab on the run page
  2. Copy the value of NEXT_PAGE_ID
  3. Start a new run and paste it into Page ID

When NEXT_PAGE_ID is null, all snapshots have been fetched.
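The cursor itself comes from the CDX server: when queried with showResumeKey=true and JSON output, the server appends an empty row and then a final one-element row holding the resume key. A sketch of extracting it — the response shape follows the public CDX API documentation, and the actor's own parsing may differ:

```python
def extract_resume_key(cdx_json):
    """Pull the resume key (if any) out of a CDX JSON response.

    With showResumeKey=true the response ends with an empty row
    followed by a one-element row containing the key. No key means
    pagination is finished (NEXT_PAGE_ID would be null).
    """
    if len(cdx_json) >= 2 and cdx_json[-2] == []:
        return cdx_json[-1][0]
    return None
```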

Input options

Field               | Type    | Description
URL                 | string  | URL or domain to look up (required)
Match Type          | enum    | Exact / Prefix / Host / Domain
Date from           | string  | YYYY-MM-DD UTC — optional
Date to             | string  | YYYY-MM-DD UTC — optional
HTTP status code    | string  | Filter to one HTTP status, e.g. 200
MIME type           | string  | Filter by content type, e.g. text/html
Collapse duplicates | boolean | Dedupe by content digest — default on
Max snapshots       | integer | Cap per run — default 1000, 0 for unlimited
Page ID             | string  | NEXT_PAGE_ID from the previous run, to resume pagination
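If you run with Collapse duplicates off and want to dedupe afterwards, the digest field in each row makes it straightforward. A minimal sketch over exported rows:

```python
def dedupe_by_digest(rows):
    """Keep only the first capture of each distinct content digest.

    Rows are assumed to be in chronological order, so each surviving
    row marks the moment the page's content actually changed.
    """
    seen = set()
    unique = []
    for row in rows:
        if row["digest"] not in seen:
            seen.add(row["digest"])
            unique.append(row)
    return unique
```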