Pricing

from $0.90 / 1,000 saved archive items

Wayback Machine Search

Search Wayback Machine snapshots for URLs, hosts, and domains. Export archive dates, status codes, MIME types, digests, content text, version timelines, reports, and monitoring alerts.

Developer

Maxime Dupré

🕰️ Wayback Machine search for archive history

Wayback Machine Search finds historical snapshots in the Internet Archive Wayback Machine for the URLs, hosts, or domains you submit. Use this Wayback Machine search Actor to export archive dates, original URLs, HTTP status codes, MIME types, content digests, content length, archived page text when available, version timelines, Markdown reports, and monitoring alerts.

It is built for SEO audits, OSINT research, legal evidence checks, website change tracking, link-rot recovery, content history reviews, and scheduled archive monitoring. You can start with one URL such as https://example.com/, a bare domain such as example.com, or up to 50 targets in one run. No Wayback Machine API key, cookies, login, or user proxy setup is required.

🔎 What this Actor does

Searches the Wayback Machine CDX index for exact URLs, URL prefixes, hosts, or full domains.
Filters archive rows by date range, HTTP status code, and MIME type.
Saves raw snapshot rows with source-backed archive metadata.
Collapses repeated captures by content digest or by month, day, or hour.
Finds snapshots closest to a target date and adds distance in days.
Optionally fetches readable archived page text for a capped number of snapshots.
Emits deterministic change evidence from status, digest, length, and fetched text changes.
Builds version timeline rows when you want a compact history.
Generates a Markdown report in report mode.
Supports monitoring mode with alert rows for new archive rows, status changes, content changes, or removed/restored signals.

This Actor searches archive data. It does not crawl the live web, create new Wayback captures, perform visual screenshot diffs, use AI summaries, or promise complete archive coverage. Availability depends on what the Internet Archive has stored.
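
The Actor's internal requests are not documented here. As a rough sketch, the same search and filter options could be expressed against the public Wayback CDX endpoint like this. The function name `build_cdx_url` and the mapping from Actor options to CDX parameters are assumptions for illustration; the function only builds a URL and sends no request.

```python
from urllib.parse import urlencode

# Public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(target, match_type="domain", date_from=None, date_to=None,
                  status=None, mime=None, collapse=None, limit=10):
    """Build a CDX query URL (hypothetical helper, not the Actor's own code)."""
    params = [("url", target), ("matchType", match_type), ("output", "json")]
    if date_from:
        params.append(("from", date_from))   # YYYY, YYYYMM, or YYYYMMDD
    if date_to:
        params.append(("to", date_to))
    if status:
        params.append(("filter", f"statuscode:{status}"))
    if mime:
        params.append(("filter", f"mimetype:{mime}"))
    if collapse:
        params.append(("collapse", collapse))  # e.g. "digest"
    params.append(("limit", str(limit)))
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


url = build_cdx_url("example.com", status="200", mime="text/html", collapse="digest")
print(url)
```

Repeating the `filter` parameter is how the CDX endpoint combines several field filters in one query.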

📦 Data you get

Snapshot rows can include:

target - submitted URL or domain that produced the row
originalUrl - original archived URL from the Wayback capture
waybackTimestamp and archiveDate - source timestamp and ISO date
statusCode, mimeType, contentDigest, and contentLength
contentStatus and optional content when archived text is fetched
distanceFromTargetDays for closest-date evidence
change evidence with the previous timestamp and source-backed reason

Version rows group consecutive captures into timeline intervals. Summary rows report per-target coverage, counts, date range, discovered paths, subdomains, and emails found in fetched content. Alert rows appear in monitoring mode only when the selected mechanical alert rule is met.
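
To picture how consecutive captures could be grouped into timeline intervals, here is a minimal sketch that merges adjacent snapshots sharing a content digest. The output keys `firstSeen`, `lastSeen`, and `captureCount` are hypothetical names chosen for this example, and the Actor's real grouping rule may differ.

```python
from itertools import groupby


def build_versions(rows):
    """Group consecutive captures with the same digest into one interval each."""
    # Wayback timestamps (YYYYMMDDhhmmss) sort chronologically as strings.
    ordered = sorted(rows, key=lambda r: r["waybackTimestamp"])
    versions = []
    for digest, group in groupby(ordered, key=lambda r: r["contentDigest"]):
        captures = list(group)
        versions.append({
            "contentDigest": digest,
            "firstSeen": captures[0]["waybackTimestamp"],   # hypothetical key
            "lastSeen": captures[-1]["waybackTimestamp"],   # hypothetical key
            "captureCount": len(captures),                  # hypothetical key
        })
    return versions


rows = [
    {"waybackTimestamp": "20240401081530", "contentDigest": "A"},
    {"waybackTimestamp": "20240402090000", "contentDigest": "A"},
    {"waybackTimestamp": "20240510123045", "contentDigest": "B"},
]
for version in build_versions(rows):
    print(version)
```

Because `groupby` only merges adjacent items, a digest that disappears and later returns produces two separate intervals, which matches the idea of a timeline.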

You can export the dataset as JSON, CSV, Excel, XML, RSS, or HTML, or consume the rows through the Apify API, schedules, webhooks, and integrations.

🚀 How to run it

Add one or more URLs or domains in the "URLs or domains" field.
Choose an "Archive scope": exact URL, URL prefix, same host, or same domain and subdomains.
Set optional date, status, and MIME filters.
Pick an output mode: raw snapshots, changed snapshots, timeline, closest snapshot to date, Markdown report, or monitoring delta.
Keep "Collapse snapshots by" set to content digest for compact results, or choose every snapshot for full raw history.
Leave "Fetch archived page text" on when you need readable content or phrase search evidence, or turn it off for faster metadata-only runs.
Run the Actor and open the dataset or optional Markdown report.
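
The collapse option in the steps above can be pictured with a short sketch. It assumes collapsing keeps the first row of each run of adjacent captures that share the same key, where the key is the digest or a prefix of the 14-digit timestamp. The function name and this adjacency rule are assumptions, not the Actor's confirmed behavior.

```python
# Number of leading timestamp characters (YYYYMMDDhhmmss) per time bucket.
PREFIX_LENGTH = {"month": 6, "day": 8, "hour": 10}


def collapse_rows(rows, collapse_by="digest"):
    """Keep the first row of each run of adjacent rows with the same collapse key."""
    def key(row):
        if collapse_by == "digest":
            return row["contentDigest"]
        return row["waybackTimestamp"][:PREFIX_LENGTH[collapse_by]]

    kept = []
    last_key = object()  # sentinel that never equals a real key
    for row in sorted(rows, key=lambda r: r["waybackTimestamp"]):
        current_key = key(row)
        if current_key != last_key:
            kept.append(row)
            last_key = current_key
    return kept


rows = [
    {"waybackTimestamp": "20240401081530", "contentDigest": "A"},
    {"waybackTimestamp": "20240402090000", "contentDigest": "A"},
    {"waybackTimestamp": "20240510123045", "contentDigest": "B"},
]
print(len(collapse_rows(rows, "digest")), len(collapse_rows(rows, "day")))
```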

For a small first run, use:

{
	"targets": ["example.com"],
	"matchType": "domain",
	"maxResults": 10,
	"statusFilter": "200",
	"mimeFilter": "text/html",
	"outputMode": "snapshots",
	"collapseBy": "digest",
	"includeContent": false
}

⚙️ Input options

targets is required and accepts up to 50 URLs or domains.

matchType controls how broadly each target is searched. Use exact URL for one page, prefix for a path, host for one hostname, and domain when subdomains should be included.

maxResults limits saved snapshot, version, or alert rows per target. The maximum is 10,000.

dateFrom and dateTo accept YYYY, YYYYMM, or YYYYMMDD. statusFilter accepts a status such as 200 or 404. mimeFilter accepts a type such as text/html.
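
Before starting a large run, it can help to check an input object against the limits described above. The following is a minimal sketch of such a check, not the Actor's own validation. The helper name `validate_input` and the lower bound of 1 for `maxResults` are assumptions.

```python
import re

# YYYY, YYYYMM, or YYYYMMDD: four digits plus zero, one, or two 2-digit groups.
DATE_PATTERN = re.compile(r"\d{4}(\d{2}){0,2}")


def validate_input(run_input):
    """Return a list of problems found in a run input (empty list means OK)."""
    errors = []
    targets = run_input.get("targets") or []
    if not 1 <= len(targets) <= 50:
        errors.append("targets must contain between 1 and 50 entries")

    max_results = run_input.get("maxResults")
    if max_results is not None:
        # Lower bound of 1 is an assumption; the documented maximum is 10,000.
        if not (isinstance(max_results, int) and 1 <= max_results <= 10_000):
            errors.append("maxResults must be an integer from 1 to 10000")

    for field in ("dateFrom", "dateTo"):
        value = run_input.get(field)
        if value is not None and not DATE_PATTERN.fullmatch(str(value)):
            errors.append(f"{field} must be YYYY, YYYYMM, or YYYYMMDD")
    return errors


print(validate_input({"targets": ["example.com"], "dateFrom": "2024-05"}))
```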

outputMode changes the result shape. Use raw snapshots for exports, timeline for version intervals, closest snapshot to date for evidence work, report for a Markdown summary, and monitoring for scheduled archive checks.
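
For the closest-snapshot mode, the underlying arithmetic can be sketched as follows: parse each 14-digit Wayback timestamp, measure its distance to the target date, and keep the nearest row. Treating timestamps as UTC and reporting fractional days rounded to two decimals are assumptions; the Actor may round differently.

```python
from datetime import datetime, timezone


def parse_wayback_timestamp(timestamp):
    """Convert a YYYYMMDDhhmmss Wayback timestamp to an aware UTC datetime."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def closest_snapshot(rows, target_date):
    """Return the row nearest to target_date (YYYYMMDD) with its distance in days."""
    target = datetime.strptime(target_date, "%Y%m%d").replace(tzinfo=timezone.utc)

    def distance_days(row):
        delta = parse_wayback_timestamp(row["waybackTimestamp"]) - target
        return abs(delta.total_seconds()) / 86400

    best = min(rows, key=distance_days)
    return {**best, "distanceFromTargetDays": round(distance_days(best), 2)}


rows = [
    {"waybackTimestamp": "20240401081530"},
    {"waybackTimestamp": "20240510123045"},
]
print(closest_snapshot(rows, "20240501"))
```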

includeContent, maxContentFetch, and historyQuery control archived text fetching. Content fetching is capped so large archive searches do not fetch every historical page by accident. Some archived pages can be slow or unavailable, so each snapshot row reports its own contentStatus.

🧾 Output example

{
	"recordType": "snapshot",
	"target": "example.com",
	"originalUrl": "https://example.com/pricing",
	"waybackTimestamp": "20240510123045",
	"archiveDate": "2024-05-10T12:30:45.000Z",
	"statusCode": 200,
	"mimeType": "text/html",
	"contentDigest": "M5W6TLBPLQWJXTQWJ2R5XQ7Y3YQK4K6L",
	"contentLength": 18432,
	"contentStatus": "notRequested",
	"content": null,
	"distanceFromTargetDays": null,
	"change": {
		"changed": true,
		"type": "digestChange",
		"previousArchiveDate": "2024-04-01T08:15:30.000Z",
		"previousWaybackTimestamp": "20240401081530",
		"evidence": ["Digest changed from ABC123 to M5W6TLBPLQWJXTQWJ2R5XQ7Y3YQK4K6L"]
	},
	"version": null,
	"diff": null,
	"summary": null,
	"alert": null
}
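
The `change` object in the example above is described as mechanical evidence. A minimal sketch of that kind of comparison between two consecutive rows might look like this. The function name `describe_change` and the wording for status and length changes are assumptions; only the digest message follows the example output.

```python
def describe_change(previous, current):
    """Compare two consecutive snapshot rows and list mechanical differences."""
    evidence = []
    if previous["statusCode"] != current["statusCode"]:
        evidence.append(
            f"Status changed from {previous['statusCode']} to {current['statusCode']}"
        )
    if previous["contentDigest"] != current["contentDigest"]:
        evidence.append(
            f"Digest changed from {previous['contentDigest']} to {current['contentDigest']}"
        )
    if previous["contentLength"] != current["contentLength"]:
        evidence.append(
            f"Length changed from {previous['contentLength']} to {current['contentLength']}"
        )
    return {
        "changed": bool(evidence),
        "previousWaybackTimestamp": previous["waybackTimestamp"],
        "evidence": evidence,
    }


previous = {"waybackTimestamp": "20240401081530", "statusCode": 200,
            "contentDigest": "ABC123", "contentLength": 18000}
current = {"waybackTimestamp": "20240510123045", "statusCode": 200,
           "contentDigest": "M5W6TLBPLQWJXTQWJ2R5XQ7Y3YQK4K6L", "contentLength": 18432}
print(describe_change(previous, current))
```

The comparison uses only fields present in the rows, so the same inputs always yield the same evidence.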

💳 Pricing

This Actor uses pay-per-event pricing. You are charged for each saved successful archive item: snapshot, version, summary, or monitoring alert. Empty archive searches, invalid inputs, skipped content fetches, and source issues do not create dataset rows.

Pricing starts at $0.0018 per saved archive item ($1.80 per 1,000) on the Free tier and goes down to $0.0009 per saved archive item ($0.90 per 1,000) on higher tiers. Always check the Actor Pricing tab before starting a large run.
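
As a rough way to budget a run, the per-item prices above can be multiplied by the number of rows you expect to save. This sketch uses only the two prices quoted in this section; the helper name is hypothetical, and the actual charge depends on your tier and on how many rows the run really saves.

```python
# Per-item prices in USD, as quoted in this Pricing section.
FREE_TIER_PRICE = 0.0018
LOWEST_TIER_PRICE = 0.0009


def estimate_cost(saved_items, price_per_item):
    """Estimated charge in USD for a number of saved dataset rows."""
    return round(saved_items * price_per_item, 4)


print(estimate_cost(1_000, LOWEST_TIER_PRICE))   # 0.9
print(estimate_cost(1_000, FREE_TIER_PRICE))     # 1.8
# 50 targets at the 10,000-row maxResults limit, ignoring summary rows:
print(estimate_cost(50 * 10_000, FREE_TIER_PRICE))  # 900.0
```

The last line shows why a low maxResults is a sensible choice for a first run.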

⚠️ Limits and caveats

The Internet Archive may not have snapshots for every page or date.
A successful run can return zero rows when no matching archive data exists.
Archive text can be unavailable, non-HTML, capped, or skipped by the content fetch limit.
Monitoring compares the latest saved archive state for the same target and filters. It is based on Wayback captures, not live website polling.
Change labels are mechanical and source-backed. They do not claim semantic meaning such as a product, legal, or pricing change unless the returned text evidence shows it.

❓ FAQ

❓ Does this use the official Wayback Machine?

It reads public Internet Archive Wayback Machine data through the CDX index and archived playback pages.

🔑 Do I need a Wayback Machine API key?

No. The Actor does not ask for a Wayback Machine API key, cookies, or login.

📡 Can it monitor a live website?

It monitors changes in Wayback Machine archive captures. It does not poll the current live page independently.

🔗 Why is there no `archiveUrl` field?

Rows keep originalUrl and waybackTimestamp. A playback URL can be reconstructed as https://web.archive.org/web/{waybackTimestamp}/{originalUrl} when you need to open the archived page.
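
The reconstruction described above takes one line of code. This sketch assumes a row shaped like the output example; the helper name `playback_url` is hypothetical.

```python
def playback_url(row):
    """Build the Wayback playback URL from a snapshot row."""
    return f"https://web.archive.org/web/{row['waybackTimestamp']}/{row['originalUrl']}"


row = {
    "originalUrl": "https://example.com/pricing",
    "waybackTimestamp": "20240510123045",
}
print(playback_url(row))
# https://web.archive.org/web/20240510123045/https://example.com/pricing
```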

📝 Changelog

0.1: Initial release.

🆘 Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡

🔗 Other actors

Sitemap Sniffer ↗ - Find public sitemap files and URL inventories before an archive or SEO audit.
Website URL Crawler ↗ - Crawl rendered website links and export a clean link map.
SEMrush Free Website Stats Scraper ↗ - Collect public domain traffic, authority, backlink, and referral metrics.
Ahrefs Free Website Stats Scraper ↗ - Export public Ahrefs domain rating, traffic, rank, and linking website stats.
Font Detector ↗ - Detect fonts, CSS families, and source evidence on public web pages.

Made with ❤️ by Maxime Dupré
