Wayback Machine Bulk Lookup

Pricing

Pay per event

Look up Wayback Machine snapshots for any URL or list of URLs. Returns capture timeline, optional snapshot markdown, and live-vs-snapshot diff. Date range filtering, capture limit, bulk input. Built for OSINT, journalism, SEO link-rot recovery, and legal evidence.

Developer

BowTiedRaccoon

Maintained by Community

Look up Wayback Machine (archive.org) snapshots for any URL or list of URLs. Returns the full capture timeline, optional snapshot content converted from HTML to markdown, and a text diff of the live page against its latest snapshot. Built for OSINT analysts, journalists verifying sources, SEO teams recovering from link rot, and anyone collecting legal evidence.


What this actor does

For each input URL, the actor:

  1. Queries the Wayback CDX API to retrieve the snapshot index within your specified date range, up to the capture limit
  2. Optionally fetches snapshot HTML for each capture and converts it to markdown (for reading or archiving)
  3. Optionally fetches the current live URL and computes a line-level text diff against the most recent snapshot (to detect page changes)

Each output record contains the full snapshot timeline plus optional diff and content fields.
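
As an illustration of step 1, here is a minimal sketch of how a CDX query could be built and its JSON response turned into capture entries. It is not the actor's own code. The query parameters `url`, `from`, `to`, `limit`, and `output=json` belong to the public Wayback CDX server, but the function names `build_cdx_query` and `parse_cdx_rows`, the mapping of `captureLimit` onto `limit`, and the date handling (plain `YYYY-MM-DD` dates only) are assumptions made for this example.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, date_from=None, date_to=None, capture_limit=100):
    """Return a CDX query URL for one input URL (no network call is made here)."""
    params = {"url": url, "output": "json", "limit": capture_limit}
    if date_from:
        # CDX takes compact timestamps, so the ISO date 2023-01-01 becomes 20230101.
        params["from"] = date_from.replace("-", "")
    if date_to:
        params["to"] = date_to.replace("-", "")
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(rows):
    """Turn CDX JSON output (a header row followed by data rows) into capture dicts."""
    if not rows:
        return []
    header, *data = rows
    captures = []
    for row in data:
        rec = dict(zip(header, row))
        ts = datetime.strptime(rec["timestamp"], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
        captures.append({
            "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "archiveUrl": f"https://web.archive.org/web/{rec['timestamp']}/{rec['original']}",
            # Some CDX rows carry "-" instead of a numeric status code.
            "status": int(rec["statuscode"]) if rec["statuscode"].isdigit() else None,
            "mimetype": rec["mimetype"],
        })
    return captures
```

The query URL can be fetched with any HTTP client. Keeping URL construction and row parsing free of network calls makes both easy to check in isolation.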


Input

| Field | Type | Default | Description |
|---|---|---|---|
| urls | array of strings | | Required. URLs to look up in the Wayback Machine |
| maxItems | integer | | Maximum total output records across all URLs |
| dateFrom | string | | Earliest snapshot date to include (ISO date, e.g. 2020-01-01) |
| dateTo | string | | Latest snapshot date to include (ISO date, e.g. 2024-12-31) |
| captureLimit | integer | 100 | Max snapshots per URL |
| fetchSnapshotContent | boolean | false | Download snapshot HTML and convert to markdown |
| diffWithLive | boolean | false | Compute text diff between latest snapshot and current live URL |
| proxyConfiguration | object | none | Optional proxy config (usually not needed for Wayback) |

Example input:

{
  "urls": [
    "https://example.com/news/2024-article",
    "https://example.com/about"
  ],
  "dateFrom": "2023-01-01",
  "dateTo": "2024-12-31",
  "captureLimit": 50,
  "diffWithLive": true
}
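
Before starting a run, it can help to sanity-check an input object locally. The sketch below applies the documented defaults and a few basic checks. The helper name `normalize_input` and the specific checks are hypothetical and are not claimed to match the actor's own validation.

```python
from datetime import date

# Defaults taken from the input table; fields without a documented default are left unset.
DEFAULTS = {"captureLimit": 100, "fetchSnapshotContent": False, "diffWithLive": False}


def normalize_input(raw):
    """Apply the documented defaults and run basic sanity checks on an input dict."""
    urls = raw.get("urls")
    if not isinstance(urls, list) or not urls or not all(isinstance(u, str) for u in urls):
        raise ValueError("'urls' is required and must be a non-empty list of strings")
    cfg = {**DEFAULTS, **raw}
    # date.fromisoformat raises ValueError on malformed dates such as "2023-13-01".
    start = date.fromisoformat(cfg["dateFrom"]) if cfg.get("dateFrom") else None
    end = date.fromisoformat(cfg["dateTo"]) if cfg.get("dateTo") else None
    if start and end and start > end:
        raise ValueError("dateFrom must not be later than dateTo")
    if not isinstance(cfg["captureLimit"], int) or cfg["captureLimit"] < 1:
        raise ValueError("captureLimit must be a positive integer")
    return cfg
```

Passing the example input through `normalize_input` returns the same fields plus `fetchSnapshotContent: False`, the documented default.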

Output

One record per input URL.

| Field | Type | Description |
|---|---|---|
| url | string | The input URL |
| snapshotCount | number | Number of snapshots found in the date range |
| firstCaptured | string | Earliest snapshot timestamp (ISO 8601) |
| lastCaptured | string | Latest snapshot timestamp (ISO 8601) |
| captures | array | Snapshot entries, each a JSON-encoded string with timestamp, archiveUrl, status, mimetype, and optionally contentMarkdown |
| diff | object | { addedLines, removedLines, changedRatio }; only present when diffWithLive=true |
| liveStatus | number | Current HTTP status of the live URL; only present when diffWithLive=true |
| finalLiveUrl | string | Final URL after redirects |
| status | string | success, timeout, or error |
| errorMsg | string | Error details on failure, null on success |
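
To make the `diff` fields concrete, the following sketch computes a line-level diff with Python's standard `difflib` module. It only illustrates the idea. The actor's actual algorithm is not documented here, and the definition of `changedRatio` used below (changed lines divided by the combined line count of both texts) is an assumption, as is the function name `line_diff`.

```python
import difflib


def line_diff(snapshot_text, live_text):
    """Count lines added to and removed from the snapshot to reach the live text."""
    old = snapshot_text.splitlines()
    new = live_text.splitlines()
    added = removed = 0
    for line in difflib.ndiff(old, new):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
    # Assumed definition: changed lines over the combined line count of both texts.
    total = max(len(old) + len(new), 1)
    return {
        "addedLines": added,
        "removedLines": removed,
        "changedRatio": round((added + removed) / total, 2),
    }
```

With this definition the ratio stays between 0 (identical texts) and 1 (no lines in common).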

Example output record:

{
  "url": "https://example.com/news/2024-article",
  "snapshotCount": 14,
  "firstCaptured": "2024-03-12T08:42:00Z",
  "lastCaptured": "2026-04-29T22:11:00Z",
  "captures": [
    "{\"timestamp\":\"2026-04-29T22:11:00Z\",\"archiveUrl\":\"https://web.archive.org/web/20260429221100/https://example.com/news/2024-article\",\"status\":200,\"mimetype\":\"text/html\"}"
  ],
  "diff": { "addedLines": 12, "removedLines": 3, "changedRatio": 0.04 },
  "liveStatus": 200,
  "finalLiveUrl": "https://example.com/news/2024-article",
  "status": "success",
  "errorMsg": null
}
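
Because each entry in `captures` is a JSON-encoded string rather than a nested object, consumers need a second decoding step. A minimal sketch of that step, using a trimmed copy of the record above (the helper name `decode_captures` is hypothetical):

```python
import json


def decode_captures(record):
    """Return the record's captures as dicts, sorted oldest first."""
    decoded = [json.loads(item) for item in record.get("captures", [])]
    # ISO 8601 timestamps in the same UTC "Z" form sort correctly as plain strings.
    return sorted(decoded, key=lambda c: c["timestamp"])


record = {
    "url": "https://example.com/news/2024-article",
    "captures": [
        "{\"timestamp\":\"2026-04-29T22:11:00Z\",\"archiveUrl\":\"https://web.archive.org/web/20260429221100/https://example.com/news/2024-article\",\"status\":200,\"mimetype\":\"text/html\"}"
    ],
}

for capture in decode_captures(record):
    print(capture["timestamp"], capture["status"], capture["archiveUrl"])
```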

Dataset views

The actor produces two dataset views in the Apify console:

  • Capture Timeline: url, snapshotCount, firstCaptured, lastCaptured, captures
  • Live vs Snapshot Diff: url, liveStatus, diff, lastCaptured

Rate limits and performance

The actor respects Wayback Machine's rate limits:

  • CDX API queries: ~10 requests/second (110ms minimum delay)
  • Snapshot content fetches: ~1-2 requests/second (700ms minimum delay)

For large batches with fetchSnapshotContent=true, expect longer runtimes. The default timeout is 2 hours. Start with a small captureLimit (e.g. 10) to estimate runtime before running at full scale.
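
The minimum delays above allow a rough runtime estimate before launching a large batch. The sketch below is a back-of-the-envelope calculation plus a simple minimum-delay throttle. It assumes one CDX query per URL and, in the worst case, `captureLimit` snapshot fetches per URL, and it ignores network time, so real runs will take longer. The names `estimate_min_runtime_s` and `MinDelay` are hypothetical and do not describe the actor's internals.

```python
import time

CDX_DELAY_S = 0.110       # minimum delay between CDX queries, from the figures above
SNAPSHOT_DELAY_S = 0.700  # minimum delay between snapshot content fetches


def estimate_min_runtime_s(url_count, capture_limit, fetch_snapshot_content):
    """Estimate runtime from the minimum delays alone (network time excluded)."""
    seconds = url_count * CDX_DELAY_S
    if fetch_snapshot_content:
        # Worst case: every URL has capture_limit snapshots to download.
        seconds += url_count * capture_limit * SNAPSHOT_DELAY_S
    return seconds


class MinDelay:
    """Sleep as needed so consecutive calls to wait() are at least delay_s apart."""

    def __init__(self, delay_s):
        self.delay_s = delay_s
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.delay_s - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

For example, 200 URLs with fetchSnapshotContent=true and captureLimit=10 come to roughly 1,422 seconds (about 24 minutes) of delay alone. The same batch at the default captureLimit of 100 comes to roughly 14,022 seconds (about 3.9 hours), which would exceed the 2-hour default timeout.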


Use cases

  • OSINT / research: Check whether a source URL existed, when it was captured, and how its content has changed
  • Journalism: Verify archived versions of articles or government pages for fact-checking
  • SEO / link-rot recovery: Find archived versions of dead inbound links and plan redirects or outreach
  • Legal evidence: Retrieve timestamped snapshots of web pages for documentation
  • Web archiving: Bulk-check coverage for a list of URLs before deeper archiving work