📚 Wayback Machine Scraper
Scrape the Wayback Machine to find archived web pages. Search historical website versions, recover lost SEO content, and export scraped URLs.

Pricing: Pay per event

Rating: 0.0 (0)

Developer: Taro Yamada (太郎 山田)

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 hours ago

📚 Wayback Machine Checker

Check if URLs are archived on the Wayback Machine and find closest snapshots by date. Essential for compliance, legal evidence, and content restoration.

Store Quickstart

Start with the Quickstart template to verify 3 archived URLs. For bulk verification, use Portfolio Archive Check with up to 500 URLs.

Key Features

  • 📚 Official Internet Archive API — Uses archive.org/wayback/available endpoint
  • 📅 Closest-snapshot lookup — Find archived version nearest to any date
  • 🔍 Availability check — Know if a URL was ever archived
  • 📊 Snapshot count — Total archived versions per URL
  • Bulk processing — Up to 500 URLs per run
  • 🔑 No API key needed — Free, open Internet Archive service

Use Cases

  • Compliance teams: legal evidence preservation for regulated industries
  • Journalists: verify historical versions of web pages that may have been edited
  • SEO recovery: restore content from accidentally deleted pages
  • Brand protection: track archived versions of competitor sites over time
  • Academic research: cite archived web sources in publications

Input

  • urls (string[], required): URLs to check in the archive (max 500)
  • closest (string, optional): target date in YYYY-MM-DD format
  • checkAvailability (boolean, default true): return availability details

Input Example

{
  "urls": ["https://example.com/old-article", "https://deleted-site.com"],
  "closest": "2020-01-01",
  "checkAvailability": true
}
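If you build the run input programmatically, it can help to enforce the documented limits (500-URL cap, YYYY-MM-DD date) before starting a run. A hypothetical helper (the names are my own, not the actor's API):

```python
import re

MAX_URLS = 500  # bulk limit stated in the input schema

def make_run_input(urls, closest=None, check_availability=True):
    # Build the actor's run input dict, validating the documented limits.
    if not urls:
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per run")
    run_input = {"urls": list(urls), "checkAvailability": check_availability}
    if closest is not None:
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", closest):
            raise ValueError("closest must be in YYYY-MM-DD format")
        run_input["closest"] = closest
    return run_input
```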

Output

  • url (string): URL queried
  • archived (boolean): whether the URL has any snapshots in the Wayback Machine
  • closestSnapshotUrl (string): URL of the closest snapshot to the requested date
  • closestSnapshotDate (string): date of the closest snapshot (YYYYMMDDhhmmss)
  • totalSnapshots (integer): approximate total snapshots ever taken
  • firstSnapshotDate (string): date of the earliest known snapshot
  • lastSnapshotDate (string): date of the most recent snapshot

Output Example

{
  "url": "https://example.com/old-article",
  "archived": true,
  "closestSnapshotUrl": "https://web.archive.org/web/20200115000000/https://example.com/old-article",
  "closestSnapshotDate": "20200115000000",
  "totalSnapshots": 23
}
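Downstream, a common first step is to split the dataset items into archived and missing URLs and parse the 14-digit Wayback timestamp into a real datetime. A sketch assuming the output fields documented above (`summarize` is a hypothetical helper):

```python
from datetime import datetime

def summarize(items):
    # Partition dataset items into archived results and missing URLs,
    # parsing the YYYYMMDDhhmmss snapshot timestamp for archived ones.
    archived, missing = [], []
    for item in items:
        if item.get("archived"):
            ts = item.get("closestSnapshotDate")
            parsed = datetime.strptime(ts, "%Y%m%d%H%M%S") if ts else None
            archived.append({**item, "closestSnapshotAt": parsed})
        else:
            missing.append(item["url"])
    return archived, missing
```

`missing` then feeds naturally into a report of URLs that were never captured.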

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~wayback-machine-checker/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "urls": ["https://example.com/old-article", "https://deleted-site.com"], "closest": "2020-01-01", "checkAvailability": true }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("taroyamada/wayback-machine-checker").call(run_input={
    "urls": ["https://example.com/old-article", "https://deleted-site.com"],
    "closest": "2020-01-01",
    "checkAvailability": True
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/wayback-machine-checker').call({
    urls: ['https://example.com/old-article', 'https://deleted-site.com'],
    closest: '2020-01-01',
    checkAvailability: true,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

  • Use closest: "2020-01-01" format (YYYY-MM-DD) to find a specific historical snapshot.
  • Great for verifying when a page was first published or last modified.
  • Combine with Broken Link Checker to recover content from dead pages via archive links.
  • Wayback Machine is free but rate-limits aggressive callers — keep concurrency low.
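One simple way to keep concurrency low when calling archive.org yourself is sequential lookups with exponential backoff. A sketch (the `fetch` callable is a placeholder for whatever lookup function you use; the retry counts are arbitrary):

```python
import time

def fetch_with_backoff(fetch, url, retries=4, base_delay=2.0):
    # Retry a single lookup with exponential backoff (2s, 4s, 8s, ...),
    # since archive.org rate-limits aggressive callers. `fetch` is any
    # callable that raises on failure.
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```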

FAQ

How far back can I check?

Internet Archive has snapshots back to 1996. Coverage depends on whether a URL was crawled.

Why is my URL 'not available'?

Either it was never archived, or Internet Archive excluded it (due to robots.txt or removal request).

Is this the same as running curl to archive.org?

Yes, but with bulk processing, error handling, and structured output for datasets.

Can I archive new URLs?

This actor only reads from the archive. To save NEW pages, use archive.org's /save/ endpoint.
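The /save/ endpoint is simply a prefix on the target URL. A minimal sketch of constructing and requesting it (illustrative only; this is outside what the actor does, and heavy use of Save Page Now may require authentication):

```python
import urllib.request

def save_url_for(url: str) -> str:
    # The Save Page Now endpoint prefixes /save/ onto the target URL.
    return f"https://web.archive.org/save/{url}"

def save_page_now(url: str) -> str:
    # Requesting the save URL triggers a capture; the response redirects
    # to the new snapshot, so the final URL is the archive link.
    req = urllib.request.Request(
        save_url_for(url), headers={"User-Agent": "wayback-example/1.0"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.url
```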

Why is archived false for my URL?

The Internet Archive may not have crawled that URL yet, or robots.txt blocked it at the time.

Can I trigger a new snapshot?

Not via this actor. Use the Wayback Machine 'Save Page Now' feature manually.


Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01

No subscription required — you only pay for what you use.
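The pay-per-event total follows directly from the two fees above; a tiny estimator (illustrative only):

```python
ACTOR_START_USD = 0.01    # flat fee per run
DATASET_ITEM_USD = 0.003  # per output item

def estimate_cost(items: int, runs: int = 1) -> float:
    # Total = one start fee per run + a per-item fee, rounded to cents.
    return round(runs * ACTOR_START_USD + items * DATASET_ITEM_USD, 2)
```

This reproduces the worked example: 1,000 items in one run costs $3.01.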