Pricing

Pay per event

🕰️ Wayback Machine Bulk Checker

Check bulk lists of URLs against the Internet Archive database to instantly verify cache availability. Automate historical web page discovery for large sites.

Developer

naoki anzai

Last modified: 17 days ago

📚 Wayback Machine Checker

Instantly check bulk lists of URLs against the Internet Archive to discover cached web pages and extract historical site data. Designed for technical SEO professionals and site administrators, this Wayback Machine checker automates the discovery of archived content at scale. Instead of loading the archive interface for every broken link, simply input your URL list and let the scraper query the underlying API to find exact snapshot matches.

Large-scale website migrations frequently result in orphaned pages and 404 errors. This tool is built to cross-reference your dead URLs with historical web cache data, enabling rapid recovery of lost link equity. Web developers and SEO teams use this scraper to map legacy site structures, audit domain history before acquisitions, and rescue deleted content that Google search previously indexed.

The actor processes URLs in bulk and returns structured results for technical audits. For every requested URL, the output includes the timestamp of the most recent snapshot, the direct Wayback Machine URL for the cached HTML, and the HTTP status code of the original capture. Schedule it alongside your regular broken link crawlers to identify which dead pages can be restored from the web archive.

📄 Live sample output: see docs/sample-output.json for a representative dataset captured from a real run of this actor. Use it to validate the schema before subscribing.

Store Quickstart

Start with the Quickstart template to verify 3 archived URLs. For bulk verification, use Portfolio Archive Check with up to 500 URLs. For content recovery, use 404 Recovery after running Broken Link Checker.

Key Features

📚 Official Internet Archive API — Uses archive.org/wayback/available endpoint
📅 Closest-snapshot lookup — Find archived version nearest to any date
🔍 Availability check — Know if a URL was ever archived
📊 Snapshot count — Total archived versions per URL
⚡ Bulk processing — Up to 500 URLs per run
🔑 No API key needed — Free, open Internet Archive service
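
The availability lookup named in the feature list can be sketched in a few lines. This is a minimal illustration of the public archive.org/wayback/available endpoint, not the actor's own source code. The helper names `build_availability_url` and `parse_availability` are hypothetical, and the sample payload is illustrative data shaped like the endpoint's JSON response.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, closest=None):
    """Build the query URL; `closest` is an optional YYYY-MM-DD date."""
    params = {"url": page_url}
    if closest:
        # The endpoint expects a digits-only timestamp (YYYYMMDD...).
        params["timestamp"] = closest.replace("-", "")
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def parse_availability(payload):
    """Reduce the endpoint's JSON to the fields a bulk check needs."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return {"available": False, "snapshot_url": None,
                "timestamp": None, "status": None}
    return {
        "available": True,
        "snapshot_url": closest.get("url"),
        "timestamp": closest.get("timestamp"),
        "status": closest.get("status"),
    }


# Illustrative payload (made-up values in the endpoint's response shape).
SAMPLE = json.loads("""
{
  "url": "https://example.com/old-article",
  "archived_snapshots": {
    "closest": {
      "status": "200",
      "available": true,
      "url": "http://web.archive.org/web/20200115000000/https://example.com/old-article",
      "timestamp": "20200115000000"
    }
  }
}
""")

print(build_availability_url("https://example.com/old-article", "2020-01-01"))
print(parse_availability(SAMPLE))
```

A URL with no captures comes back with an empty `archived_snapshots` object, which the parser treats as unavailable.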

Use Cases

Who	Why
Compliance teams	Legal evidence preservation for regulated industries
Journalists	Verify historical versions of web pages that may have been edited
SEO recovery	Restore content from accidentally deleted pages
Brand protection	Track archived versions of competitor sites over time
Academic research	Cite archived web sources in publications

Input

Field	Type	Default	Description
urls	string[]	(required)	URLs to check in archive (max 500)
closest	string		Target date YYYY-MM-DD (optional)
checkAvailability	boolean	true	Return availability details

Input Example

{
  "urls": ["https://example.com/old-article", "https://deleted-site.com"],
  "closest": "2020-01-01",
  "checkAvailability": true
}
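
Input like the example above can be assembled and checked before a run is started. The sketch below assumes only the three fields from the Input table; `build_run_input` is a hypothetical helper, and its validation rules (absolute http(s) URLs, exact duplicates dropped) are assumptions about sensible client-side checks, not the actor's documented behavior.

```python
from datetime import datetime
from urllib.parse import urlparse

MAX_URLS = 500  # per-run limit stated in the Input table


def build_run_input(urls, closest=None, check_availability=True):
    """Validate and assemble an input object with the documented fields."""
    cleaned = []
    seen = set()
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        if url not in seen:  # drop exact duplicates, keep first-seen order
            seen.add(url)
            cleaned.append(url)
    if not cleaned:
        raise ValueError("urls is required and must not be empty")
    if len(cleaned) > MAX_URLS:
        raise ValueError(f"{len(cleaned)} URLs exceeds the limit of {MAX_URLS}")

    run_input = {"urls": cleaned, "checkAvailability": check_availability}
    if closest is not None:
        # Rejects impossible dates and normalizes zero padding.
        parsed_date = datetime.strptime(closest, "%Y-%m-%d")
        run_input["closest"] = parsed_date.strftime("%Y-%m-%d")
    return run_input


print(build_run_input(
    ["https://example.com/old-article", "https://deleted-site.com"],
    closest="2020-01-01",
))
```

Failing early on a malformed URL or date avoids paying the run start fee for a run that cannot produce useful results.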

Input Examples

Example: Single URL availability

{
  "urls": [
    "https://example.com/old-page"
  ]
}

Example: Bulk archived snapshot history

{
  "urls": [
    "https://example.com/",
    "https://example.com/blog"
  ],
  "includeCdx": true,
  "maxSnapshotsPerUrl": 50
}

Example: Domain change-detection digest

{
  "urls": [
    "https://example.com/policy"
  ],
  "compareToLatestLive": true
}

Output

Field	Type	Description
`url`	string	URL queried
`archived`	boolean	Whether the URL has any snapshots in the Wayback Machine
`closestSnapshotUrl`	string	URL of the closest snapshot to the requested date
`closestSnapshotDate`	string	Date of the closest snapshot (YYYYMMDDhhmmss)
`totalSnapshots`	integer	Approximate total snapshots ever taken
`firstSnapshotDate`	string	Date of the earliest known snapshot
`lastSnapshotDate`	string	Date of the most recent snapshot

Output Example

{
  "url": "https://example.com/old-article",
  "available": true,
  "closestSnapshot": {
    "url": "https://web.archive.org/web/20200115000000/https://example.com/old-article",
    "timestamp": "20200115000000"
  },
  "archivedVersions": 23
}
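
The field table and the example above use different names for the same information (`archived` and `available`, `closestSnapshotUrl` and `closestSnapshot.url`, `totalSnapshots` and `archivedVersions`). Until the schema is confirmed against docs/sample-output.json, a defensive reader could accept either shape. The sketch below is one way to do that; `normalize_item` and its output keys are hypothetical.

```python
from datetime import datetime, timezone


def normalize_item(item):
    """Map either documented output shape onto one flat record."""
    nested = item.get("closestSnapshot") or {}
    timestamp = item.get("closestSnapshotDate") or nested.get("timestamp")
    captured_at = None
    if timestamp:
        # Wayback timestamps are UTC in YYYYMMDDhhmmss form.
        captured_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(
            tzinfo=timezone.utc
        )
    return {
        "url": item.get("url"),
        "archived": bool(item.get("archived", item.get("available", False))),
        "snapshot_url": item.get("closestSnapshotUrl") or nested.get("url"),
        "captured_at": captured_at,
        "total_snapshots": item.get("totalSnapshots", item.get("archivedVersions")),
    }


example = {
    "url": "https://example.com/old-article",
    "available": True,
    "closestSnapshot": {
        "url": "https://web.archive.org/web/20200115000000/https://example.com/old-article",
        "timestamp": "20200115000000",
    },
    "archivedVersions": 23,
}
print(normalize_item(example))
```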

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~wayback-machine-checker/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "urls": ["https://example.com/old-article", "https://deleted-site.com"], "closest": "2020-01-01", "checkAvailability": true }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("taroyamada/wayback-machine-checker").call(run_input={
    "urls": ["https://example.com/old-article", "https://deleted-site.com"],
    "closest": "2020-01-01",
    "checkAvailability": True,  # Python boolean literal, not JSON's lowercase true
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/wayback-machine-checker').call({
  "urls": ["https://example.com/old-article", "https://deleted-site.com"],
  "closest": "2020-01-01",
  "checkAvailability": true
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

Set closest to a target date in YYYY-MM-DD format (for example "2020-01-01", as in the Input table) to find the snapshot nearest that date.
Great for verifying when a page was first published or last modified.
Combine with Broken Link Checker to recover content from dead pages via archive links.
Wayback Machine is free but rate-limits aggressive callers — keep concurrency low.
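
For scripts that query the archive directly or start many runs, one way to stay on the right side of rate limits is to make one call at a time and back off after a failure. The sketch below is an assumption about a reasonable client-side policy, not a documented Internet Archive limit; `call_with_backoff`, the attempt count, and the delays are all hypothetical.

```python
import time


def call_with_backoff(fn, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); after a failure wait base_delay, then 2x, 4x, ... and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter exists so the waiting behavior can be replaced in tests. In real use, pass a zero-argument callable that performs a single lookup, for example `call_with_backoff(lambda: fetch(url))`, where `fetch` is your own request function.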

FAQ

How far back can I check?

Internet Archive has snapshots back to 1996. Coverage depends on whether a URL was crawled.

Why is my URL 'not available'?

Either it was never archived, or Internet Archive excluded it (due to robots.txt or removal request).

Is this the same as running curl to archive.org?

Yes, but with bulk processing, error handling, and structured output for datasets.

Can I archive new URLs?

This actor only reads from the archive. To save new pages, use the Wayback Machine's Save Page Now feature (web.archive.org/save/).

Why is archived false for my URL?

The Internet Archive may not have crawled that URL yet, or robots.txt blocked it at the time.

Can I trigger a new snapshot?

Not via this actor. Use the Wayback Machine 'Save Page Now' feature manually.

Complete Your Website Health Audit

Website Health Suite — Build a comprehensive compliance and trust monitoring workflow:

1. Link & URL Health

🔗 Broken Link Checker — Find broken links across your entire site structure
🔗 Bulk URL Health Checker — Validate HTTP status, redirects, SSL, and response times

2. SEO & Metadata Quality

🏷️ Meta Tag Analyzer — Audit title tags, Open Graph, Twitter Cards, and hreflang
Schema.org Validator — Validate JSON-LD and Microdata with quality scoring

3. Security & Email Deliverability

DNS/DMARC Security Checker — Audit SPF, DKIM, DMARC, and MX records

4. Historical Data & Recovery (you are here)

📚 Wayback Machine Checker — Find archived snapshots for content recovery

Recommended workflow: Run Broken Link Checker → Export 404 URLs → Use Wayback Machine Checker to find archived versions → Restore content → Validate with URL Health Checker.
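
The "Export 404 URLs" step of the workflow above can be sketched as a small transformation from a crawler export to actor inputs. The CSV column names `url` and `status` are assumptions about the export format, and `dead_urls_from_csv` and `to_batches` are hypothetical helpers. Only the 500-URL batch size comes from this page.

```python
import csv
import io


def dead_urls_from_csv(csv_text, url_column="url", status_column="status"):
    """Return unique URLs whose status is 404, in first-seen order."""
    unique = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        status = (row.get(status_column) or "").strip()
        url = (row.get(url_column) or "").strip()
        if status == "404" and url:
            unique.setdefault(url, None)  # dict keys preserve insertion order
    return list(unique)


def to_batches(urls, size=500):
    """Split a URL list into actor inputs of at most `size` URLs each."""
    return [{"urls": urls[i:i + size]} for i in range(0, len(urls), size)]


export = """url,status
https://example.com/a,404
https://example.com/b,200
https://example.com/a,404
https://example.com/c,404
"""
dead = dead_urls_from_csv(export)
print(dead)
print(len(to_batches(dead)))
```

Each batch can then be passed as the run input in the API Usage examples.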

Other Website Tools:

Sitemap Analyzer — SEO sitemap audit
Site Governance Monitor — Robots.txt and schema monitoring
Domain Trust Monitor — SSL expiry and security headers

Cost

Pay Per Event:

actor-start: $0.01 (flat fee per run)
dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
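
The same arithmetic can be wrapped in a small estimator for budgeting several runs. This sketch uses only the two prices listed above; `estimate_cost` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.01")    # flat fee per run
DATASET_ITEM_FEE = Decimal("0.003")  # fee per output item


def estimate_cost(items, runs=1):
    """Estimated charge in USD for `items` output items across `runs` runs."""
    return runs * ACTOR_START_FEE + items * DATASET_ITEM_FEE


# Reproduces the worked example: one run producing 1,000 items.
print(estimate_cost(1000))
# The same 1,000 items split across two 500-URL runs adds one more start fee.
print(estimate_cost(1000, runs=2))
```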

No subscription required — you only pay for what you use.

💾 Save it for later: click the bookmark icon at the top of the Apify Store page if you'd like to come back to it. Bookmarks help other engineers find this actor via Apify's discovery surfaces.

⭐ Was Wayback Machine Bulk Checker useful for your URL archive history?

If this actor saved you time, please leave a 5★ rating on Apify Store — it takes 10 seconds, helps other engineers and analysts discover it, and keeps updates free.

Have a feature request, bug, or sample workflow you'd like to share? Open an issue — we read every one and use them to prioritise the next release.
