📚 Wayback Machine Checker
Instantly check bulk lists of URLs against the Internet Archive to discover cached web pages and extract historical site data. Designed for technical SEO professionals and site administrators, this Wayback Machine checker automates the discovery of archived content at scale. Instead of loading the archive interface for every broken link, simply input your URL list and let the scraper query the underlying API to find exact snapshot matches.
Large-scale website migrations frequently result in orphaned pages and 404 errors. This tool is built to cross-reference your dead URLs with historical web cache data, enabling rapid recovery of lost link equity. Web developers and SEO teams use this scraper to map legacy site structures, audit domain history before acquisitions, and rescue deleted content that Google search previously indexed.
The scraper efficiently processes bulk requests, returning highly structured results for your technical audits. For every requested URL, you will extract the exact timestamp of the most recent snapshot, the direct Wayback Machine URL for the cached HTML, and the HTTP status code of the original capture. Schedule this checker to run alongside your regular broken link crawlers to instantly identify which dead pages can be fully restored from the web archive.
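Under the hood this is the same lookup anyone can make against the Internet Archive's public availability endpoint (see Key Features below). The snippet is a minimal per-URL sketch of that call, not the actor's exact implementation; the URL and date are placeholders.

```python
import requests

# Query the public Wayback availability endpoint for a single URL.
# The optional "timestamp" parameter (YYYYMMDD) asks for the snapshot
# closest to that date instead of the most recent one.
resp = requests.get(
    "https://archive.org/wayback/available",
    params={"url": "https://example.com/old-article", "timestamp": "20200101"},
    timeout=30,
)
resp.raise_for_status()

closest = resp.json().get("archived_snapshots", {}).get("closest")
if closest:
    # "url" points at web.archive.org, "timestamp" is YYYYMMDDhhmmss,
    # "status" is the HTTP status code of the original capture.
    print(closest["url"], closest["timestamp"], closest["status"])
else:
    print("No snapshot found for this URL")
```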
Store Quickstart
Start with the Quickstart template to verify 3 archived URLs. For bulk verification, use Portfolio Archive Check with up to 500 URLs. For content recovery, use 404 Recovery after running Broken Link Checker.
Key Features
- 📚 Official Internet Archive API — Uses archive.org/wayback/available endpoint
- 📅 Closest-snapshot lookup — Find archived version nearest to any date
- 🔍 Availability check — Know if a URL was ever archived
- 📊 Snapshot count — Total archived versions per URL
- ⚡ Bulk processing — Up to 500 URLs per run
- 🔑 No API key needed — Free, open Internet Archive service
Use Cases
| Who | Why |
|---|---|
| Compliance teams | Legal evidence preservation for regulated industries |
| Journalists | Verify historical versions of web pages that may have been edited |
| SEO recovery | Restore content from accidentally deleted pages |
| Brand protection | Track archived versions of competitor sites over time |
| Academic research | Cite archived web sources in publications |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | (required) | URLs to check in archive (max 500) |
| closest | string | (optional) | Target date in YYYY-MM-DD format for the closest-snapshot lookup |
| checkAvailability | boolean | true | Return availability details |
Input Example
{"urls": ["https://example.com/old-article", "https://deleted-site.com"],"closest": "2020-01-01","checkAvailability": true}
Input Examples
Example: Single URL availability
```json
{
  "urls": ["https://example.com/old-page"]
}
```
Example: Bulk archived snapshot history (see the CDX sketch after these examples)
```json
{
  "urls": ["https://example.com/", "https://example.com/blog"],
  "includeCdx": true,
  "maxSnapshotsPerUrl": 50
}
```
Example: Domain change-detection digest
```json
{
  "urls": ["https://example.com/policy"],
  "compareToLatestLive": true
}
```
Output
| Field | Type | Description |
|---|---|---|
| url | string | URL queried |
| archived | boolean | Whether the URL has any snapshots in the Wayback Machine |
| closestSnapshotUrl | string | URL of the closest snapshot to the requested date |
| closestSnapshotDate | string | Date of the closest snapshot (YYYYMMDDhhmmss) |
| totalSnapshots | integer | Approximate total snapshots ever taken |
| firstSnapshotDate | string | Date of the earliest known snapshot |
| lastSnapshotDate | string | Date of the most recent snapshot |
Output Example
{"url": "https://example.com/old-article","available": true,"closestSnapshot": {"url": "https://web.archive.org/web/20200115000000/https://example.com/old-article","timestamp": "20200115000000"},"archivedVersions": 23}
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~wayback-machine-checker/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/old-article", "https://deleted-site.com"],
    "closest": "2020-01-01",
    "checkAvailability": true
  }'
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/wayback-machine-checker").call(run_input={
    "urls": ["https://example.com/old-article", "https://deleted-site.com"],
    "closest": "2020-01-01",
    "checkAvailability": True,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
JavaScript / Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/wayback-machine-checker').call({
    urls: ['https://example.com/old-article', 'https://deleted-site.com'],
    closest: '2020-01-01',
    checkAvailability: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Tips & Limitations
- Use `closest: "2020-01-01"` (YYYY-MM-DD) to find the snapshot nearest a specific historical date.
- Great for verifying when a page was first published or last modified.
- Combine with Broken Link Checker to recover content from dead pages via archive links.
- Wayback Machine is free but rate-limits aggressive callers — keep concurrency low.
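If you ever hit those rate limits while querying the availability endpoint yourself (outside the actor), a small fixed delay between requests is usually enough. The pacing below is an assumption on the safe side, not a documented limit.

```python
import time
import requests

urls = [
    "https://example.com/old-article",
    "https://example.com/blog",
    "https://deleted-site.com",
]

results = []
for url in urls:
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url},
        timeout=30,
    )
    resp.raise_for_status()
    closest = resp.json().get("archived_snapshots", {}).get("closest")
    results.append({"url": url, "archived": bool(closest)})
    time.sleep(1)  # conservative pacing; the exact safe rate is not published

print(results)
```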
FAQ
How far back can I check?
The Internet Archive has snapshots going back to 1996. Coverage depends on whether a given URL was ever crawled.
Why is my URL 'not available'?
Either it was never archived, or Internet Archive excluded it (due to robots.txt or removal request).
Is this the same as running curl to archive.org?
Yes, but with bulk processing, error handling, and structured output for datasets.
Can I archive new URLs?
This actor only reads from the archive. To save NEW pages, use archive.org's /save/ endpoint.
Why is archived false for my URL?
The Internet Archive may not have crawled that URL yet, or robots.txt blocked it at the time.
Can I trigger a new snapshot?
Not via this actor. Use the Wayback Machine 'Save Page Now' feature manually.
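As the FAQ notes, creating new snapshots is out of scope for this actor. If you do need to trigger a capture yourself, the Wayback Machine's Save Page Now endpoint can be hit directly; the sketch below is an unauthenticated example, and heavy or automated use may be throttled or require an archive.org account.

```python
import requests

# Ask the Wayback Machine to capture a page right now ("Save Page Now").
# Heavy or automated use may require an archive.org account.
target = "https://example.com/new-page"
resp = requests.get(f"https://web.archive.org/save/{target}", timeout=120)

print(resp.status_code)
# When present, the Content-Location header points at the new snapshot path.
print(resp.headers.get("Content-Location"))
```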
Complete Your Website Health Audit
Website Health Suite — Build a comprehensive compliance and trust monitoring workflow:
1. Link & URL Health
- 🔗 Broken Link Checker — Find broken links across your entire site structure
- 🔗 Bulk URL Health Checker — Validate HTTP status, redirects, SSL, and response times
2. SEO & Metadata Quality
- 🏷️ Meta Tag Analyzer — Audit title tags, Open Graph, Twitter Cards, and hreflang
- Schema.org Validator — Validate JSON-LD and Microdata with quality scoring
3. Security & Email Deliverability
- DNS/DMARC Security Checker — Audit SPF, DKIM, DMARC, and MX records
4. Historical Data & Recovery (you are here)
- 📚 Wayback Machine Checker — Find archived snapshots for content recovery
Recommended workflow: Run Broken Link Checker → Export 404 URLs → Use Wayback Machine Checker to find archived versions → Restore content → Validate with URL Health Checker.
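A rough version of that workflow with the Apify Python client is sketched below. The broken-link-checker actor ID and its output field names ("url", "statusCode") are placeholders for whichever checker you actually run, not confirmed identifiers.

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# 1. Run your broken link checker (placeholder actor ID and field names).
blc_run = client.actor("your-username/broken-link-checker").call(
    run_input={"startUrl": "https://example.com"}
)
dead_urls = [
    item["url"]
    for item in client.dataset(blc_run["defaultDatasetId"]).iterate_items()
    if item.get("statusCode") == 404
]

# 2. Feed the 404s into the Wayback Machine Checker (max 500 URLs per run).
wb_run = client.actor("taroyamada/wayback-machine-checker").call(
    run_input={"urls": dead_urls[:500], "checkAvailability": True}
)
for item in client.dataset(wb_run["defaultDatasetId"]).iterate_items():
    print(item)
```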
Other Website Tools:
- Sitemap Analyzer — SEO sitemap audit
- Site Governance Monitor — Robots.txt and schema monitoring
- Domain Trust Monitor — SSL expiry and security headers
Cost
Pay Per Event:
- actor-start: $0.01 (flat fee per run)
- dataset-item: $0.003 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
No subscription required — you only pay for what you use.
⭐ Was this helpful?
If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.
Bug report or feature request? Open an issue on the Issues tab of this actor.

