📚 Wayback Machine Checker
Check if URLs are archived on the Wayback Machine and find closest snapshots by date. Essential for compliance, legal evidence, and content restoration.
Store Quickstart
Start with the Quickstart template to verify 3 archived URLs. For bulk verification, use Portfolio Archive Check with up to 500 URLs.
Key Features
- 📚 Official Internet Archive API — Uses archive.org/wayback/available endpoint
- 📅 Closest-snapshot lookup — Find archived version nearest to any date
- 🔍 Availability check — Know if a URL was ever archived
- 📊 Snapshot count — Total archived versions per URL
- ⚡ Bulk processing — Up to 500 URLs per run
- 🔑 No API key needed — Free, open Internet Archive service
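Under the hood, the actor queries the public archive.org/wayback/available endpoint. As a rough illustration (a minimal sketch using only the Python standard library; function names here are illustrative, not part of the actor), this is how a query URL is built and how the API's `archived_snapshots.closest` response is read:

```python
from urllib.parse import urlencode

# Build a query URL for the public availability endpoint.
def build_availability_url(url, timestamp=None):
    params = {"url": url}
    if timestamp:  # YYYYMMDD narrows the lookup to the snapshot closest to that date
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)

# Extract the closest snapshot (if any) from a decoded JSON response.
def parse_closest(response_json):
    closest = response_json.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None  # URL was never archived (or is excluded)
    return {"url": closest["url"], "timestamp": closest["timestamp"]}
```

An unarchived URL comes back with an empty `archived_snapshots` object, which is why `parse_closest` returns `None` rather than raising.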
Use Cases
| Who | Why |
|---|---|
| Compliance teams | Legal evidence preservation for regulated industries |
| Journalists | Verify historical versions of web pages that may have been edited |
| SEO recovery | Restore content from accidentally deleted pages |
| Brand protection | Track archived versions of competitor sites over time |
| Academic research | Cite archived web sources in publications |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | (required) | URLs to check in archive (max 500) |
| closest | string | (optional) | Target snapshot date in YYYY-MM-DD format |
| checkAvailability | boolean | true | Return availability details |
Input Example
{
  "urls": ["https://example.com/old-article", "https://deleted-site.com"],
  "closest": "2020-01-01",
  "checkAvailability": true
}
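The input constraints above (required `urls`, at most 500 entries, `closest` in YYYY-MM-DD) can be checked before submitting a run. This is a hypothetical pre-flight helper, not part of the actor itself:

```python
from datetime import datetime

# Hypothetical pre-flight check mirroring the documented input constraints.
def validate_input(run_input):
    urls = run_input.get("urls")
    if not urls:
        raise ValueError("urls is required")
    if len(urls) > 500:
        raise ValueError("at most 500 URLs per run")
    closest = run_input.get("closest")
    if closest:
        # Raises ValueError unless the date is valid YYYY-MM-DD
        datetime.strptime(closest, "%Y-%m-%d")
    return run_input
```

Validating locally avoids paying the flat start fee for a run that the actor would reject.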
Output
| Field | Type | Description |
|---|---|---|
| url | string | URL queried |
| archived | boolean | Whether the URL has any snapshots in the Wayback Machine |
| closestSnapshotUrl | string | URL of the closest snapshot to the requested date |
| closestSnapshotDate | string | Date of the closest snapshot (YYYYMMDDhhmmss) |
| totalSnapshots | integer | Approximate total snapshots ever taken |
| firstSnapshotDate | string | Date of the earliest known snapshot |
| lastSnapshotDate | string | Date of the most recent snapshot |
Output Example
{
  "url": "https://example.com/old-article",
  "archived": true,
  "closestSnapshotUrl": "https://web.archive.org/web/20200115000000/https://example.com/old-article",
  "closestSnapshotDate": "20200115000000",
  "totalSnapshots": 23
}
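A common post-processing step is pulling out the URLs that were never archived, since those are the ones that need manual attention. A minimal sketch over dataset items shaped like the Output table above (the helper name is illustrative):

```python
# Collect URLs that have no snapshots from a list of dataset items.
def unarchived_urls(items):
    return [item["url"] for item in items if not item.get("archived")]
```

Feeding this the full dataset gives you a shortlist of pages that cannot be recovered from the archive.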
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
curl -X POST "https://api.apify.com/v2/acts/taroyamada~wayback-machine-checker/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/old-article", "https://deleted-site.com"],
    "closest": "2020-01-01",
    "checkAvailability": true
  }'
Python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("taroyamada/wayback-machine-checker").call(run_input={
    "urls": ["https://example.com/old-article", "https://deleted-site.com"],
    "closest": "2020-01-01",
    "checkAvailability": True,
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
JavaScript / Node.js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/wayback-machine-checker').call({
    urls: ['https://example.com/old-article', 'https://deleted-site.com'],
    closest: '2020-01-01',
    checkAvailability: true,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
Tips & Limitations
- Use closest: "2020-01-01" (YYYY-MM-DD) to find the snapshot nearest a specific historical date.
- Great for verifying when a page was first published or last modified.
- Combine with Broken Link Checker to recover content from dead pages via archive links.
- Wayback Machine is free but rate-limits aggressive callers — keep concurrency low.
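The last tip, keeping concurrency low, can be as simple as checking URLs sequentially with a fixed pause. A minimal sketch (the `check` callable stands in for whatever lookup you run per URL; both names are illustrative):

```python
import time

# Visit URLs one at a time with a pause between requests,
# to stay under the archive's rate limits.
def check_politely(urls, check, delay=1.0):
    results = []
    for url in urls:
        results.append(check(url))
        time.sleep(delay)
    return results
```

A one-second delay keeps a 500-URL run under ten minutes while staying well clear of aggressive-caller throttling.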
FAQ
How far back can I check?
Internet Archive has snapshots back to 1996. Coverage depends on whether a URL was crawled.
Why is my URL 'not available'?
Either it was never archived, or Internet Archive excluded it (due to robots.txt or removal request).
Is this the same as running curl to archive.org?
Yes, but with bulk processing, error handling, and structured output for datasets.
Can I archive new URLs?
This actor only reads from the archive. To save NEW pages, use archive.org's /save/ endpoint.
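For reference, the save endpoint mentioned above is just a URL prefix: requesting `https://web.archive.org/save/<url>` (in a browser or via HTTP GET) asks the Wayback Machine to capture that page. A trivial sketch of building such a URL (no request is made here):

```python
# Build a 'Save Page Now' URL; opening it triggers a new capture
# of the target page on the Wayback Machine.
def save_page_now_url(url):
    return "https://web.archive.org/save/" + url
```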
Why is archived false for my URL?
The Internet Archive may not have crawled that URL yet, or robots.txt blocked it at the time.
Can I trigger a new snapshot?
Not via this actor. Use the Wayback Machine 'Save Page Now' feature manually.
Related Actors
URL/Link Tools cluster — explore related Apify tools:
- 🔗 URL Health Checker — Bulk-check HTTP status codes, redirects, SSL validity, and response times for thousands of URLs.
- 🔗 Broken Link Checker — Crawl websites to find broken links, 404 errors, and dead URLs.
- 🔗 URL Unshortener — Expand bit.ly and other shortened links to reveal their final destinations.
- 🏷️ Meta Tag Analyzer — Analyze meta tags, Open Graph, Twitter Cards, JSON-LD, and hreflang for any URL.
- Sitemap Analyzer API | sitemap.xml SEO Audit — Analyze sitemap.xml files for SEO issues.
- Schema.org Validator API | JSON-LD + Microdata — Validate JSON-LD and Microdata across multiple pages, score markup quality, and flag missing or malformed Schema.org markup.
- Site Governance Monitor | Robots, Sitemap & Schema — Recurring robots.txt, sitemap, and schema checks.
- RDAP Domain Monitor API | Ownership + Expiry — Monitor domain registration data via RDAP and track expiry, registrar, nameserver, and ownership changes in structured rows.
- Domain Security Audit API | SSL Expiry, DMARC, Domain Expiry — Summary-first portfolio monitor for SSL expiry, DMARC/SPF/DKIM, domain expiry/ownership, and security headers with remediation-ready outputs.
Cost
Pay Per Event:
actor-start: $0.01 (flat fee per run)
dataset-item: $0.003 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
No subscription required — you only pay for what you use.
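The pricing formula above (flat start fee plus a per-item charge) can be checked with a couple of lines of Python, using the rates listed in this section:

```python
# Pay-per-event pricing from the Cost section above.
ACTOR_START_USD = 0.01   # flat fee per run
PER_ITEM_USD = 0.003     # per output item

def estimate_cost(item_count):
    return ACTOR_START_USD + item_count * PER_ITEM_USD
```

For example, a 1,000-item run costs $0.01 + 1,000 × $0.003 = $3.01, matching the worked example above.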
