Wayback Machine Search
Pricing
from $0.90 / 1,000 saved archive results
Search Wayback Machine snapshots for URLs, hosts, and domains. Export archive dates, status codes, MIME types, digests, content text, version timelines, reports, and monitoring alerts.
Developer
Maxime Dupré
Maintained by Community
🕰️ Wayback Machine search for archive history
Wayback Machine Search finds historical snapshots in the Internet Archive Wayback Machine for the URLs, hosts, or domains you submit. Use it to export archive dates, original URLs, HTTP status codes, MIME types, content digests, content length, optional archived page text, version timelines, Markdown reports, and monitoring alerts.
It is built for SEO audits, OSINT research, legal evidence checks, website change tracking, link-rot recovery, content history reviews, and scheduled archive monitoring. You can start with one URL such as https://example.com/, a bare domain such as example.com, or up to 50 targets in one run. No Wayback Machine API key, cookies, login, or user proxy setup is required.
🔎 What this Actor does
- Searches the Wayback Machine CDX index for exact URLs, URL prefixes, hosts, or full domains.
- Filters archive rows by date range, HTTP status code, and MIME type.
- Saves raw snapshot rows with source-backed archive metadata.
- Collapses repeated captures by content digest or by month, day, or hour.
- Finds snapshots closest to a target date and adds distance in days.
- Optionally fetches readable archived page text for a capped number of snapshots.
- Emits deterministic change evidence from status, digest, length, and fetched text changes.
- Builds version timeline rows when you want a compact history.
- Generates a Markdown report in report mode.
- Supports monitoring mode with alert rows for new archive rows, status changes, content changes, or removed/restored signals.
This Actor searches archive data. It does not crawl the live web, create new Wayback captures, perform visual screenshot diffs, use AI summaries, or promise complete archive coverage. Availability depends on what the Internet Archive has stored.
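As a rough illustration of the search, filter, and collapse options listed above, here is a minimal sketch that builds a query URL for the public Wayback CDX server. It only shows how such options are commonly expressed as CDX parameters (`url`, `matchType`, `from`, `to`, `filter`, `collapse`, `limit`). The helper name `build_cdx_url` is hypothetical, and this is not necessarily the exact request the Actor sends.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(target, match_type="exact", date_from=None, date_to=None,
                  status=None, mime=None, collapse=None, limit=10):
    """Build a CDX query URL. Nothing is fetched here (hypothetical helper)."""
    params = [
        ("url", target),
        ("matchType", match_type),  # exact, prefix, host, or domain
        ("output", "json"),
        ("limit", str(limit)),
    ]
    if date_from:
        params.append(("from", date_from))  # YYYY, YYYYMM, or YYYYMMDD
    if date_to:
        params.append(("to", date_to))
    if status:
        params.append(("filter", f"statuscode:{status}"))
    if mime:
        params.append(("filter", f"mimetype:{mime}"))
    if collapse:
        # "digest" drops adjacent identical captures; "timestamp:6" keeps one per month
        params.append(("collapse", collapse))
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


url = build_cdx_url("example.com", match_type="domain", status="200",
                    mime="text/html", collapse="digest")
print(url)
```

In CDX terms, collapsing by month, day, or hour corresponds to `timestamp:6`, `timestamp:8`, and `timestamp:10`, since the prefix length of the 14-digit timestamp decides the granularity.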
📦 Data you get
Snapshot rows can include:
- target - submitted URL or domain that produced the row
- originalUrl - original archived URL from the Wayback capture
- waybackTimestamp and archiveDate - source timestamp and ISO date
- statusCode, mimeType, contentDigest, and contentLength
- contentStatus and optional content when archived text is fetched
- distanceFromTargetDays for closest-date evidence
- change evidence with the previous timestamp and source-backed reason
Version rows group consecutive captures into timeline intervals. Summary rows report per-target coverage, counts, date range, discovered paths, subdomains, and emails found in fetched content. Alert rows appear in monitoring mode only when the selected mechanical alert rule is met.
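To make the idea of version rows concrete, here is a minimal sketch of grouping consecutive captures that share a content digest into timeline intervals. The function name `build_versions` and the output keys `firstSeen`, `lastSeen`, and `captureCount` are assumptions for illustration. The Actor's real version row shape may differ.

```python
from itertools import groupby


def build_versions(snapshots):
    """Group consecutive captures with the same digest into intervals."""
    # 14-digit Wayback timestamps sort chronologically as plain strings.
    ordered = sorted(snapshots, key=lambda s: s["waybackTimestamp"])
    versions = []
    for digest, group in groupby(ordered, key=lambda s: s["contentDigest"]):
        rows = list(group)
        versions.append({
            "contentDigest": digest,
            "firstSeen": rows[0]["waybackTimestamp"],   # hypothetical key
            "lastSeen": rows[-1]["waybackTimestamp"],   # hypothetical key
            "captureCount": len(rows),                  # hypothetical key
        })
    return versions


snapshots = [
    {"waybackTimestamp": "20240101000000", "contentDigest": "AAA"},
    {"waybackTimestamp": "20240201000000", "contentDigest": "AAA"},
    {"waybackTimestamp": "20240301000000", "contentDigest": "BBB"},
    {"waybackTimestamp": "20240401000000", "contentDigest": "AAA"},
]
versions = build_versions(snapshots)
for version in versions:
    print(version)
```

Because only consecutive captures are merged, a page that reverts to earlier content shows up as a new interval rather than being folded into the first one.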
You can export the dataset as JSON, CSV, Excel, XML, RSS, or HTML, or consume the rows through the Apify API, schedules, webhooks, and integrations.
🚀 How to run it
- Add one or more URLs or domains in URLs or domains.
- Choose Archive scope: exact URL, URL prefix, same host, or same domain and subdomains.
- Set optional date, status, and MIME filters.
- Pick an output mode: raw snapshots, changed snapshots, timeline, closest snapshot to date, Markdown report, or monitoring delta.
- Keep Collapse snapshots by set to content digest for compact results, or choose every snapshot for full raw history.
- Turn on Fetch archived page text only when you need readable content or phrase search evidence.
- Run the Actor and open the dataset or optional Markdown report.
For a small first run, use:
{
  "targets": ["example.com"],
  "matchType": "domain",
  "maxResults": 10,
  "statusFilter": "200",
  "mimeFilter": "text/html",
  "outputMode": "snapshots",
  "collapseBy": "digest",
  "includeContent": false
}
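If you prefer to start the same small run through the Apify API, the sketch below prepares a request for the synchronous `run-sync-get-dataset-items` endpoint without sending it. The Actor ID is a placeholder (an assumption, not the real ID), so copy the actual value from the Actor's API tab, and treat `build_run_request` as a hypothetical helper.

```python
import json
from urllib.request import Request

# Placeholder Actor ID (assumption): replace with the ID shown on the Actor's API tab.
ACTOR_ID = "username~wayback-machine-search"


def build_run_request(run_input, token):
    """Prepare, but do not send, a synchronous run request (hypothetical helper)."""
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


run_input = {
    "targets": ["example.com"],
    "matchType": "domain",
    "maxResults": 10,
    "statusFilter": "200",
    "mimeFilter": "text/html",
    "outputMode": "snapshots",
    "collapseBy": "digest",
    "includeContent": False,
}
request = build_run_request(run_input, token="<YOUR_APIFY_TOKEN>")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would start the run and return the dataset items once it finishes. For long runs, the asynchronous run endpoints or the official Apify client are likely a better fit.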
⚙️ Input options
targets is required and accepts up to 50 URLs or domains.
matchType controls how broadly each target is searched. Use exact URL for one page, prefix for a path, host for one hostname, and domain when subdomains should be included.
maxResults limits saved snapshot, version, or alert rows per target. The maximum is 10,000.
dateFrom and dateTo accept YYYY, YYYYMM, or YYYYMMDD. statusFilter accepts a status such as 200 or 404. mimeFilter accepts a type such as text/html.
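The limits above can be checked before a run starts. This is a minimal sketch of such a pre-flight check, assuming the input keys from the sample input. `validate_input` is a hypothetical helper, and the lower bound of 1 for maxResults is an assumption.

```python
import re

# Matches YYYY, YYYYMM, or YYYYMMDD
DATE_PATTERN = re.compile(r"^\d{4}(\d{2}(\d{2})?)?$")


def validate_input(run_input):
    """Return a list of problems. An empty list means the input looks valid."""
    problems = []
    targets = run_input.get("targets") or []
    if not 1 <= len(targets) <= 50:
        problems.append("targets must contain 1 to 50 URLs or domains")
    max_results = run_input.get("maxResults")
    # The lower bound of 1 is an assumption; the documented maximum is 10,000.
    if max_results is not None and not 1 <= max_results <= 10_000:
        problems.append("maxResults must be between 1 and 10,000")
    for key in ("dateFrom", "dateTo"):
        value = run_input.get(key)
        if value is not None and not DATE_PATTERN.match(value):
            problems.append(f"{key} must be YYYY, YYYYMM, or YYYYMMDD")
    return problems


print(validate_input({"targets": ["example.com"], "dateFrom": "2024"}))
print(validate_input({"targets": [], "dateTo": "2024-05"}))
```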
outputMode changes the result shape. Use raw snapshots for exports, timeline for version intervals, closest snapshot to date for evidence work, report for a Markdown summary, and monitoring for scheduled archive checks.
includeContent, maxContentFetch, and historyQuery control archived text fetching. Content fetching is capped so large archive searches do not fetch every historical page by accident.
🧾 Output example
{
  "recordType": "snapshot",
  "target": "example.com",
  "originalUrl": "https://example.com/pricing",
  "waybackTimestamp": "20240510123045",
  "archiveDate": "2024-05-10T12:30:45.000Z",
  "statusCode": 200,
  "mimeType": "text/html",
  "contentDigest": "M5W6TLBPLQWJXTQWJ2R5XQ7Y3YQK4K6L",
  "contentLength": 18432,
  "contentStatus": "notRequested",
  "content": null,
  "distanceFromTargetDays": null,
  "change": {
    "changed": true,
    "type": "digestChange",
    "previousArchiveDate": "2024-04-01T08:15:30.000Z",
    "previousWaybackTimestamp": "20240401081530",
    "evidence": ["Digest changed from ABC123 to M5W6TLBPLQWJXTQWJ2R5XQ7Y3YQK4K6L"]
  },
  "version": null,
  "diff": null,
  "summary": null,
  "alert": null
}
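The waybackTimestamp and archiveDate fields carry the same instant in two formats, and distanceFromTargetDays is derived from it in closest-date mode. The sketch below shows one way to reproduce both values from a row. The function names are hypothetical, and whether the Actor rounds or truncates partial days is an assumption (this version truncates).

```python
from datetime import datetime, timezone


def parse_wayback_timestamp(ts):
    """Convert a 14-digit Wayback timestamp (YYYYMMDDhhmmss) to a UTC datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def to_archive_date(ts):
    """Format the timestamp like the archiveDate field in the example row."""
    return parse_wayback_timestamp(ts).strftime("%Y-%m-%dT%H:%M:%S.000Z")


def distance_in_days(ts, target_date):
    """Whole days between a capture and a YYYYMMDD target date (truncated)."""
    target = datetime.strptime(target_date, "%Y%m%d").replace(tzinfo=timezone.utc)
    return abs(parse_wayback_timestamp(ts) - target).days


print(to_archive_date("20240510123045"))
print(distance_in_days("20240510123045", "20240501"))
```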
💳 Pricing
This Actor uses pay-per-event pricing. You are charged for each saved successful Wayback result: snapshot, version, summary, or monitoring alert. Empty archive searches, invalid inputs, skipped content fetches, and source issues do not create dataset rows.
The planned pricing starts at $0.0018 per saved archive result on the Free tier and goes down to $0.0009 per saved archive result on higher tiers. Always check the Actor Pricing tab before starting a large run.
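For a quick budget check before a large run, the per-result prices above can be multiplied by the number of rows you expect to save. This is a small sketch with a hypothetical helper `estimate_cost`. The prices are the planned figures quoted above, so the Pricing tab remains the source of truth.

```python
from decimal import Decimal


def estimate_cost(saved_rows, price_per_row):
    """Estimated charge in USD for a number of saved archive results."""
    return Decimal(saved_rows) * Decimal(price_per_row)


# One target capped at the maximum of 10,000 results, Free tier price
print(estimate_cost(10_000, "0.0018"))

# 1,000 results at the lowest planned price
print(estimate_cost(1_000, "0.0009"))
```

Since maxResults applies per target, a run with several targets can save up to that many rows for each of them, so multiply accordingly when estimating an upper bound.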
⚠️ Limits and caveats
- The Internet Archive may not have snapshots for every page or date.
- A successful run can return zero rows when no matching archive data exists.
- Archive text can be unavailable, non-HTML, capped, or skipped by the content fetch limit.
- Monitoring compares new results against the latest saved archive state for the same target and filters. It is based on Wayback captures, not live website polling.
- Change labels are mechanical and source-backed. They do not claim semantic meaning such as a product, legal, or pricing change unless the returned text evidence shows it.
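To illustrate what "mechanical" means here, the sketch below compares two consecutive snapshot rows field by field and records only what differs. `describe_change` is a hypothetical helper. The digest message follows the output example, while the status and length messages are assumed wording.

```python
def describe_change(previous, current):
    """List field-level differences between two consecutive snapshot rows."""
    evidence = []
    if previous["statusCode"] != current["statusCode"]:
        # Assumed wording, modeled on the digest message in the output example
        evidence.append(
            f"Status changed from {previous['statusCode']} to {current['statusCode']}"
        )
    if previous["contentDigest"] != current["contentDigest"]:
        evidence.append(
            f"Digest changed from {previous['contentDigest']} to {current['contentDigest']}"
        )
    if previous["contentLength"] != current["contentLength"]:
        evidence.append(
            f"Length changed from {previous['contentLength']} to {current['contentLength']}"
        )
    return {
        "changed": bool(evidence),
        "previousWaybackTimestamp": previous["waybackTimestamp"],
        "evidence": evidence,
    }


before = {"waybackTimestamp": "20240401081530", "statusCode": 200,
          "contentDigest": "ABC123", "contentLength": 18000}
after = {"waybackTimestamp": "20240510123045", "statusCode": 200,
         "contentDigest": "M5W6TLBPLQWJXTQWJ2R5XQ7Y3YQK4K6L", "contentLength": 18432}
change = describe_change(before, after)
print(change["evidence"])
```

Nothing in this comparison interprets what changed on the page. It only reports that the stored status, digest, or length differs between two captures.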
❓ FAQ
❓ Does this use the official Wayback Machine?
It reads public Internet Archive Wayback Machine data through the CDX index and archived playback pages.
🔑 Do I need a Wayback Machine API key?
No. The Actor does not ask for a Wayback Machine API key, cookies, or login.
📡 Can it monitor a live website?
It monitors changes in Wayback Machine archive captures. It does not poll the current live page independently.
🔗 Why is there no archiveUrl field?
Rows keep originalUrl and waybackTimestamp. A playback URL can be reconstructed as https://web.archive.org/web/{waybackTimestamp}/{originalUrl} when you need to open the archived page.
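A minimal sketch of that reconstruction, using a hypothetical helper `playback_url` and the two fields from a snapshot row:

```python
def playback_url(row):
    """Rebuild the Wayback playback URL from a snapshot row."""
    return f"https://web.archive.org/web/{row['waybackTimestamp']}/{row['originalUrl']}"


row = {
    "originalUrl": "https://example.com/pricing",
    "waybackTimestamp": "20240510123045",
}
print(playback_url(row))
```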
📝 Changelog
- 0.1: Initial release.
🆘 Support
For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡
🔗 Other actors
- Sitemap Sniffer - Find public sitemap files and URL inventories before an archive or SEO audit.
- Website URL Crawler - Crawl rendered website links and export a clean link map.
- SEMrush Free Website Stats Scraper - Collect public domain traffic, authority, backlink, and referral metrics.
- Ahrefs Free Website Stats Scraper - Export public Ahrefs domain rating, traffic, rank, and linking website stats.
- Font Detector - Detect fonts, CSS families, and source evidence on public web pages.
Made with ❤️ by Maxime Dupré