Wayback Machine Archive Scraper

Pricing

$1.00 / 1,000 snapshots retrieved

Fetch historical snapshots of any webpage from the Internet Archive. Perfect for digital forensics and tracking deleted content.


Developer

Andok


Maintained by Community


Wayback Machine Scraper for Historical Snapshots

Retrieve historical web page snapshots from the Internet Archive for compliance checks, competitive due diligence, and content recovery. Feed it a list of URLs and get back every archived snapshot with timestamps, status codes, and archive links — or optionally fetch the full HTML of the latest snapshot. Built on the official Wayback CDX API for accurate, structured results.

Features

  • Bulk URL processing — check snapshot history for dozens of URLs in a single run
  • Date range filtering — narrow results to a specific time window with the `from` and `to` parameters
  • Deduplication — collapse identical snapshots by digest to reduce noise
  • Status code filtering — only return snapshots with specific HTTP status codes (default: 200)
  • HTML retrieval — optionally fetch the archived HTML content for the most recent snapshot
  • Concurrent processing — configurable parallelism for faster batch runs
  • Structured metadata — every snapshot includes timestamp, original URL, MIME type, and archive URL
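Under the hood, results come from the official Wayback CDX API. The sketch below shows how the input parameters described above map onto a CDX query and how a snapshot's replay URL is composed; the function names are illustrative, not the actor's internals:

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def build_cdx_url(url, from_=None, to=None, limit=50,
                  collapse="digest", filter_status="statuscode:200"):
    """Build a Wayback CDX API query mirroring the actor's defaults."""
    params = {"url": url, "output": "json", "limit": limit}
    if from_:
        params["from"] = from_
    if to:
        params["to"] = to
    if collapse:
        params["collapse"] = collapse
    if filter_status:
        params["filter"] = filter_status
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

def archive_url(timestamp, original):
    """Compose the web.archive.org replay URL for one snapshot row."""
    return f"https://web.archive.org/web/{timestamp}/{original}"

print(build_cdx_url("https://example.com", from_="2023", to="2025", limit=10))
```

Fetching the resulting URL returns the raw snapshot rows that the actor then normalizes into the structured output documented below.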

Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `urls` | array | Yes | `["https://example.com"]` | List of URLs to look up in the Wayback Machine |
| `url` | string | No | | Single URL (backwards compatible, merged with `urls`) |
| `from` | string | No | | Start date for snapshot range (format: `YYYY` or `YYYYMMDDhhmmss`) |
| `to` | string | No | | End date for snapshot range (format: `YYYY` or `YYYYMMDDhhmmss`) |
| `limit` | integer | No | `50` | Maximum snapshots to return per URL (1-5000) |
| `collapse` | string | No | `digest` | Collapse parameter to deduplicate snapshots (e.g. `digest`, `timestamp:8`) |
| `filterStatus` | string | No | `statuscode:200` | HTTP status filter for snapshots (e.g. `statuscode:200`) |
| `includeHtml` | boolean | No | `false` | Fetch the archived HTML content for the latest snapshot (experimental) |
| `timeoutSeconds` | integer | No | `20` | Per-request timeout in seconds (1-120) |
| `concurrency` | integer | No | `5` | Number of URLs to process in parallel (1-25) |

Input Example

```json
{
  "urls": ["https://example.com", "https://news.ycombinator.com"],
  "from": "2023",
  "to": "2025",
  "limit": 10,
  "includeHtml": false
}
```
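The ranges documented in the input table can be checked client-side before starting a run. A minimal sketch, assuming the constraints listed above; `validate_input` is a hypothetical helper, not part of the actor:

```python
def validate_input(run_input):
    """Check a run input object against the documented constraints.

    Illustrative only; the actor performs its own validation.
    """
    errors = []
    urls = list(run_input.get("urls", []))
    if run_input.get("url"):  # backwards-compatible single URL, merged with urls
        urls.append(run_input["url"])
    if not urls:
        errors.append("at least one URL is required")
    if not 1 <= run_input.get("limit", 50) <= 5000:
        errors.append("limit must be between 1 and 5000")
    if not 1 <= run_input.get("timeoutSeconds", 20) <= 120:
        errors.append("timeoutSeconds must be between 1 and 120")
    if not 1 <= run_input.get("concurrency", 5) <= 25:
        errors.append("concurrency must be between 1 and 25")
    return errors

print(validate_input({"urls": ["https://example.com"], "limit": 10}))  # []
```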

Output

Each dataset item represents one input URL with its snapshot history. Key fields:

  • inputUrl (string) — the URL that was looked up
  • snapshotCount (number) — total number of matching snapshots found
  • snapshots (array) — list of snapshot objects with timestamp, original, statuscode, mimetype, length, and archiveUrl
  • latestSnapshot (object) — the most recent snapshot, or null if none found
  • latestHtml (string) — archived HTML content (only when includeHtml is enabled)
  • checkedAt (string) — ISO timestamp of when the check was performed
  • error (string) — error message if the lookup failed, otherwise null

Output Example

```json
{
  "inputUrl": "https://example.com",
  "snapshotCount": 3,
  "snapshots": [
    {
      "timestamp": "20250110153022",
      "original": "https://example.com",
      "statuscode": 200,
      "mimetype": "text/html",
      "length": 1256,
      "archiveUrl": "https://web.archive.org/web/20250110153022/https://example.com"
    }
  ],
  "latestSnapshot": {
    "timestamp": "20250110153022",
    "original": "https://example.com",
    "statuscode": 200,
    "mimetype": "text/html",
    "length": 1256,
    "archiveUrl": "https://web.archive.org/web/20250110153022/https://example.com"
  },
  "latestHtml": null,
  "checkedAt": "2025-01-20T12:00:00.000Z",
  "error": null
}
```
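Dataset items in this shape are straightforward to post-process. The sketch below parses the Wayback timestamp format (`YYYYMMDDhhmmss`, UTC) into a readable summary line; `summarize_item` is an illustrative helper, not actor output:

```python
from datetime import datetime, timezone

def summarize_item(item):
    """Turn one dataset item into a one-line summary, or None if it errored."""
    if item.get("error"):
        return None
    latest = item.get("latestSnapshot")
    if not latest:
        return f"{item['inputUrl']}: no snapshots"
    # Wayback timestamps are YYYYMMDDhhmmss in UTC
    ts = datetime.strptime(latest["timestamp"], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return (f"{item['inputUrl']}: {item['snapshotCount']} snapshot(s), "
            f"latest {ts:%Y-%m-%d %H:%M} UTC -> {latest['archiveUrl']}")

item = {
    "inputUrl": "https://example.com",
    "snapshotCount": 3,
    "latestSnapshot": {
        "timestamp": "20250110153022",
        "archiveUrl": "https://web.archive.org/web/20250110153022/https://example.com",
    },
    "error": None,
}
print(summarize_item(item))
```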

Pricing

| Event | Cost |
| --- | --- |
| Snapshot Retrieved | Pay-per-event (see the actor's pricing page) |

Use Cases

  • Compliance & legal — retrieve historical versions of terms of service, privacy policies, or product pages
  • Competitive due diligence — review how a competitor's website evolved over time before a deal or partnership
  • Content recovery — recover lost or deleted web pages from the Internet Archive
  • SEO auditing — check when a page was last crawled and compare historical content changes
  • Brand monitoring — verify historical claims or track how a brand's messaging changed
  • Research & journalism — access archived versions of news articles or government pages

Works Well With

| Actor | What it adds |
| --- | --- |
| Google News Scraper | Monitor current news coverage alongside historical archive lookups |
| Broken Links Checker | Find dead links on your site, then recover them via the Wayback Machine |
| Sitemap Extractor | Extract all URLs from a sitemap to feed into bulk Wayback lookups |

Notes

  • The Wayback Machine CDX API is free but may throttle under heavy load. Use the concurrency setting conservatively for large batches.
  • The includeHtml option is experimental and may fail for very large pages or pages with complex JavaScript rendering.
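If you do hit CDX throttling in your own tooling around the actor, retrying with exponential backoff is the usual client-side pattern. A minimal sketch; `with_backoff` is illustrative and not part of the actor, whose supported knobs are `timeoutSeconds` and `concurrency`:

```python
import time

def with_backoff(fetch, retries=3, base_delay=1.0):
    """Call fetch(), retrying with exponential backoff on any exception.

    Waits base_delay, then 2x, 4x, ... between attempts; re-raises after
    the final attempt fails.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Keeping `base_delay` at a second or more is polite to the free CDX endpoint; combine it with a conservative `concurrency` setting for large batches.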