Wayback Snapshots — CSV, Date-Filter, Bulk JSON

Wayback Machine snapshots in CSV/JSON — per snapshot: timestamp, status, MIME, size, archive URL — date-filterable + collapse-by-day. Uses CDX API, no API key. Built for competitive intel, SEO recovery, content audits. spinov001@gmail.com · t.me/scraping_ai

Pricing: Pay per usage
Developer: Alex (Maintained by Community)
Actor stats: 0 bookmarked · 5 total users · 0 monthly active users · last modified 3 hours ago

Wayback Machine Scraper — Extract Historical Website Snapshots

Retrieve archived versions of any website from the Internet Archive. See how any URL looked at any point in history — no API key, no rate-limit headaches, no HTML parsing breakage.

Why This Scraper?

Most "Wayback" scrapers parse the web.archive.org HTML directly, which breaks whenever the archive UI changes. This actor uses the official CDX Server API that archive.org exposes for programmatic access — the same endpoint used by researchers and journalists worldwide. That means:

  • Never breaks on UI changes — CDX API is stable and documented
  • No authentication — public archive, no credentials needed
  • Bulk lookups — submit hundreds of URLs in a single run
  • Structured output — clean JSON/CSV, ready for analysis pipelines
  • Full HTML retrieval — optionally pull the cached page body, not just metadata
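For reference, the CDX Server API this actor wraps can also be exercised directly. A minimal sketch (parameter names `from`, `to`, `filter`, and `collapse` per the public CDX Server API; the helper functions are illustrative, not part of this actor):

```python
import requests

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_params(url, from_date=None, to_date=None, per_day=True, status_ok=True, limit=10):
    """Build query parameters for a Wayback CDX Server API lookup."""
    params = {"url": url, "output": "json", "limit": limit}
    if from_date:
        params["from"] = from_date          # inclusive start, YYYY[MM[DD]]
    if to_date:
        params["to"] = to_date              # inclusive end
    if status_ok:
        params["filter"] = "statuscode:200"  # successful captures only
    if per_day:
        params["collapse"] = "timestamp:8"   # at most one snapshot per day
    return params

def parse_cdx(rows):
    """CDX JSON output is array-of-arrays; row 0 is the header."""
    if not rows:
        return []
    header = rows[0]
    return [dict(zip(header, row)) for row in rows[1:]]

if __name__ == "__main__":
    resp = requests.get(CDX_ENDPOINT, params=cdx_params("example.com", "2020", "2024"), timeout=30)
    for rec in parse_cdx(resp.json()):
        print(rec["timestamp"], rec["statuscode"], rec["original"])
```

The `collapse=timestamp:8` trick is what "collapse-by-day" means: snapshots sharing the same 8-character `YYYYMMDD` timestamp prefix are deduplicated server-side.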

Features

  • Historical snapshots — full timeline of cached versions per URL
  • Date filtering — narrow to a year, month, or custom range
  • Bulk processing — 100s of URLs per run, automatic deduplication
  • Content extraction — pull cached HTML/text, not just metadata
  • Status code filtering — skip 404/redirect snapshots, keep only 200
  • MIME filtering — HTML only, or include images/PDFs/JSON
  • Proxy support — uses Apify Proxy for reliable access at volume
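Putting those options together, a run input might look like the sketch below. Only `urls`, `fromDate`, `toDate`, and `extractContent` are confirmed elsewhere on this page; the remaining field names are illustrative guesses — check the actor's input schema for the exact names.

```python
# Representative actor input. Fields marked "assumed name" are hypothetical
# illustrations of the filtering features, not verified schema keys.
run_input = {
    "urls": ["https://example.com", "https://example.org"],
    "fromDate": "2023-01-01",
    "toDate": "2023-12-31",
    "extractContent": True,       # also pull cached HTML, not just metadata
    "statusFilter": [200],        # assumed name: skip 404s and redirects
    "mimeFilter": ["text/html"],  # assumed name: HTML snapshots only
    "collapseByDay": True,        # assumed name: one snapshot per day
}
```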

Output Data

```json
{
  "url": "https://example.com",
  "snapshotDate": "2024-06-15T08:30:00Z",
  "statusCode": 200,
  "mimeType": "text/html",
  "contentLength": 45230,
  "archiveUrl": "https://web.archive.org/web/20240615083000/https://example.com",
  "title": "Example Domain",
  "htmlContent": "<!doctype html>..."
}
```

Use Cases

  • Competitive intelligence — track how competitors changed pricing, messaging, and features over months or years
  • Legal / compliance evidence — document historical website state for disputes, IP claims, or regulatory filings
  • SEO research — analyze how page structure, titles, meta tags, and internal linking evolved
  • Content recovery — rescue pages that were deleted, redesigned, or moved
  • Brand monitoring — visualize a company's public image shift over time
  • Journalism & fact-checking — verify what was published on a date, with an auditable source

Integration Examples

Python

```python
import time
import requests

TOKEN = "YOUR_APIFY_TOKEN"
BASE = "https://api.apify.com/v2"

# Start the actor run
response = requests.post(
    f"{BASE}/acts/knotless_cadence~wayback-machine-scraper/runs",
    params={"token": TOKEN},
    json={"urls": ["https://example.com"], "fromDate": "2020-01-01", "toDate": "2025-12-31"},
)
run_id = response.json()["data"]["id"]

# Poll until the run finishes
while True:
    status = requests.get(f"{BASE}/actor-runs/{run_id}", params={"token": TOKEN}).json()["data"]["status"]
    if status not in ("READY", "RUNNING"):
        break
    time.sleep(5)

# Fetch the dataset items
items = requests.get(
    f"{BASE}/actor-runs/{run_id}/dataset/items",
    params={"token": TOKEN},
).json()
for snap in items:
    print(f"[{snap['snapshotDate']}] {snap['title']}")
```

n8n Workflow

  1. HTTP Request → POST https://api.apify.com/v2/acts/knotless_cadence~wayback-machine-scraper/runs
  2. Wait → 60 seconds (or loop-poll on run status)
  3. HTTP Request → GET .../dataset/items
  4. Slack / Google Sheets / Postgres → push snapshots to your system

Pricing

| Volume | Estimated Cost |
|---|---|
| 100 URLs, metadata only | ~$0.30 |
| 100 URLs + full HTML | ~$0.80 |
| 500 URLs + full HTML | ~$3.00 |

Runs on Apify's free tier (with datacenter-proxy limitations).

FAQ

Q: Is this legal? A: Yes. The Wayback Machine is a public archive operated by the non-profit Internet Archive. This actor reads publicly-available data and respects robots.txt and archive.org's access guidelines.

Q: Why use this over scraping web.archive.org's HTML directly? A: The CDX Server API is the officially-supported programmatic interface. HTML scrapers break every time the UI is tweaked — this one doesn't.

Q: Can I pull the full page HTML, not just metadata? A: Yes. Set `extractContent: true` in the input schema.

Q: How many snapshots are available for a given URL? A: Depends on the site. Popular sites (e.g., nytimes.com) may have tens of thousands. Small sites may have a handful.

Q: How fast is it? A: Typically 100 URL lookups (metadata only) in under 2 minutes. Adding full HTML extraction adds ~3-5 seconds per snapshot.

Q: Can I filter by status code? A: Yes. Filter to 200 only to skip redirects and 404s.
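If you pull snapshots without server-side filtering, an equivalent client-side pass over the output records shown above is straightforward. A sketch (the `snapshotDate` and `statusCode` field names match the Output Data example; the helper itself is illustrative):

```python
def collapse_by_day(snapshots):
    """Keep the first 200-status snapshot per calendar day.

    Assumes `snapshots` is sorted oldest-first and each record carries
    an ISO-8601 `snapshotDate` and an integer `statusCode`.
    """
    seen_days = set()
    kept = []
    for snap in snapshots:
        day = snap["snapshotDate"][:10]  # "YYYY-MM-DD" prefix
        if snap.get("statusCode") == 200 and day not in seen_days:
            seen_days.add(day)
            kept.append(snap)
    return kept
```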


Need a Custom Scraper or Data Pipeline?

Get a tailored scraper built for YOUR use case in 48 hours — $100 pilot rate, or $150 for a 3-article series if you also need written deliverables.


Email: spinov001@gmail.com
Portfolio: 78 published Apify actors — Trustpilot 249+ runs, Reddit 72+, Google News 32+, Email Extractor 19+
Tips & tutorials: t.me/scraping_ai