Wayback HTML Page History Extractor
Created by Stas Persiianenko
Extract archived HTML pages for a URL prefix from the Wayback Machine with HTTP 200 filters, timestamps, digests, and replay links.
Wayback Machine CDX Bulk Extractor (automation-lab/wayback-machine-cdx-extractor)
Input
URL or domain (required): https://example.com/blog/
Match type: prefix
Max snapshots: 5000
From date (YYYYMMDD): 20180101
To date (YYYYMMDD): 20251231
Filter by status codes: 200
Exclude status codes: (none)
Filter by MIME types: text/html
Page size: 10000
Collapse duplicates: digest
Include Wayback Machine URL: true
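Most of these input settings correspond to parameters of the Internet Archive's public CDX server API (`matchType`, `from`, `to`, `filter`, `collapse`, `limit`). The sketch below shows one plausible way to express them as a single CDX query URL. It is an assumption about the mapping, not the Actor's own code: `build_cdx_query` is a hypothetical helper, "Max snapshots" is assumed to map to `limit`, and "Page size" is left out because how the Actor batches its requests is not documented here. Several status codes or MIME types are combined into one regex filter because repeated CDX `filter` parameters must all match.

```python
import re
from urllib.parse import urlencode

# Public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, match_type="prefix", date_from=None, date_to=None,
                    status_codes=(), exclude_status_codes=(), mime_types=(),
                    collapse=None, limit=None):
    """Build a CDX query URL from Actor-style input settings (hypothetical helper)."""
    params = [("url", url), ("matchType", match_type), ("output", "json")]
    if date_from:
        params.append(("from", date_from))  # YYYYMMDD, read as a timestamp prefix
    if date_to:
        params.append(("to", date_to))
    # Repeated `filter` params are ANDed, so alternatives share one regex.
    if status_codes:
        codes = "|".join(str(c) for c in status_codes)
        params.append(("filter", "statuscode:" + codes))
    if exclude_status_codes:
        # A leading "!" inverts a CDX filter.
        codes = "|".join(str(c) for c in exclude_status_codes)
        params.append(("filter", "!statuscode:" + codes))
    if mime_types:
        # Escape regex metacharacters such as "+" in "application/xhtml+xml".
        mimes = "|".join(re.escape(m) for m in mime_types)
        params.append(("filter", "mimetype:" + mimes))
    if collapse:
        # "digest" drops a capture whose digest equals the previous one's.
        params.append(("collapse", collapse))
    if limit:
        params.append(("limit", limit))  # assumed to correspond to "Max snapshots"
    return f"{CDX_ENDPOINT}?{urlencode(params)}"
```

For the example input above, `build_cdx_query("https://example.com/blog/", date_from="20180101", date_to="20251231", status_codes=["200"], mime_types=["text/html"], collapse="digest", limit=5000)` yields a percent-encoded URL that can be fetched directly. Note that `collapse=digest` only removes adjacent duplicates in the CDX result order, not every repeated digest.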
Output fields
Original URL
Timestamp
Status Code
MIME Type
Content Digest
Size (bytes)
URL Key
Wayback URL
01 Sign up on Apify
Create your Apify account to access the Wayback Machine CDX Bulk Extractor.
02 Start the run
The Actor runs automatically with the input you provide.
03 Receive the output
Monitor progress in real time. You are notified as soon as your dataset is complete and ready for review.
04 Integrate into your workflow
The final output is delivered in JSON, CSV, or Excel format, ready to plug into your workflow.
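The same steps can also be driven through the Apify REST API (v2) instead of the Console. The sketch below only builds the two HTTP requests, one to start a run and one to download its dataset, using the standard library. You would send them with `urllib.request.urlopen`. It rests on several assumptions. You supply your own API token. The keys in `run_input` must come from the Actor's input schema, so the key used in the test is hypothetical. `format` is limited here to the three formats mentioned above (`json`, `csv`, `xlsx`).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

APIFY_API = "https://api.apify.com/v2"


def build_run_request(actor_id, run_input, token):
    """POST request that starts an Actor run with the given input (not sent here)."""
    # API paths use "~" instead of "/" between the username and the Actor name.
    path_id = actor_id.replace("/", "~")
    return Request(
        f"{APIFY_API}/acts/{path_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


def dataset_items_request(dataset_id, token, fmt="json"):
    """GET request for a run's dataset items in JSON, CSV, or Excel form."""
    if fmt not in {"json", "csv", "xlsx"}:
        raise ValueError(f"unsupported format: {fmt}")
    return Request(
        f"{APIFY_API}/datasets/{dataset_id}/items?" + urlencode({"format": fmt}),
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

The JSON returned when a run starts includes `data.defaultDatasetId`. Once the run has finished, that is the ID to pass to `dataset_items_request`.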
