# Wayback Machine Scraper — Extract Historical Website Snapshots (`knotless_cadence/wayback-machine-scraper`) Actor

Access archived versions of any webpage via Wayback Machine. Get historical snapshots, track website changes over time. Perfect for competitive intelligence and content recovery.

- **URL**: https://apify.com/knotless\_cadence/wayback-machine-scraper.md
- **Developed by:** [Alex](https://apify.com/knotless_cadence) (community)
- **Categories:** Developer tools
- **Stats:** 3 total users, 0 monthly users, 100.0% runs succeeded, NaN bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are a software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

````bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```bash

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Wayback Machine Scraper — Website History & Archived Snapshots

Retrieve historical snapshots of any website from the Internet Archive's Wayback Machine. Find all archived versions with timestamps, HTTP status codes, MIME types, and direct archive URLs. Filter by date range to focus on specific time periods.

### Features

- **Snapshot Discovery** — find all archived versions of any URL in the Wayback Machine
- **Date Range Filtering** — narrow results to a specific time period (from/to dates)
- **Archive URLs** — direct links to each archived snapshot for instant access
- **Status Codes** — HTTP status code for each snapshot (200, 301, 404, etc.)
- **MIME Types & Sizes** — content type and byte size of each archived page
- **Deduplication** — collapse parameter removes near-duplicate snapshots (configurable)
- **Up to 1000 Snapshots** — retrieve extensive history per URL with configurable limits

### Output Example

```json
{
  "url": "google.com",
  "timestamp": "20260101120000",
  "dateISO": "2026-01-01T12:00:00Z",
  "statusCode": 200,
  "mimeType": "text/html",
  "size": 15234,
  "digest": "ABC123DEF456",
  "archiveUrl": "https://web.archive.org/web/20260101120000/google.com",
  "inputUrl": "google.com",
  "scrapedAt": "2026-03-18T10:00:00.000Z"
}
````

### Use Cases

- **Competitive Analysis** — track how competitor websites evolved over time (design, messaging, pricing)
- **Brand Monitoring** — verify historical claims and content changes on any website
- **SEO Research** — analyze how page structure and content changed relative to ranking shifts
- **Legal & Compliance** — document website content at specific dates for evidence purposes
- **Content Recovery** — find and recover deleted pages, blog posts, or product listings
- **Market Research** — study how industry landing pages and value propositions evolved

### Input Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `urls` | array | `[]` | URLs to look up (e.g., "google.com", "apple.com") |
| `maxSnapshotsPerUrl` | integer | `20` | Max snapshots per URL (1-1000) |
| `fromDate` | string | `""` | Start date filter (YYYY-MM-DD) |
| `toDate` | string | `""` | End date filter (YYYY-MM-DD) |

### How It Works

The scraper queries the Wayback Machine's CDX Server API, which indexes all archived snapshots in the Internet Archive. It retrieves snapshot metadata including timestamps, status codes, and content digests, then constructs direct archive URLs for each result. The collapse parameter deduplicates snapshots taken within the same time window.

# Actor input Schema

## `urls` (type: `array`):

Website URLs to look up in the Wayback Machine (e.g., 'google.com', 'apple.com')

## `maxSnapshotsPerUrl` (type: `integer`):

Maximum number of archived snapshots to retrieve per URL

## `fromDate` (type: `string`):

Start date filter (YYYY-MM-DD format, e.g., '2020-01-01')

## `toDate` (type: `string`):

End date filter (YYYY-MM-DD format)

## Actor input object example

```json
{
  "urls": [
    "google.com"
  ],
  "maxSnapshotsPerUrl": 20,
  "fromDate": "",
  "toDate": ""
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "google.com"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("knotless_cadence/wayback-machine-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "urls": ["google.com"] }

# Run the Actor and wait for it to finish
run = client.actor("knotless_cadence/wayback-machine-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "google.com"
  ]
}' |
apify call knotless_cadence/wayback-machine-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=knotless_cadence/wayback-machine-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Wayback Machine Scraper — Extract Historical Website Snapshots",
        "description": "Access archived versions of any webpage via Wayback Machine. Get historical snapshots, track website changes over time. Perfect for competitive intelligence and content recovery.",
        "version": "1.0",
        "x-build-id": "IE6pLlaeStbuyL4B2"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/knotless_cadence~wayback-machine-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-knotless_cadence-wayback-machine-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/knotless_cadence~wayback-machine-scraper/runs": {
            "post": {
                "operationId": "runs-sync-knotless_cadence-wayback-machine-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/knotless_cadence~wayback-machine-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-knotless_cadence-wayback-machine-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "URLs",
                        "type": "array",
                        "description": "Website URLs to look up in the Wayback Machine (e.g., 'google.com', 'apple.com')",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxSnapshotsPerUrl": {
                        "title": "Max Snapshots",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of archived snapshots to retrieve per URL",
                        "default": 20
                    },
                    "fromDate": {
                        "title": "From Date",
                        "type": "string",
                        "description": "Start date filter (YYYY-MM-DD format, e.g., '2020-01-01')",
                        "default": ""
                    },
                    "toDate": {
                        "title": "To Date",
                        "type": "string",
                        "description": "End date filter (YYYY-MM-DD format)",
                        "default": ""
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
