# Wayback HTML Page History Extractor

**Use case:** Extract archived HTML pages under a URL prefix from the Wayback Machine, keeping only HTTP 200 captures and returning each capture's timestamp, content digest, and replay link.

## Input

```json
{
  "url": "https://example.com/blog/",
  "matchType": "prefix",
  "maxSnapshots": 5000,
  "fromDate": "20180101",
  "toDate": "20251231",
  "filterStatusCodes": [
    200
  ],
  "excludeStatusCodes": [],
  "filterMimeTypes": [
    "text/html"
  ],
  "pageSize": 10000,
  "collapse": "digest",
  "outputWaybackUrl": true
}
```
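The input fields above appear to mirror parameters of the Internet Archive's public CDX Server API (`url`, `matchType`, `from`, `to`, `filter`, `collapse`, `limit`). The sketch below shows one plausible mapping. It is an assumption, not the Actor's actual request logic. `build_cdx_query` and `CDX_ENDPOINT` are hypothetical names. Mapping `maxSnapshots` to `limit` is also assumed, and `pageSize` and `outputWaybackUrl` are left out because they look like Actor-side settings. CDX combines repeated `filter` parameters with AND. For that reason, allowed values go into one regex alternation, and each excluded status code becomes its own `!`-negated filter.

```python
import re
from urllib.parse import urlencode

# Public CDX Server endpoint; whether the Actor calls exactly this URL is an assumption.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(params: dict) -> str:
    """Hypothetical helper: translate Actor-style input into a CDX query URL."""
    query = [
        ("url", params["url"]),
        ("matchType", params.get("matchType", "exact")),
        ("output", "json"),  # rows come back as JSON arrays with a header row
    ]
    if params.get("fromDate"):
        query.append(("from", params["fromDate"]))
    if params.get("toDate"):
        query.append(("to", params["toDate"]))

    # Separate filter= params are ANDed, so allowed codes share one alternation.
    codes = params.get("filterStatusCodes") or []
    if codes:
        query.append(("filter", "statuscode:(" + "|".join(str(c) for c in codes) + ")"))

    # A leading "!" negates a filter; each excluded code needs its own filter.
    for code in params.get("excludeStatusCodes") or []:
        query.append(("filter", f"!statuscode:{code}"))

    # Filter values are regexes, so MIME types are escaped before joining.
    mimes = params.get("filterMimeTypes") or []
    if mimes:
        query.append(("filter", "mimetype:(" + "|".join(re.escape(m) for m in mimes) + ")"))

    # collapse=digest drops adjacent captures whose content digest is unchanged.
    if params.get("collapse"):
        query.append(("collapse", params["collapse"]))

    # Assumed mapping of maxSnapshots onto the CDX row limit.
    if params.get("maxSnapshots"):
        query.append(("limit", str(params["maxSnapshots"])))

    return CDX_ENDPOINT + "?" + urlencode(query)
```

Passing the input JSON above (as a dict) to `build_cdx_query` yields a single GET URL. That URL combines `matchType=prefix`, the date range, the status and MIME filters, `collapse=digest`, and `limit=5000`.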

## Output

```json
{
  "originalUrl": {
    "label": "Original URL",
    "format": "link"
  },
  "timestamp": {
    "label": "Timestamp"
  },
  "statusCode": {
    "label": "Status Code"
  },
  "mimeType": {
    "label": "MIME Type"
  },
  "digest": {
    "label": "Content Digest"
  },
  "length": {
    "label": "Size (bytes)",
    "format": "number"
  },
  "urlKey": {
    "label": "URL Key"
  },
  "waybackUrl": {
    "label": "Wayback URL",
    "format": "link"
  }
}
```
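Each output item corresponds to one archived capture. The sketch below assumes the Actor requests `output=json` from the CDX Server API. In that mode, the first row is a header that lists the default fields `urlkey`, `timestamp`, `original`, `mimetype`, `statuscode`, `digest`, `length`. The sketch renames those fields to the keys above and builds the replay link in the standard `https://web.archive.org/web/<timestamp>/<original>` form. `cdx_rows_to_records` and `FIELD_MAP` are hypothetical names, not part of the Actor.

```python
import json

# Assumed mapping from default CDX field names to the output keys shown above.
FIELD_MAP = {
    "urlkey": "urlKey",
    "timestamp": "timestamp",
    "original": "originalUrl",
    "mimetype": "mimeType",
    "statuscode": "statusCode",
    "digest": "digest",
    "length": "length",
}


def cdx_rows_to_records(payload: str, output_wayback_url: bool = True) -> list:
    """Hypothetical helper: convert a CDX JSON response into output records."""
    rows = json.loads(payload)
    if not rows:  # an empty result is returned as []
        return []
    header, data = rows[0], rows[1:]
    records = []
    for row in data:
        raw = dict(zip(header, row))
        record = {FIELD_MAP[k]: v for k, v in raw.items() if k in FIELD_MAP}
        # Length arrives as a string; anything non-numeric (e.g. "-") becomes None.
        length = record.get("length")
        record["length"] = int(length) if length and length.isdigit() else None
        if output_wayback_url:
            # Replay link: timestamp followed by the original URL.
            record["waybackUrl"] = (
                f"https://web.archive.org/web/{raw['timestamp']}/{raw['original']}"
            )
        records.append(record)
    return records
```

The sketch converts only `length` to a number, matching its `number` format in the schema. `statusCode` stays a string so that non-numeric values that can appear in CDX rows pass through unchanged.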

## About this Actor

This example demonstrates how to use [Wayback Machine CDX Bulk Extractor](https://apify.com/automation-lab/wayback-machine-cdx-extractor) with a specific input configuration. Visit the [Actor detail page](https://apify.com/automation-lab/wayback-machine-cdx-extractor) to learn more, explore other use cases, and run it yourself.