# Internet Archive Search Scraper (`crawlerbros/internet-archive-search-scraper`) Actor

Searches and retrieves items from the Internet Archive (archive.org) - 44M+ books, videos, audio, software, and web archives. Free, no API key required.

- **URL**: https://apify.com/crawlerbros/internet-archive-search-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** Automation, Integrations, Other
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event and usage: you are charged a fixed price for specific events, plus Apify platform usage.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
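
As a rough illustration of the per-result arithmetic, the sketch below estimates the result charge for a run. It assumes the listed starting price of $3.00 per 1,000 results applies unchanged; the function name is hypothetical, and platform usage and plan discounts are not modeled.

```python
from decimal import Decimal

# Assumption: the listed starting price; the rate on your plan may differ.
PRICE_PER_1000_RESULTS = Decimal("3.00")


def estimate_result_charge(result_count: int) -> Decimal:
    """Estimate the per-result charge in USD, excluding platform usage."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    charge = PRICE_PER_1000_RESULTS * result_count / 1000
    return charge.quantize(Decimal("0.01"))


print(estimate_result_charge(50))    # a run returning 50 results
print(estimate_result_charge(5000))  # a run returning 5,000 results
```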

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Internet Archive Search Scraper

Search and retrieve items from the **Internet Archive** (archive.org) — the world's largest digital library with 44M+ books, videos, audio recordings, software, and web archives. Free, no API key required.

### What does this Actor do?

This Actor lets you:
- **Search** the entire Internet Archive by keyword with filters for media type, collection, language, date range, and sort order.
- **Fetch specific items** by their unique Archive.org identifiers, getting enriched metadata including file counts and item sizes.

### Data Source

All data is retrieved from two public [Internet Archive](https://archive.org) APIs:
- **Advanced Search API**: `https://archive.org/advancedsearch.php` — free, no authentication required.
- **Metadata API**: `https://archive.org/metadata/{identifier}` — free, no authentication required.
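
For readers who want to see what requests to these two endpoints could look like, here is a minimal sketch that only builds the URLs. The `q`, `fl[]`, `rows`, `page`, and `output` parameters belong to the Advanced Search API; the helper names and the choice of returned fields are assumptions, not the Actor's actual implementation.

```python
from urllib.parse import quote, urlencode

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"
METADATA_ENDPOINT = "https://archive.org/metadata"


def build_search_url(query: str, fields: list[str], rows: int = 50, page: int = 1) -> str:
    """Build an Advanced Search URL that requests JSON output."""
    params = [("q", query)]
    # Each requested field is sent as a repeated fl[] parameter.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def build_metadata_url(identifier: str) -> str:
    """Build the Metadata API URL for a single item identifier."""
    return f"{METADATA_ENDPOINT}/{quote(identifier, safe='')}"


print(build_search_url("shakespeare", ["identifier", "title"], rows=25))
print(build_metadata_url("gutenberg-hamlet"))
```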

### Input

| Field | Type | Description |
|-------|------|-------------|
| `mode` | Select | `search` (default) or `byIdentifiers` |
| `query` | String | Search keywords (e.g. "public domain books", "jazz music") |
| `mediaType` | Select | Filter by type: texts, audio, movies, software, image, etree, data, web, collection, account |
| `collection` | String | Filter by collection slug (e.g. "gutenbergbooks", "librivoxaudio", "prelinger") |
| `language` | String | Filter by language code (e.g. "eng", "fra", "spa") |
| `dateFrom` | String | Start date filter (YYYY or YYYY-MM-DD) |
| `dateTo` | String | End date filter (YYYY or YYYY-MM-DD) |
| `sortBy` | Select | Sort order, one of `-downloads` (default), `-publicdate`, `publicdate`, `date asc`, `titleSorter asc` |
| `identifiers` | Array | Specific Archive.org identifiers (for byIdentifiers mode) |
| `maxItems` | Integer | Max items to return (default: 50, max: 5000) |
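
Before starting a run, it can help to check an input object locally. The sketch below is a hypothetical pre-flight check based on the limits in the table above and the minimum of 1 for `maxItems` given in the OpenAPI specification. The rule that `byIdentifiers` mode needs a non-empty `identifiers` list is an assumption, and the Actor's own validation may differ.

```python
ALLOWED_MODES = {"search", "byIdentifiers"}


def validate_input(actor_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []

    mode = actor_input.get("mode")
    if mode not in ALLOWED_MODES:
        problems.append(f"mode must be one of {sorted(ALLOWED_MODES)}")

    max_items = actor_input.get("maxItems", 50)  # documented default
    is_int = isinstance(max_items, int) and not isinstance(max_items, bool)
    if not is_int or not 1 <= max_items <= 5000:
        problems.append("maxItems must be an integer between 1 and 5000")

    # Assumption: fetching by identifier is pointless without identifiers.
    if mode == "byIdentifiers" and not actor_input.get("identifiers"):
        problems.append("byIdentifiers mode needs at least one identifier")

    return problems


print(validate_input({"mode": "search", "query": "shakespeare", "maxItems": 25}))
```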

#### Example Inputs

**Search for classic literature texts:**

```json
{
  "mode": "search",
  "query": "shakespeare",
  "mediaType": "texts",
  "language": "eng",
  "maxItems": 25
}
````

**Fetch specific items by identifier:**

```json
{
  "mode": "byIdentifiers",
  "identifiers": ["gutenberg-hamlet", "adventures_of_huckleberry_finn_librivox"],
  "maxItems": 10
}
```

**Search for audio recordings in a date range:**

```json
{
  "mode": "search",
  "query": "blues music",
  "mediaType": "audio",
  "dateFrom": "1920",
  "dateTo": "1960",
  "sortBy": "-publicdate",
  "maxItems": 100
}
```
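
To make the filter semantics more concrete, the following sketch shows one way such inputs could be combined into a single Lucene-style query string for the Advanced Search API. This mapping is an assumption for illustration only: `build_query` is a hypothetical helper, the open-ended `*` range bound is assumed to be accepted, and the Actor's real query construction is not documented here.

```python
def _expand(value: str, suffix: str) -> str:
    """Turn a bare year (YYYY) into a full date; leave YYYY-MM-DD untouched."""
    return value + suffix if len(value) == 4 else value


def build_query(query="", media_type="", collection="", language="",
                date_from="", date_to="") -> str:
    """Combine free text and filters into one AND-joined query string."""
    clauses = []
    if query:
        clauses.append(f"({query})")
    if media_type:
        clauses.append(f"mediatype:{media_type}")
    if collection:
        clauses.append(f"collection:{collection}")
    if language:
        clauses.append(f"language:{language}")
    if date_from or date_to:
        # Assumption: "*" marks an open-ended bound in a range query.
        start = _expand(date_from, "-01-01") if date_from else "*"
        end = _expand(date_to, "-12-31") if date_to else "*"
        clauses.append(f"date:[{start} TO {end}]")
    return " AND ".join(clauses)


print(build_query("blues music", media_type="audio", date_from="1920", date_to="1960"))
```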

### Output

Each item in the dataset contains:

| Field | Description |
|-------|-------------|
| `identifier` | Unique Archive.org identifier |
| `url` | Direct URL to the item page (archive.org/details/{identifier}) |
| `title` | Item title |
| `description` | Item description |
| `creator` | Author or creator |
| `date` | Creation or publication date |
| `mediatype` | Type of media (texts, audio, movies, etc.) |
| `collection` | Collection it belongs to |
| `language` | Language code(s) |
| `subject` | Subject tags (up to 10) |
| `format` | File format(s) (up to 5) |
| `downloads` | Total download count |
| `files_count` | Number of files in the item (byIdentifiers mode) |
| `item_size` | Total size in bytes (byIdentifiers mode) |
| `server` | Serving server hostname (byIdentifiers mode) |
| `scrapedAt` | ISO 8601 timestamp of when data was scraped |

#### Example Output

```json
{
  "identifier": "gutenberg-hamlet",
  "url": "https://archive.org/details/gutenberg-hamlet",
  "title": "Hamlet",
  "description": "A classic tragedy by William Shakespeare",
  "creator": "William Shakespeare",
  "date": "1603",
  "mediatype": "texts",
  "collection": "gutenbergbooks",
  "language": "eng",
  "subject": ["drama", "tragedy", "Shakespeare"],
  "format": ["PDF", "EPUB", "Plain Text"],
  "downloads": 85432,
  "scrapedAt": "2026-01-15T10:30:00+00:00"
}
```
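
Once items are in a dataset, they are plain JSON objects and can be post-processed directly. As a small sketch, assuming items shaped like the example above, the hypothetical helper below totals `downloads` per `mediatype`. Missing fields are treated as `"unknown"` and `0`, which is a choice made for this example.

```python
from collections import Counter


def downloads_by_mediatype(items: list[dict]) -> dict[str, int]:
    """Sum the downloads field for each mediatype found in the items."""
    totals = Counter()
    for item in items:
        mediatype = item.get("mediatype") or "unknown"
        totals[mediatype] += int(item.get("downloads") or 0)
    return dict(totals)


sample_items = [
    {"identifier": "gutenberg-hamlet", "mediatype": "texts", "downloads": 85432},
    {"identifier": "example-audio-item", "mediatype": "audio", "downloads": 120},
    {"identifier": "example-text-item", "mediatype": "texts"},
]
print(downloads_by_mediatype(sample_items))
```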

### Frequently Asked Questions

**Is this free to use?**
Yes. The Internet Archive provides a completely free public API with no authentication required.

**How many items can I retrieve?**
Up to 5,000 items per run using the `maxItems` parameter.

**What media types are available?**
Texts (books), Audio, Movies/Video, Software, Images, Live Music (etree), Data sets, Web Archives, and Collections.

**Can I filter by collection?**
Yes — use the `collection` field with a collection slug (e.g. "gutenbergbooks" for Project Gutenberg books, "librivoxaudio" for LibriVox audiobooks, "prelinger" for Prelinger Archives films).

**Can I search in specific languages?**
Yes — use ISO 639-3 language codes like "eng" (English), "fra" (French), "spa" (Spanish), "deu" (German).

**What are identifiers?**
Every Internet Archive item has a unique identifier (e.g. "gutenberg-hamlet"). You can find these in Archive.org URLs: `archive.org/details/{identifier}`.
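
If you start from item page URLs rather than identifiers, a small helper can pull the identifier out of the `archive.org/details/{identifier}` pattern. This is a sketch with a hypothetical function name; it only handles that one URL shape.

```python
from typing import Optional
from urllib.parse import urlparse


def identifier_from_url(url: str) -> Optional[str]:
    """Return the identifier from an archive.org details URL, or None."""
    # Accept URLs written without a scheme, e.g. "archive.org/details/...".
    parsed = urlparse(url if "://" in url else f"https://{url}")
    if parsed.hostname not in ("archive.org", "www.archive.org"):
        return None
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) >= 2 and parts[0] == "details":
        return parts[1]
    return None


print(identifier_from_url("https://archive.org/details/gutenberg-hamlet"))
```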

**How is the data rate-limited?**
The Actor adds a 0.3 s delay between search pages and a 0.5 s delay between metadata requests to respect the Internet Archive API's usage guidelines.
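
The pacing described in the last answer can be sketched as a paginated loop that sleeps between pages. This is not the Actor's source code: `iterate_items`, the `fetch_page` callback, and the page size of 100 are assumptions made for illustration, and only the 0.3-second delay comes from the answer above.

```python
import time
from typing import Callable, Iterator

SEARCH_PAGE_DELAY_SECONDS = 0.3  # delay between search pages, per the FAQ


def iterate_items(
    fetch_page: Callable[[int, int], list],
    max_items: int,
    page_size: int = 100,
    delay: float = SEARCH_PAGE_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator:
    """Yield up to max_items results, pausing between consecutive pages."""
    remaining = max_items
    page = 1
    while remaining > 0:
        docs = fetch_page(page, page_size)
        batch = docs[:remaining]
        yield from batch
        remaining -= len(batch)
        if remaining <= 0 or len(docs) < page_size:
            break  # enough items collected, or the last page was reached
        sleep(delay)  # wait before requesting the next page
        page += 1
```

Passing `sleep` as a parameter keeps the loop easy to test, because a test can record the requested delays instead of actually waiting.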

### Use Cases

- Building digital library catalogs
- Research on public domain content
- Finding historical audio/video recordings
- Locating old software for preservation research
- Downloading metadata for academic research
- Tracking download statistics for archive items

# Actor input Schema

## `mode` (type: `string`):

What to fetch: search across all collections, or fetch specific items by identifier.

## `query` (type: `string`):

Free-text search query (e.g. "shakespeare", "jazz music", "public domain books"). Used in mode=search.

## `mediaType` (type: `string`):

Filter results to a specific media type.

## `collection` (type: `string`):

Filter by a specific collection (e.g. "gutenbergbooks", "librivoxaudio", "prelinger"). Leave blank for all.

## `language` (type: `string`):

Filter by language code (e.g. "eng", "fra", "spa", "deu"). Leave blank for all languages.

## `dateFrom` (type: `string`):

Return items dated from this date (e.g. "1900", "1920-01-01").

## `dateTo` (type: `string`):

Return items dated up to this date (e.g. "2000", "1999-12-31").

## `sortBy` (type: `string`):

How to sort search results.

## `identifiers` (type: `array`):

Specific Internet Archive identifiers to fetch (e.g. `["gutenberg-the-great-gatsby", "adventures_of_huckleberry_finn_librivox"]`).

## `maxItems` (type: `integer`):

Maximum number of items to return.

## Actor input object example

```json
{
  "mode": "search",
  "query": "public domain books",
  "mediaType": "texts",
  "sortBy": "-downloads",
  "identifiers": [],
  "maxItems": 50
}
```

# Actor output Schema

## `items` (type: `string`):

Dataset containing all scraped Internet Archive items.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token.
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "search",
    "query": "public domain books",
    "mediaType": "texts",
    "sortBy": "-downloads",
    "identifiers": [],
    "maxItems": 50
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/internet-archive-search-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "search",
    "query": "public domain books",
    "mediaType": "texts",
    "sortBy": "-downloads",
    "identifiers": [],
    "maxItems": 50,
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/internet-archive-search-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "search",
  "query": "public domain books",
  "mediaType": "texts",
  "sortBy": "-downloads",
  "identifiers": [],
  "maxItems": 50
}' |
apify call crawlerbros/internet-archive-search-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/internet-archive-search-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Internet Archive Search Scraper",
        "description": "Searches and retrieves items from the Internet Archive (archive.org) - 44M+ books, videos, audio, software, and web archives. Free, no API key required.",
        "version": "1.0",
        "x-build-id": "CKU8QBTefNDZIzThc"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~internet-archive-search-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-internet-archive-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~internet-archive-search-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-internet-archive-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~internet-archive-search-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-internet-archive-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "search",
                            "byIdentifiers"
                        ],
                        "type": "string",
                        "description": "What to fetch: search across all collections, or fetch specific items by identifier.",
                        "default": "search"
                    },
                    "query": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Free-text search query (e.g. \"shakespeare\", \"jazz music\", \"public domain books\"). Used in mode=search.",
                        "default": "public domain books"
                    },
                    "mediaType": {
                        "title": "Media type",
                        "enum": [
                            "",
                            "texts",
                            "audio",
                            "movies",
                            "software",
                            "image",
                            "etree",
                            "data",
                            "web",
                            "collection",
                            "account"
                        ],
                        "type": "string",
                        "description": "Filter results to a specific media type.",
                        "default": ""
                    },
                    "collection": {
                        "title": "Collection",
                        "type": "string",
                        "description": "Filter by a specific collection (e.g. \"gutenbergbooks\", \"librivoxaudio\", \"prelinger\"). Leave blank for all."
                    },
                    "language": {
                        "title": "Language",
                        "type": "string",
                        "description": "Filter by language code (e.g. \"eng\", \"fra\", \"spa\", \"deu\"). Leave blank for all languages."
                    },
                    "dateFrom": {
                        "title": "Date from (YYYY or YYYY-MM-DD)",
                        "type": "string",
                        "description": "Return items dated from this date (e.g. \"1900\", \"1920-01-01\")."
                    },
                    "dateTo": {
                        "title": "Date to (YYYY or YYYY-MM-DD)",
                        "type": "string",
                        "description": "Return items dated up to this date (e.g. \"2000\", \"1999-12-31\")."
                    },
                    "sortBy": {
                        "title": "Sort by",
                        "enum": [
                            "-downloads",
                            "-publicdate",
                            "publicdate",
                            "date asc",
                            "titleSorter asc"
                        ],
                        "type": "string",
                        "description": "How to sort search results.",
                        "default": "-downloads"
                    },
                    "identifiers": {
                        "title": "Identifiers (mode=byIdentifiers)",
                        "type": "array",
                        "description": "Specific Internet Archive identifiers to fetch (e.g. [\"gutenberg-the-great-gatsby\", \"adventures_of_huckleberry_finn_librivox\"]).",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Maximum number of items to return.",
                        "default": 50
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
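
For projects that call the REST API without a client library, the `run-sync-get-dataset-items` path from the specification above can be used with only the Python standard library. The sketch below builds the request but does not send it; `build_run_sync_request` is a hypothetical helper name, and the token is passed as the `token` query parameter as described in the specification.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = (
    "/acts/crawlerbros~internet-archive-search-scraper"
    "/run-sync-get-dataset-items"
)


def build_run_sync_request(token: str, actor_input: dict) -> Request:
    """Build a POST request that runs the Actor and returns dataset items."""
    url = f"{API_BASE}{RUN_SYNC_PATH}?{urlencode({'token': token})}"
    return Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_sync_request(
    "<YOUR_API_TOKEN>",
    {"mode": "search", "query": "public domain books", "maxItems": 50},
)
print(request.get_method(), request.full_url)
```

To execute the run, pass the request to `urllib.request.urlopen` and parse the response body with `json.load`. The call waits until the Actor finishes, so allow a generous timeout.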
