# Internet Archive Search Scraper (`parseforge/internet-archive-search-scraper`) Actor

Search the Internet Archive's 50M+ item catalog of texts, audio, movies, software, web pages, and images. Filter by collection, media type, creator, and date. Pull identifiers, titles, descriptions, downloads, and rich metadata.

- **URL**: https://apify.com/parseforge/internet-archive-search-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Education, Other, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $16.00 / 1,000 result items

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier rises.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
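
As a rough budgeting aid, the listed rate can be turned into a per-run estimate. This is a minimal sketch that assumes the undiscounted starting price of $16.00 per 1,000 result items and that result items are the only charged event; `estimate_cost` is a hypothetical helper, not part of any Apify library.

```python
from decimal import Decimal

# Assumption: the listed starting price with no plan discount applied.
PRICE_PER_1000_ITEMS = Decimal("16.00")


def estimate_cost(result_items: int, price_per_1000: Decimal = PRICE_PER_1000_ITEMS) -> Decimal:
    """Return the estimated charge in USD for a given number of result items."""
    if result_items < 0:
        raise ValueError("result_items must be non-negative")
    # Scale the per-1,000 price and round to whole cents.
    return (price_per_1000 * result_items / 1000).quantize(Decimal("0.01"))


print(estimate_cost(100))
print(estimate_cost(2500))
```

Actual charges may be lower on higher subscription plans, so treat the result as an upper-bound estimate.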

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## 📚 Internet Archive Search Scraper

> 🚀 **Export the world's largest open library in seconds.** Search **50M+ items** across texts, audio, movies, software, web, images, and data. No login, no manual paging, no Lucene crash courses required.

> 🕒 **Last updated:** 2026-05-22 · **📊 21 fields** per record · **📚 50M+ items** · **🎬 8 media types** · **🌐 archive.org corpus**

The **Internet Archive Search Scraper** exports the open library catalog and returns **21 fields per record**, including identifier, title, full description, creator, language, subject tags, collection memberships, publish date, lifetime, weekly, and monthly download counts, file inventories, total byte size, license URL, and direct links to the item details page and metadata feed. The underlying source is the world's largest publicly accessible digital library, maintained since 1996.

The catalog covers **50 million+ items across eight media types** (texts, audio, movies, software, web captures, images, datasets, and collections). This Actor lets you slice the corpus with Lucene-style queries plus structured filters for collection, media type, creator, and date range, then download the result as CSV, Excel, JSON, or XML in under five minutes.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Librarians, digital archivists, journalists, OSINT researchers, academic historians, ML dataset curators, documentary filmmakers | Citation discovery, training-corpus assembly, historical media research, source verification, public-domain media sourcing, archival preservation audits |

---

### 📋 What the Archive Search Scraper does

A single configurable workflow with four filter layers plus an optional per-item metadata fetch:

- 🔎 **Lucene query.** Free-text or fielded queries like `subject:photography AND mediatype:image`.
- 📦 **Collection filter.** Restrict to one Internet Archive collection like `nasa` or `librivoxaudio`.
- 🎬 **Media-type filter.** Texts, audio, movies, software, web, image, data, or collection.
- 📅 **Date range.** Filter by item publish date with `dateFrom` and `dateTo` (YYYY-MM-DD).
- 📚 **Per-item metadata.** Optional deep fetch returns the full file list, rich subject tags, and license URL.

Each record bundles identifiers (Archive ID, details URL, metadata URL), descriptive metadata (title, creator, language, description, subject tags), classification (media type, collection memberships), engagement (lifetime, weekly, and monthly download counts), file inventory (count and total byte size), and licensing.
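
To make the filter layers concrete, here is a minimal sketch of how the structured filters could be folded into one Lucene query string. It illustrates the query syntax only: how the Actor combines filters internally is not documented here, `build_lucene_query` is a hypothetical helper, and the `*` open-range bound is standard Lucene syntax that is assumed (not confirmed) to be accepted by the catalog.

```python
def build_lucene_query(search_query="", collection="", media_type="",
                       creator="", date_from="", date_to=""):
    """Combine free text and structured filters into one Lucene query string."""
    clauses = []
    if search_query:
        # Parenthesize so an OR inside the free text cannot leak into the filters.
        clauses.append(f"({search_query})")
    if collection:
        clauses.append(f"collection:{collection}")
    if media_type:
        clauses.append(f"mediatype:{media_type}")
    if creator:
        # Quote the value so multi-word names stay a single phrase.
        clauses.append(f'creator:"{creator}"')
    if date_from or date_to:
        # Assumption: '*' marks an open-ended side of the range.
        clauses.append(f"date:[{date_from or '*'} TO {date_to or '*'}]")
    return " AND ".join(clauses)


print(build_lucene_query(search_query="mars rover", collection="nasa", media_type="image"))
```

Lowercase field names are used throughout because Lucene field names are case-sensitive.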

> 💡 **Why it matters:** the Archive is the largest public corpus of cultural and reference material on Earth, but its native search interface assumes you already know Lucene. This Actor exposes that same query layer with structured filters and clean records, ready for analysis, ingestion, or archival back-up.

---

### 🎬 Full Demo

_🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset._

---

### ⚙️ Input

<table>
<thead>
<tr><th>Input</th><th>Type</th><th>Default</th><th>Behavior</th></tr>
</thead>
<tbody>
<tr><td><code>maxItems</code></td><td>integer</td><td><code>10</code></td><td>Records to return. Free plan caps at 10, paid plan at 1,000,000.</td></tr>
<tr><td><code>searchQuery</code></td><td>string</td><td><code>"mars rover"</code></td><td>Lucene-style query. Plain words OK. Field syntax like <code>creator:NASA</code> supported.</td></tr>
<tr><td><code>collection</code></td><td>string</td><td><code>""</code></td><td>Restrict to one collection slug like <code>nasa</code>, <code>opensource_movies</code>, or <code>librivoxaudio</code>.</td></tr>
<tr><td><code>mediaType</code></td><td>enum</td><td><code>""</code></td><td>One of 8 media types: texts, audio, movies, software, web, image, data, collection.</td></tr>
<tr><td><code>creator</code></td><td>string</td><td><code>""</code></td><td>Filter by creator name like <code>NASA</code> or <code>Library of Congress</code>.</td></tr>
<tr><td><code>dateFrom</code></td><td>string</td><td><code>""</code></td><td>Earliest publish date in YYYY-MM-DD.</td></tr>
<tr><td><code>dateTo</code></td><td>string</td><td><code>""</code></td><td>Latest publish date in YYYY-MM-DD.</td></tr>
<tr><td><code>fetchDetails</code></td><td>boolean</td><td><code>true</code></td><td>Fetch full per-item metadata. Slower but richer.</td></tr>
</tbody>
</table>

**Example: NASA photo collection from 2020 to 2025.**

```json
{
    "maxItems": 100,
    "creator": "NASA",
    "mediaType": "image",
    "dateFrom": "2020-01-01",
    "dateTo": "2025-12-31"
}
````

**Example: classic LibriVox audiobooks.**

```json
{
    "maxItems": 50,
    "collection": "librivoxaudio",
    "mediaType": "audio"
}
```

> ⚠️ **Good to Know:** Lucene field names are case-sensitive (`subject:` not `Subject:`). Enable `fetchDetails` for file inventories and the full subject-tag list, otherwise records contain index-level metadata only. For very large dumps (100,000+ items), schedule the run during off-peak hours to be a good steward of the public catalog.
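
Input mistakes are cheaper to catch before a run starts. The sketch below builds an input object from the fields in the table above and checks the documented constraints (the eight media types, `YYYY-MM-DD` dates, and a `maxItems` range of 1 to 1,000,000). `build_input` and its error messages are hypothetical; the Actor may validate differently.

```python
from datetime import datetime

MEDIA_TYPES = {"texts", "audio", "movies", "software", "web", "image", "data", "collection"}


def _parse_day(label, value):
    """Parse a YYYY-MM-DD string, raising a readable error on bad input."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError as exc:
        raise ValueError(f"{label} must be YYYY-MM-DD, got {value!r}") from exc


def build_input(max_items=10, search_query="", collection="", media_type="",
                creator="", date_from="", date_to="", fetch_details=True):
    """Return an Actor input dict, omitting filters that were left empty."""
    if not 1 <= max_items <= 1_000_000:
        raise ValueError("maxItems must be between 1 and 1,000,000")
    if media_type and media_type not in MEDIA_TYPES:
        raise ValueError(f"mediaType must be one of {sorted(MEDIA_TYPES)}")
    start = _parse_day("dateFrom", date_from) if date_from else None
    end = _parse_day("dateTo", date_to) if date_to else None
    if start and end and start > end:
        raise ValueError("dateFrom must not be later than dateTo")

    actor_input = {"maxItems": max_items, "fetchDetails": fetch_details}
    optional = {
        "searchQuery": search_query,
        "collection": collection,
        "mediaType": media_type,
        "creator": creator,
        "dateFrom": date_from,
        "dateTo": date_to,
    }
    actor_input.update({key: value for key, value in optional.items() if value})
    return actor_input


print(build_input(max_items=50, collection="librivoxaudio", media_type="audio"))
```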

***

### 📊 Output

Each record contains **21 fields**. Download the dataset as CSV, Excel, JSON, or XML.

#### 🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🖼️ `thumbnailUrl` | string | `"https://archive.org/services/img/PIA23499"` |
| 🆔 `identifier` | string | `"PIA23499"` |
| 📝 `title` | string | `"Mars 2020 Rover Selfie"` |
| 🎬 `mediaType` | string | `"image"` |
| 👤 `creator` | array | `["NASA/JPL-Caltech"]` |
| 📅 `date` | string | `"2021-04-06"` |
| 📅 `publishDate` | ISO 8601 | `"2021-04-06T00:00:00.000Z"` |
| 📖 `description` | string | `"NASA's Perseverance rover took this selfie..."` |
| 📦 `collection` | array | `["nasa", "image"]` |
| 🏷️ `subject` | array | `["mars", "rover", "perseverance"]` |
| 🌐 `language` | array | `["English"]` |
| ⬇️ `downloads` | number | `12450` |
| 📊 `week` | number | `87` |
| 📊 `month` | number | `342` |
| 📁 `filesCount` | number | `8` |
| 💾 `totalSizeBytes` | number | `52428800` |
| 📜 `licenseUrl` | string \| null | `"https://creativecommons.org/publicdomain/mark/1.0/"` |
| 🔗 `detailsUrl` | string | `"https://archive.org/details/PIA23499"` |
| 🧾 `metadataUrl` | string | `"https://archive.org/metadata/PIA23499"` |
| 🕒 `scrapedAt` | ISO 8601 | `"2026-05-22T00:00:00.000Z"` |
| ⚠️ `error` | string \| null | `null` |
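
When consuming records downstream, a few fields usually need light conversion. The following sketch derives a publish year, a size in MiB, and an error flag from one record. It assumes `publishDate` always uses the millisecond ISO 8601 form shown in the schema; `summarize` is a hypothetical helper.

```python
from datetime import datetime, timezone


def summarize(record: dict) -> dict:
    """Reduce one output record to a few derived, analysis-friendly values."""
    published = record.get("publishDate")
    parsed = None
    if published:
        # Assumption: timestamps look like 2021-04-06T00:00:00.000Z.
        parsed = datetime.strptime(published, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    size = record.get("totalSizeBytes")
    return {
        "identifier": record["identifier"],
        "publishYear": parsed.year if parsed else None,
        "sizeMiB": round(size / 1024 ** 2, 2) if size is not None else None,
        "hasError": record.get("error") is not None,
    }


sample = {
    "identifier": "PIA23499",
    "publishDate": "2021-04-06T00:00:00.000Z",
    "totalSizeBytes": 52428800,
    "error": None,
}
print(summarize(sample))
```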

#### 📦 Sample records

<details>
<summary><strong>🛰️ NASA image: Mars 2020 Rover Selfie</strong></summary>

```json
{
    "thumbnailUrl": "https://archive.org/services/img/PIA23499",
    "identifier": "PIA23499",
    "title": "Mars 2020 Rover Selfie",
    "mediaType": "image",
    "creator": ["NASA/JPL-Caltech"],
    "date": "2021-04-06",
    "publishDate": "2021-04-06T00:00:00.000Z",
    "description": "NASA's Perseverance rover took this selfie on Mars next to the Ingenuity helicopter on April 6, 2021.",
    "collection": ["nasa", "image"],
    "subject": ["mars", "rover", "perseverance", "selfie"],
    "language": ["English"],
    "downloads": 12450,
    "week": 87,
    "month": 342,
    "filesCount": 8,
    "totalSizeBytes": 52428800,
    "licenseUrl": "https://creativecommons.org/publicdomain/mark/1.0/",
    "detailsUrl": "https://archive.org/details/PIA23499",
    "metadataUrl": "https://archive.org/metadata/PIA23499",
    "scrapedAt": "2026-05-22T00:00:00.000Z"
}
```

</details>

<details>
<summary><strong>🎧 LibriVox audiobook: Pride and Prejudice</strong></summary>

```json
{
    "thumbnailUrl": "https://archive.org/services/img/pride_prejudice_0711_librivox",
    "identifier": "pride_prejudice_0711_librivox",
    "title": "Pride and Prejudice by Jane Austen",
    "mediaType": "audio",
    "creator": ["Jane Austen"],
    "date": "2011-07-11",
    "publishDate": "2011-07-11T00:00:00.000Z",
    "description": "LibriVox volunteers bring you a free public-domain recording of Jane Austen's classic novel.",
    "collection": ["librivoxaudio", "audio_bookspoetry", "audio"],
    "subject": ["jane austen", "regency", "romance", "classic"],
    "language": ["English"],
    "downloads": 1842311,
    "week": 4521,
    "month": 18934,
    "filesCount": 124,
    "totalSizeBytes": 412348928,
    "licenseUrl": "https://creativecommons.org/publicdomain/zero/1.0/",
    "detailsUrl": "https://archive.org/details/pride_prejudice_0711_librivox",
    "metadataUrl": "https://archive.org/metadata/pride_prejudice_0711_librivox",
    "scrapedAt": "2026-05-22T00:00:00.000Z"
}
```

</details>

<details>
<summary><strong>📜 Historical text: U.S. Constitution facsimile</strong></summary>

```json
{
    "thumbnailUrl": "https://archive.org/services/img/constitutionofun00unit",
    "identifier": "constitutionofun00unit",
    "title": "The Constitution of the United States of America",
    "mediaType": "texts",
    "creator": ["United States"],
    "date": "1787",
    "publishDate": "1787-09-17T00:00:00.000Z",
    "description": "Facsimile and full text of the U.S. Constitution and Bill of Rights.",
    "collection": ["americana", "library_of_congress", "texts"],
    "subject": ["constitution", "law", "founding documents"],
    "language": ["English"],
    "downloads": 524100,
    "week": 211,
    "month": 894,
    "filesCount": 16,
    "totalSizeBytes": 27262976,
    "licenseUrl": "https://creativecommons.org/publicdomain/mark/1.0/",
    "detailsUrl": "https://archive.org/details/constitutionofun00unit",
    "metadataUrl": "https://archive.org/metadata/constitutionofun00unit",
    "scrapedAt": "2026-05-22T00:00:00.000Z"
}
```

</details>

***

### ✨ Why choose this Actor

| | Capability |
|---|---|
| 📚 | **Massive corpus.** Access 50 million+ items across texts, audio, movies, software, web captures, and images. |
| 🔎 | **Lucene-grade search.** Free text or fielded queries, combined with structured filters for collection, media type, creator, and date. |
| 📦 | **Rich per-item metadata.** Optional deep fetch returns file lists, byte sizes, license URLs, and full subject tags. |
| 📊 | **Engagement signals.** Lifetime, weekly, and monthly download counts surface what people actually use. |
| 🌐 | **All media types.** One Actor for texts, audio, movies, software, and data. No source switching. |
| 🔁 | **Always fresh.** Every run reads the live catalog so newly uploaded items flow through. |
| 🚫 | **No authentication.** Public open library. No login or token. |

> 📊 The Archive is the closest thing we have to a public memory of the digital age. Querying it well is a superpower for librarians, journalists, and ML teams.

***

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| **⭐ Archive Search Scraper** *(this Actor)* | $5 free credit, then pay-per-use | **50M+** items | **Live per run** | query, collection, media, creator, date | ⚡ 2 min |
| Manual archive.org search | Free | Full | Live | Few | 🐢 Hours |
| Commercial library aggregators | $$$/year | Smaller, curated | Daily | Many | ⏳ Days |
| Bulk torrent dumps | Free | Partial, stale | Rarely | None | 🕒 Variable |

Pick this Actor when you want structured Lucene-grade search results, rich metadata, and zero parsing work.

***

### 🚀 How to use

1. 📝 **Sign up.** [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp) (takes 2 minutes).
2. 🌐 **Open the Actor.** Go to the Internet Archive Search Scraper page on the Apify Store.
3. 🎯 **Set input.** Type a search query, optionally restrict to a collection or media type, and set `maxItems`.
4. 🚀 **Run it.** Click **Start** and let the Actor collect your dataset.
5. 📥 **Download.** Grab results in the **Dataset** tab as CSV, Excel, JSON, or XML.

> ⏱️ Total time from signup to downloaded dataset: **3-5 minutes.** No coding required.

***

### 💼 Business use cases

<table>
<tr>
<td width="50%" valign="top">

#### 📰 Journalism & OSINT

- Source verification with archived primary documents
- Investigative timelines built from historical media
- Public-records research across government collections
- Background dossiers with cited Archive identifiers

</td>
<td width="50%" valign="top">

#### 🧠 ML Dataset Curation

- Assemble training corpora of public-domain texts
- Audio datasets from LibriVox for speech-to-text
- Image datasets from open NASA collections
- Software emulation libraries for retro-computing models

</td>
</tr>
<tr>
<td width="50%" valign="top">

#### 🎬 Documentary & Media Production

- Public-domain footage and audio sourcing
- License-clean B-roll for video editors
- Historical photography for newsroom features
- Archival music for podcast intros

</td>
<td width="50%" valign="top">

#### 📚 Libraries & Archives

- Discover overlapping holdings across collections
- Audit licensing across donated material
- Build curated reading lists by subject
- Preservation-audit support for risk assessment

</td>
</tr>
</table>

***

### 🔌 Automating Archive Search Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

- 🟢 **Node.js.** Install the `apify-client` NPM package.
- 🐍 **Python.** Use the `apify-client` PyPI package.
- 📚 See the [Apify API documentation](https://docs.apify.com/api/v2) for full details.

The [Apify Schedules feature](https://docs.apify.com/platform/schedules) lets you trigger this Actor on any cron interval. Run weekly to track newly added items in a watched collection, or daily during a research sprint.

***

### 🌟 Beyond business use cases

Archive data powers more than commercial workflows. The same records support research, education, civic projects, and personal initiatives.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Digital humanities studies on historical media
- Reproducible corpus assembly for linguistic research
- Citation networks across archived texts
- Open coursework on archival preservation

</td>
<td width="50%">

#### 🎨 Personal and creative

- Build a personal library of public-domain ebooks
- Hobby projects on retro-software emulation
- Curated music playlists from open audio collections
- Visual research for art and design projects

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Volunteer indexing of donated material
- Civic transparency around public records
- Investigative journalism on archived government docs
- Open-knowledge contributions to Wikipedia and Wikisource

</td>
<td width="50%">

#### 🧪 Experimentation

- Train historical-text OCR or topic models
- Validate search-ranking experiments with real queries
- Prototype agent pipelines that cite archival sources
- Seed metadata graphs with collection relationships

</td>
</tr>
</table>

***

### 🤖 Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge Actor in the AI of your choice:

- 💬 [**ChatGPT**](https://chat.openai.com/?q=How%20do%20I%20use%20the%20Internet%20Archive%20Search%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🧠 [**Claude**](https://claude.ai/new?q=How%20do%20I%20use%20the%20Internet%20Archive%20Search%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🔍 [**Perplexity**](https://perplexity.ai/search?q=How%20do%20I%20use%20the%20Internet%20Archive%20Search%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🅒 [**Copilot**](https://copilot.microsoft.com/?q=How%20do%20I%20use%20the%20Internet%20Archive%20Search%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)

***

### ❓ Frequently Asked Questions

#### 🧩 How does it work?

Type a search query, optionally add collection, media-type, creator, or date filters, click Start, and the Actor pulls structured records from the live catalog. With `fetchDetails` enabled, each record is enriched with the full per-item metadata feed.

#### 📏 How accurate is the data?

Identifiers and detail URLs are stable. Subject tags and creator names are crowd-sourced and may include duplicates or typos. Download counts update continuously and reflect the catalog state at run time.
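
Because subject tags can repeat with different casing or spacing, a small clean-up pass helps before grouping or counting. This is one possible normalization, not something the Actor does for you; `normalize_tags` is a hypothetical helper.

```python
def normalize_tags(tags):
    """Lowercase, collapse whitespace, and drop duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for tag in tags:
        cleaned = " ".join(tag.split()).casefold()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


print(normalize_tags(["Mars", "mars ", "Rover", "  perseverance", "rover"]))
```

Typos are left untouched here; fixing those needs fuzzy matching or a curated vocabulary.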

#### 🔁 How often is the catalog refreshed?

The Archive accepts new uploads every minute of the day. Every run of this Actor reads the live catalog, so freshly uploaded items appear without waiting for a daily cron.

#### 🎬 What media types can I query?

Eight: texts, audio, movies, software, web captures, images, datasets, and collections. Combine with a Lucene query for fine-grained slicing.

#### 🔎 Do I need to know Lucene?

No. Plain-word queries work fine. Fielded syntax like `creator:NASA` or `subject:photography AND mediatype:image` is supported when you need precision.

#### 📜 Are these items free to use?

Most are public domain or open-licensed. Always check the `licenseUrl` field on each record before commercial use. Some collections carry restrictive licenses despite being publicly viewable.
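
A first-pass license screen can be automated. The sketch below keeps only records whose `licenseUrl` points at a Creative Commons public-domain statement, matching the URL shapes in the sample records. Treating that URL prefix as the signal is an assumption of this example, and `is_public_domain` is a hypothetical helper; records with no `licenseUrl` are excluded rather than guessed at.

```python
from urllib.parse import urlparse


def is_public_domain(record: dict) -> bool:
    """Return True when licenseUrl is a creativecommons.org/publicdomain/ URL."""
    url = record.get("licenseUrl")
    if not url:
        return False
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.")
    return host == "creativecommons.org" and parsed.path.startswith("/publicdomain/")


records = [
    {"identifier": "a", "licenseUrl": "https://creativecommons.org/publicdomain/mark/1.0/"},
    {"identifier": "b", "licenseUrl": "https://creativecommons.org/licenses/by-nc/4.0/"},
    {"identifier": "c", "licenseUrl": None},
]
print([r["identifier"] for r in records if is_public_domain(r)])
```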

#### 📁 Do I get the actual file downloads?

This Actor returns metadata, file counts, and total byte sizes. To pull individual files, follow the `detailsUrl` and use the per-file download links the Archive exposes there.
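
If you prefer to script this step, one option is to read the JSON behind `metadataUrl` and build per-file links from it. The sketch assumes two Archive conventions that are not described in this README: that the metadata response contains a `files` list whose entries have a `name`, and that files are served from `https://archive.org/download/<identifier>/<name>`. `file_download_urls` is a hypothetical helper and works on an already-parsed response, so it makes no network calls.

```python
from urllib.parse import quote


def file_download_urls(identifier: str, metadata: dict) -> list:
    """Build one download URL per file entry in a parsed metadata response."""
    base = f"https://archive.org/download/{quote(identifier)}"
    # Assumption: each entry in metadata["files"] carries a "name" key.
    return [f"{base}/{quote(entry['name'])}" for entry in metadata.get("files", []) if "name" in entry]


parsed_metadata = {"files": [{"name": "cover image.jpg"}, {"name": "book.pdf"}]}
for url in file_download_urls("PIA23499", parsed_metadata):
    print(url)
```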

#### 💼 Can I use this data commercially?

Yes for metadata. For the actual media, you must respect each item's license. The `licenseUrl` field surfaces the relevant statement on every record.

#### 💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small pulls (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.

#### 🔁 What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from failed runs are preserved so you never lose progress.

#### 🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.

***

### 🔌 Integrate with any app

Internet Archive Search Scraper connects to any cloud service via [Apify integrations](https://apify.com/integrations):

- [**Make**](https://docs.apify.com/platform/integrations/make) - Automate multi-step workflows
- [**Zapier**](https://docs.apify.com/platform/integrations/zapier) - Connect with 5,000+ apps
- [**Slack**](https://docs.apify.com/platform/integrations/slack) - Get run notifications in your channels
- [**Airbyte**](https://docs.apify.com/platform/integrations/airbyte) - Pipe archive records into your warehouse
- [**GitHub**](https://docs.apify.com/platform/integrations/github) - Trigger runs from commits and releases
- [**Google Drive**](https://docs.apify.com/platform/integrations/drive) - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh archival metadata into your knowledge base, or alert a research team in Slack.
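
On the receiving side, a webhook handler mostly needs the dataset ID of the finished run. This sketch assumes the webhook uses Apify's default payload shape, with an `eventType` field and a `resource` object describing the run; if you customize the payload template, adjust the keys. `dataset_id_from_webhook` is a hypothetical helper.

```python
import json


def dataset_id_from_webhook(body: str):
    """Return the run's default dataset ID for successful runs, else None."""
    payload = json.loads(body)
    # Assumption: successful runs are reported as ACTOR.RUN.SUCCEEDED.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


example_body = json.dumps({
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"defaultDatasetId": "abc123"},
})
print(dataset_id_from_webhook(example_body))
```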

***

### 🔗 Recommended Actors

- [**📚 arXiv Scraper**](https://apify.com/parseforge/arxiv-scraper) - Preprint papers across physics, math, and CS
- [**📜 RFC Editor Index Scraper**](https://apify.com/parseforge/rfc-editor-scraper) - IETF Internet standards catalog
- [**🏛️ Met Museum Scraper**](https://apify.com/parseforge/met-museum-scraper) - Metropolitan Museum of Art open-access objects
- [**🔬 ClinicalTrials.gov Scraper**](https://apify.com/parseforge/clinicaltrials-gov-scraper) - Registered medical trials with outcomes
- [**🌍 REST Countries Info Scraper**](https://apify.com/parseforge/restcountries-info-scraper) - 250+ countries with population, currencies, languages

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more reference-data scrapers.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new scraper, propose a custom data project, or report an issue.

***

> **⚠️ Disclaimer:** this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the Internet Archive or any of its contributors. All trademarks mentioned are the property of their respective owners. Only publicly available open archival data is collected.

# Actor input Schema

## `maxItems` (type: `integer`):

Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000

## `searchQuery` (type: `string`):

Lucene-style query. Examples: 'mars rover', 'subject:photography AND mediatype:image', 'creator:NASA'. See https://archive.org/advancedsearch.php.

## `collection` (type: `string`):

Optional. Restrict to a single Internet Archive collection slug (e.g. 'nasa', 'opensource\_movies', 'librivoxaudio').

## `mediaType` (type: `string`):

Optional. Restrict to one media type.

## `creator` (type: `string`):

Optional. Filter by creator name (e.g. 'NASA', 'Library of Congress').

## `dateFrom` (type: `string`):

Optional. Earliest item date, in YYYY-MM-DD format.

## `dateTo` (type: `string`):

Optional. Latest item date, in YYYY-MM-DD format.

## `fetchDetails` (type: `boolean`):

When enabled, fetches the full per-item metadata (slower, but includes file listings and rich subject tags). Recommended.

## Actor input object example

```json
{
  "maxItems": 10,
  "searchQuery": "mars rover",
  "fetchDetails": true
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "maxItems": 10,
    "searchQuery": "mars rover"
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/internet-archive-search-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "maxItems": 10,
    "searchQuery": "mars rover",
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/internet-archive-search-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
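
If you would rather not install the client, the same run can be expressed as a plain HTTP request using only the standard library. The sketch below builds (but does not send) a `POST` to the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification in this document, passing the token as the `token` query parameter. `build_sync_request` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "parseforge~internet-archive-search-scraper"


def build_sync_request(token: str, actor_input: dict) -> Request:
    """Prepare a POST that runs the Actor and returns its dataset items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request("<YOUR_API_TOKEN>", {"maxItems": 10, "searchQuery": "mars rover"})
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.load() the response body.
```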

## CLI example

```bash
echo '{
  "maxItems": 10,
  "searchQuery": "mars rover"
}' |
apify call parseforge/internet-archive-search-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/internet-archive-search-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Internet Archive Search Scraper",
        "description": "Search the Internet Archive's 50M+ item catalog of texts, audio, movies, software, web pages, and images. Filter by collection, media type, creator, and date. Pull identifiers, titles, descriptions, downloads, and rich metadata.",
        "version": "1.0",
        "x-build-id": "QL3MvOTy1HjCLJ32y"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~internet-archive-search-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-internet-archive-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~internet-archive-search-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-internet-archive-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~internet-archive-search-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-internet-archive-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: limited to 10 items (preview). Paid users: optional, up to 1,000,000 items."
                    },
                    "searchQuery": {
                        "title": "Search Query (Lucene)",
                        "type": "string",
                        "description": "Lucene-style query. Examples: 'mars rover', 'subject:photography AND mediatype:image', 'creator:NASA'. See https://archive.org/advancedsearch.php."
                    },
                    "collection": {
                        "title": "Collection",
                        "type": "string",
                        "description": "Optional. Restrict to a single Internet Archive collection slug (e.g. 'nasa', 'opensource_movies', 'librivoxaudio')."
                    },
                    "mediaType": {
                        "title": "Media Type",
                        "enum": [
                            "texts",
                            "audio",
                            "movies",
                            "software",
                            "web",
                            "image",
                            "data",
                            "collection"
                        ],
                        "type": "string",
                        "description": "Optional. Restrict to one media type."
                    },
                    "creator": {
                        "title": "Creator",
                        "type": "string",
                        "description": "Optional. Filter by creator name (e.g. 'NASA', 'Library of Congress')."
                    },
                    "dateFrom": {
                        "title": "Date From (YYYY-MM-DD)",
                        "type": "string",
                        "description": "Optional. Earliest item date."
                    },
                    "dateTo": {
                        "title": "Date To (YYYY-MM-DD)",
                        "type": "string",
                        "description": "Optional. Latest item date."
                    },
                    "fetchDetails": {
                        "title": "Fetch Full Metadata",
                        "type": "boolean",
                        "description": "When enabled, fetches the full per-item metadata (slower, but includes file listings and rich subject tags). Recommended.",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
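
The `inputSchema` and the `run-sync` path above can be exercised with a short client-side sketch. The Python below is only an illustration, not the Actor's own code: `validate_input` and `build_run_sync_request` are hypothetical helper names, the base URL `https://api.apify.com/v2` is assumed rather than taken from this section, and the `YYYY-MM-DD` check is stricter than the schema itself, which only declares the date fields as strings. The sketch prepares the request without sending it.

```python
import json
import re
from urllib.parse import urlencode

# Assumption: public Apify API base URL. The path is copied from the spec above.
API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/parseforge~internet-archive-search-scraper/run-sync"

# Values copied from the "mediaType" enum in inputSchema.
MEDIA_TYPES = {"texts", "audio", "movies", "software", "web", "image", "data", "collection"}
STRING_FIELDS = ("searchQuery", "collection", "creator", "dateFrom", "dateTo")
# Assumption: the schema only says "string"; the field titles suggest YYYY-MM-DD.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate_input(actor_input: dict) -> dict:
    """Hypothetical helper: check a dict against the constraints in inputSchema."""
    max_items = actor_input.get("maxItems")
    if max_items is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(max_items, bool) or not isinstance(max_items, int):
            raise ValueError("maxItems must be an integer")
        if not 1 <= max_items <= 1_000_000:
            raise ValueError("maxItems must be between 1 and 1,000,000")

    media_type = actor_input.get("mediaType")
    if media_type is not None and media_type not in MEDIA_TYPES:
        raise ValueError(f"mediaType must be one of {sorted(MEDIA_TYPES)}")

    for key in STRING_FIELDS:
        value = actor_input.get(key)
        if value is not None and not isinstance(value, str):
            raise ValueError(f"{key} must be a string")

    for key in ("dateFrom", "dateTo"):
        value = actor_input.get(key)
        if value is not None and not DATE_RE.match(value):
            raise ValueError(f"{key} must look like YYYY-MM-DD")

    fetch_details = actor_input.get("fetchDetails")
    if fetch_details is not None and not isinstance(fetch_details, bool):
        raise ValueError("fetchDetails must be a boolean")

    return actor_input


def build_run_sync_request(token: str, actor_input: dict):
    """Hypothetical helper: return (url, body, headers) for a POST to run-sync."""
    validate_input(actor_input)
    # The spec declares "token" as a required query parameter.
    url = f"{API_BASE}{RUN_SYNC_PATH}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return url, body, headers


if __name__ == "__main__":
    url, body, headers = build_run_sync_request(
        "<YOUR_API_TOKEN>",
        {"searchQuery": "creator:NASA", "mediaType": "image", "maxItems": 10},
    )
    print(url)
    print(body.decode("utf-8"))
```

To actually send the request, the returned values could be passed to `urllib.request.Request(url, data=body, headers=headers, method="POST")` or to any HTTP client. Validating locally first avoids starting a run with an input the Actor would likely reject.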
