Internet Archive Search Scraper
Pricing: from $3.00 / 1,000 results
Developer: Crawler Bros
Search and retrieve items from the Internet Archive (archive.org), the world's largest digital library, with 44M+ books, videos, audio recordings, software, and web archives. The underlying Internet Archive API is free and requires no API key.
What does this actor do?
This actor lets you:
- Search the entire Internet Archive by keyword with filters for media type, collection, language, date range, and sort order.
- Fetch specific items by their unique Archive.org identifiers, getting enriched metadata including file counts and item sizes.
Data Source
All data is retrieved from the Internet Archive public API:
- Advanced Search API: https://archive.org/advancedsearch.php (free, no authentication required)
- Metadata API: https://archive.org/metadata/{identifier} (free, no authentication required)
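For orientation, here is a minimal standard-library Python sketch of the two request types. It assumes the Advanced Search query parameters `q`, `fl[]` (fields to return), `rows`, `page`, `sort[]` and `output=json`; the helper names `search_url`, `metadata_url` and `fetch_json` are hypothetical, and this is not the actor's own source.

```python
import json
import urllib.parse
import urllib.request

ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"
METADATA = "https://archive.org/metadata/"


def search_url(q, fields, rows=50, page=1, sort=None):
    """Build an Advanced Search URL that asks for a JSON response."""
    params = [("q", q)]
    # fl[] is repeated once per field to include in each result document
    params += [("fl[]", f) for f in fields]
    if sort:
        params.append(("sort[]", sort))
    params += [("rows", str(rows)), ("page", str(page)), ("output", "json")]
    return ADVANCED_SEARCH + "?" + urllib.parse.urlencode(params)


def metadata_url(identifier):
    """Build a Metadata API URL for a single item identifier."""
    return METADATA + urllib.parse.quote(identifier, safe="")


def fetch_json(url, timeout=30):
    """GET a URL and decode its JSON body; no API key or auth header is sent."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_json(search_url("mediatype:texts AND language:eng", ["identifier", "title"]))["response"]["docs"]` should yield a list of result dicts, assuming the usual `response.docs` layout of Advanced Search JSON.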
Input
| Field | Type | Description |
|---|---|---|
| mode | Select | search (default) or byIdentifiers |
| query | String | Search keywords (e.g. "public domain books", "jazz music") |
| mediaType | Select | Filter by type: texts, audio, movies, software, image, etree, data, web, collection, account |
| collection | String | Filter by collection slug (e.g. "gutenbergbooks", "librivoxaudio", "prelinger") |
| language | String | Filter by language code (e.g. "eng", "fra", "spa") |
| dateFrom | String | Start date filter (YYYY or YYYY-MM-DD) |
| dateTo | String | End date filter (YYYY or YYYY-MM-DD) |
| sortBy | Select | Sort order: most downloaded, newest, oldest, or alphabetical |
| identifiers | Array | Specific Archive.org identifiers (for byIdentifiers mode) |
| maxItems | Integer | Max items to return (default: 50, max: 5000) |
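The table does not say how these fields become an Advanced Search query. The sketch below shows one plausible mapping, assuming Lucene-style `field:value` clauses joined with `AND` and a `date:[from TO to]` range where a bare year expands to the whole year. `build_query` is a hypothetical name and the actor's own logic may differ.

```python
import re

# dateFrom/dateTo accept YYYY or YYYY-MM-DD, per the input table
DATE_RE = re.compile(r"\d{4}(-\d{2}-\d{2})?")

# (Advanced Search field, input field) pairs for the simple filters
FILTERS = (("mediatype", "mediaType"), ("collection", "collection"), ("language", "language"))


def _bound(value, year_suffix):
    """Expand a bare year to a full date; a missing value becomes an open bound '*'."""
    if not value:
        return "*"
    if not DATE_RE.fullmatch(value):
        raise ValueError(f"bad date {value!r}: use YYYY or YYYY-MM-DD")
    return value + year_suffix if len(value) == 4 else value


def build_query(inp):
    """Turn search-mode input into an Advanced Search q string (hypothetical mapping)."""
    max_items = inp.get("maxItems", 50)
    if not 1 <= max_items <= 5000:
        raise ValueError("maxItems must be between 1 and 5000")
    parts = []
    if inp.get("query"):
        parts.append(f"({inp['query']})")
    for field, key in FILTERS:
        if inp.get(key):
            parts.append(f"{field}:{inp[key]}")
    if inp.get("dateFrom") or inp.get("dateTo"):
        lo = _bound(inp.get("dateFrom"), "-01-01")
        hi = _bound(inp.get("dateTo"), "-12-31")
        parts.append(f"date:[{lo} TO {hi}]")
    if not parts:
        raise ValueError("search mode needs a query or at least one filter")
    return " AND ".join(parts)
```

With the blues example input, this produces `(blues music) AND mediatype:audio AND date:[1920-01-01 TO 1960-12-31]`, which could then be sent as the `q` parameter.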
Example Inputs
Search for classic literature texts:

```json
{
  "mode": "search",
  "query": "shakespeare",
  "mediaType": "texts",
  "language": "eng",
  "maxItems": 25
}
```

Fetch specific items by identifier:

```json
{
  "mode": "byIdentifiers",
  "identifiers": ["gutenberg-hamlet", "adventures_of_huckleberry_finn_librivox"],
  "maxItems": 10
}
```

Search for audio recordings in a date range:

```json
{
  "mode": "search",
  "query": "blues music",
  "mediaType": "audio",
  "dateFrom": "1920",
  "dateTo": "1960",
  "sortBy": "-publicdate",
  "maxItems": 100
}
```
Output
Each item in the dataset contains:
| Field | Description |
|---|---|
| identifier | Unique Archive.org identifier |
| url | Direct URL to the item page (archive.org/details/{identifier}) |
| title | Item title |
| description | Item description |
| creator | Author or creator |
| date | Creation or publication date |
| mediatype | Type of media (texts, audio, movies, etc.) |
| collection | Collection it belongs to |
| language | Language code(s) |
| subject | Subject tags (up to 10) |
| format | File format(s) (up to 5) |
| downloads | Total download count |
| files_count | Number of files in the item (byIdentifiers mode) |
| item_size | Total size in bytes (byIdentifiers mode) |
| server | Serving server hostname (byIdentifiers mode) |
| scrapedAt | ISO 8601 timestamp of when data was scraped |
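A sketch of how a raw result could be shaped into these output fields. It assumes the search API returns `subject` and `format` as either a single string or a list, and that `files_count`, `item_size` and `server` are top-level keys of the Metadata API response. `to_record` and `as_list` are hypothetical helpers, not the actor's code.

```python
from datetime import datetime, timezone

# Fields copied through unchanged from the result document
PASSTHROUGH = ("title", "description", "creator", "date", "mediatype",
               "collection", "language", "downloads")


def as_list(value):
    """Normalize a field that may be missing, a single string, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def to_record(doc, meta=None, now=None):
    """Shape one result into the output fields listed above (illustrative sketch)."""
    ident = doc["identifier"]
    rec = {"identifier": ident, "url": f"https://archive.org/details/{ident}"}
    for key in PASSTHROUGH:
        rec[key] = doc.get(key)
    rec["subject"] = as_list(doc.get("subject"))[:10]  # keep at most 10 tags
    rec["format"] = as_list(doc.get("format"))[:5]     # keep at most 5 formats
    if meta is not None:
        # byIdentifiers mode: values taken from the Metadata API response
        rec["files_count"] = meta.get("files_count")
        rec["item_size"] = meta.get("item_size")
        rec["server"] = meta.get("server")
    # Timezone-aware UTC timestamp, e.g. 2026-01-15T10:30:00+00:00
    rec["scrapedAt"] = (now or datetime.now(timezone.utc)).isoformat()
    return rec
```

Passing `now` explicitly keeps the output reproducible in tests; in normal use it defaults to the current UTC time.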
Example Output
```json
{
  "identifier": "gutenberg-hamlet",
  "url": "https://archive.org/details/gutenberg-hamlet",
  "title": "Hamlet",
  "description": "A classic tragedy by William Shakespeare",
  "creator": "William Shakespeare",
  "date": "1603",
  "mediatype": "texts",
  "collection": "gutenbergbooks",
  "language": "eng",
  "subject": ["drama", "tragedy", "Shakespeare"],
  "format": ["PDF", "EPUB", "Plain Text"],
  "downloads": 85432,
  "scrapedAt": "2026-01-15T10:30:00+00:00"
}
```
Frequently Asked Questions
Is this free to use?
The Internet Archive API itself is free and requires no authentication, so you don't need an API key. Running this actor is billed per result (from $3.00 / 1,000 results).
How many items can I retrieve?
Up to 5,000 items per run using the maxItems parameter.
What media types are available?
Texts (books), Audio, Movies/Video, Software, Images, Live Music (etree), Data sets, Web Archives, and Collections.
Can I filter by collection?
Yes — use the collection field with a collection slug (e.g. "gutenbergbooks" for Project Gutenberg books, "librivoxaudio" for LibriVox audiobooks, "prelinger" for Prelinger Archives films).
Can I search in specific languages?
Yes. Use ISO 639-3 language codes like "eng" (English), "fra" (French), "spa" (Spanish), "deu" (German).
What are identifiers?
Every Internet Archive item has a unique identifier (e.g. "gutenberg-hamlet"). You can find these in Archive.org URLs: archive.org/details/{identifier}.
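If you start from copied links, extracting the identifier takes only a few lines. This sketch (the function name `identifier_from_url` is hypothetical) assumes links of the form archive.org/details/{identifier}, possibly followed by extra path segments such as /page/n5 or a query string.

```python
from urllib.parse import unquote, urlsplit


def identifier_from_url(url):
    """Return the {identifier} part of an archive.org/details/{identifier} URL."""
    # Allow bare "archive.org/details/..." links without a scheme
    parts = urlsplit(url if "://" in url else "https://" + url)
    host = parts.hostname or ""
    if host != "archive.org" and not host.endswith(".archive.org"):
        raise ValueError(f"not an archive.org URL: {url}")
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "details":
        raise ValueError(f"no /details/{{identifier}} path in: {url}")
    return unquote(segments[1])
```

The resulting strings can go straight into the `identifiers` array for byIdentifiers mode.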
How is the data rate-limited?
The actor waits 0.3 s between search-result pages and 0.5 s between metadata requests to stay within the API's usage guidelines.
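The search-side pacing can be sketched as a paging loop that sleeps between requests. This is an assumption about the general approach, not the actor's code: `paginate` and the caller-supplied `fetch_page(page, rows)` are hypothetical, `rows=100` is an arbitrary page size, and the 0.3 s default mirrors the figure above. Calling it with `delay=0.5` would give the metadata-side pacing.

```python
import time


def paginate(fetch_page, max_items, rows=100, delay=0.3, sleep=time.sleep):
    """Collect up to max_items results from a paged search, pausing between pages.

    fetch_page(page, rows) is supplied by the caller and returns the list of
    result docs for a 1-based page number.
    """
    items = []
    page = 1
    while True:
        docs = fetch_page(page, rows)
        items.extend(docs[: max_items - len(items)])
        # Stop on a short (final) page or once enough items are collected
        if len(docs) < rows or len(items) >= max_items:
            return items
        page += 1
        sleep(delay)  # pause only when another request will follow
```

Injecting `sleep` lets the loop be tested without real delays, and no pause is spent after the last request.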
Use Cases
- Building digital library catalogs
- Research on public domain content
- Finding historical audio/video recordings
- Locating old software for preservation research
- Downloading metadata for academic research
- Tracking download statistics for archive items