Internet Archive Search Scraper

Pricing

from $3.00 / 1,000 results

Developer

Crawler Bros

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 4 days ago

Search and retrieve items from the Internet Archive (archive.org) — the world's largest digital library with 44M+ books, videos, audio recordings, software, and web archives. Free, no API key required.

What does this actor do?

This actor lets you:

  • Search the entire Internet Archive by keyword with filters for media type, collection, language, date range, and sort order.
  • Fetch specific items by their unique Archive.org identifiers, getting enriched metadata including file counts and item sizes.

Data Source

All data is retrieved from the Internet Archive public API:

  • Advanced Search API: https://archive.org/advancedsearch.php — free, no authentication required.
  • Metadata API: https://archive.org/metadata/{identifier} — free, no authentication required.
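
For orientation, the two endpoints above can also be queried directly. The sketch below only builds the request URLs and sends nothing. The query parameters (q, fl[], rows, page, output) are those of the public advancedsearch.php endpoint; the function names and the default field list are illustrative assumptions, not the actor's actual implementation.

```python
from urllib.parse import quote, urlencode

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"
METADATA_ENDPOINT = "https://archive.org/metadata"


def build_search_url(query, rows=50, page=1,
                     fields=("identifier", "title", "mediatype")):
    """Return an Advanced Search URL that requests JSON output.

    `fields` is an assumed default; each entry becomes one fl[] parameter.
    """
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def build_metadata_url(identifier):
    """Return the Metadata API URL for a single item identifier."""
    return f"{METADATA_ENDPOINT}/{quote(identifier, safe='')}"
```

Either URL can then be fetched with any HTTP client; no API key or authentication header is needed.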

Input

| Field | Type | Description |
|---|---|---|
| mode | Select | search (default) or byIdentifiers |
| query | String | Search keywords (e.g. "public domain books", "jazz music") |
| mediaType | Select | Filter by type: texts, audio, movies, software, image, etree, data, web, collection, account |
| collection | String | Filter by collection slug (e.g. "gutenbergbooks", "librivoxaudio", "prelinger") |
| language | String | Filter by language code (e.g. "eng", "fra", "spa") |
| dateFrom | String | Start date filter (YYYY or YYYY-MM-DD) |
| dateTo | String | End date filter (YYYY or YYYY-MM-DD) |
| sortBy | Select | Sort order: most downloaded, newest, oldest, or alphabetical |
| identifiers | Array | Specific Archive.org identifiers (for byIdentifiers mode) |
| maxItems | Integer | Max items to return (default: 50, max: 5000) |
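
One plausible way the search filters could be combined into a single Archive.org query string is sketched below. The mapping from input fields to the index fields mediatype, collection, language, and date, and the expansion of a bare year into a full date, are assumptions made for illustration; the actor's real query construction is not documented here.

```python
def _date_bound(value, end):
    """Expand a bare YYYY to the first or last day of that year."""
    if len(value) == 4:
        return f"{value}-12-31" if end else f"{value}-01-01"
    return value


def build_query(actor_input):
    """Join the keyword and the optional filters with AND (assumed mapping)."""
    clauses = []
    if actor_input.get("query"):
        clauses.append(f"({actor_input['query']})")

    # Input field name -> assumed Archive.org index field name.
    for input_key, index_field in (("mediaType", "mediatype"),
                                   ("collection", "collection"),
                                   ("language", "language")):
        value = actor_input.get(input_key)
        if value:
            clauses.append(f"{index_field}:{value}")

    date_from = actor_input.get("dateFrom")
    date_to = actor_input.get("dateTo")
    if date_from or date_to:
        # An open-ended side of the range is written as "*".
        low = _date_bound(date_from, end=False) if date_from else "*"
        high = _date_bound(date_to, end=True) if date_to else "*"
        clauses.append(f"date:[{low} TO {high}]")

    return " AND ".join(clauses)
```

With the fields query "blues music", mediaType "audio", dateFrom "1920", and dateTo "1960", this sketch yields `(blues music) AND mediatype:audio AND date:[1920-01-01 TO 1960-12-31]`.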

Example Inputs

Search for classic literature texts:

{
  "mode": "search",
  "query": "shakespeare",
  "mediaType": "texts",
  "language": "eng",
  "maxItems": 25
}

Fetch specific items by identifier:

{
  "mode": "byIdentifiers",
  "identifiers": ["gutenberg-hamlet", "adventures_of_huckleberry_finn_librivox"],
  "maxItems": 10
}

Search for audio recordings in a date range:

{
  "mode": "search",
  "query": "blues music",
  "mediaType": "audio",
  "dateFrom": "1920",
  "dateTo": "1960",
  "sortBy": "-publicdate",
  "maxItems": 100
}

Output

Each item in the dataset contains:

| Field | Description |
|---|---|
| identifier | Unique Archive.org identifier |
| url | Direct URL to the item page (archive.org/details/{identifier}) |
| title | Item title |
| description | Item description |
| creator | Author or creator |
| date | Creation or publication date |
| mediatype | Type of media (texts, audio, movies, etc.) |
| collection | Collection it belongs to |
| language | Language code(s) |
| subject | Subject tags (up to 10) |
| format | File format(s) (up to 5) |
| downloads | Total download count |
| files_count | Number of files in the item (byIdentifiers mode) |
| item_size | Total size in bytes (byIdentifiers mode) |
| server | Serving server hostname (byIdentifiers mode) |
| scrapedAt | ISO 8601 timestamp of when data was scraped |

Example Output

{
  "identifier": "gutenberg-hamlet",
  "url": "https://archive.org/details/gutenberg-hamlet",
  "title": "Hamlet",
  "description": "A classic tragedy by William Shakespeare",
  "creator": "William Shakespeare",
  "date": "1603",
  "mediatype": "texts",
  "collection": "gutenbergbooks",
  "language": "eng",
  "subject": ["drama", "tragedy", "Shakespeare"],
  "format": ["PDF", "EPUB", "Plain Text"],
  "downloads": 85432,
  "scrapedAt": "2026-01-15T10:30:00+00:00"
}
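
To make the output shape concrete, here is a minimal sketch of how a raw search document might be normalized into a record like the one above. The helper names and the handling of single-value versus list fields are assumptions; only the field names, the caps of 10 subjects and 5 formats, and the ISO 8601 timestamp come from the output table.

```python
from datetime import datetime, timezone


def _as_list(value):
    """Wrap a scalar in a list, since a raw field may hold one value or many."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def to_output_item(doc, now=None):
    """Map a raw document (hypothetical shape) onto a subset of the output fields."""
    now = now or datetime.now(timezone.utc)
    identifier = doc["identifier"]
    return {
        "identifier": identifier,
        "url": f"https://archive.org/details/{identifier}",
        "title": doc.get("title"),
        "creator": doc.get("creator"),
        "mediatype": doc.get("mediatype"),
        "subject": _as_list(doc.get("subject"))[:10],  # up to 10 tags
        "format": _as_list(doc.get("format"))[:5],     # up to 5 formats
        "downloads": int(doc.get("downloads") or 0),
        "scrapedAt": now.isoformat(timespec="seconds"),
    }
```

Passing `now` explicitly keeps the function deterministic, which is convenient when comparing records across runs.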

Frequently Asked Questions

Is this free to use? The Internet Archive's public API is free and requires no authentication or API key. Running the actor itself is billed at the Apify Store rate listed above (from $3.00 / 1,000 results).

How many items can I retrieve? Up to 5,000 items per run using the maxItems parameter.

What media types are available? Texts (books), Audio, Movies/Video, Software, Images, Live Music (etree), Data sets, Web Archives, and Collections.

Can I filter by collection? Yes — use the collection field with a collection slug (e.g. "gutenbergbooks" for Project Gutenberg books, "librivoxaudio" for LibriVox audiobooks, "prelinger" for Prelinger Archives films).

Can I search in specific languages? Yes — use ISO 639-3 language codes like "eng" (English), "fra" (French), "spa" (Spanish), "deu" (German).

What are identifiers? Every Internet Archive item has a unique identifier (e.g. "gutenberg-hamlet"). You can find these in Archive.org URLs: archive.org/details/{identifier}.
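
If you are starting from item URLs rather than identifiers, the identifier can be pulled out of the path before passing it to byIdentifiers mode. This is a small sketch using only the standard library; the function name is hypothetical and it handles only the archive.org/details/{identifier} pattern described above.

```python
from urllib.parse import urlparse


def identifier_from_url(url):
    """Return the identifier from an archive.org/details/... URL, or None."""
    if "://" not in url:
        url = "https://" + url  # tolerate URLs written without a scheme
    parsed = urlparse(url)
    if parsed.hostname not in ("archive.org", "www.archive.org"):
        return None
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) >= 2 and parts[0] == "details":
        return parts[1]
    return None
```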

How are requests rate-limited? The actor waits 0.3 s between search pages and 0.5 s between metadata requests to respect the API's guidelines.
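
The pacing described above, together with the maxItems cap, can be sketched as a simple pagination loop. This is an illustration under assumptions, not the actor's code: `fetch_page` is a hypothetical callable that returns the list of documents for a given page, and `sleep` is injectable so the loop can be exercised without real waiting.

```python
import time


def collect_items(fetch_page, max_items=50, page_size=50,
                  delay=0.3, sleep=time.sleep):
    """Fetch pages until max_items is reached or results run out.

    Pauses `delay` seconds between consecutive page requests.
    """
    items = []
    page = 1
    while len(items) < max_items:
        docs = fetch_page(page, page_size)
        if not docs:
            break
        # Never keep more than the remaining quota.
        items.extend(docs[: max_items - len(items)])
        if len(docs) < page_size:
            break  # a short page means there are no further results
        if len(items) < max_items:
            sleep(delay)  # pause only when another request will follow
        page += 1
    return items
```

The same pattern would apply to metadata requests in byIdentifiers mode, with the delay set to 0.5 s.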

Use Cases

  • Building digital library catalogs
  • Research on public domain content
  • Finding historical audio/video recordings
  • Locating old software for preservation research
  • Downloading metadata for academic research
  • Tracking download statistics for archive items