Pricing

from $3.00 / 1,000 results

Internet Archive Search Scraper

Searches and retrieves items from the Internet Archive (archive.org), which hosts 44M+ books, videos, audio recordings, software titles, and web archives. The underlying Internet Archive API is free and requires no API key.

Developer

Crawler Bros

Last modified: a month ago

What does this actor do?

This actor lets you:

Search the entire Internet Archive by keyword with filters for media type, collection, language, date range, and sort order.
Fetch specific items by their unique Archive.org identifiers, getting enriched metadata including file counts and item sizes.

Data Source

All data is retrieved from the Internet Archive public API:

Advanced Search API: https://archive.org/advancedsearch.php — free, no authentication required.
Metadata API: https://archive.org/metadata/{identifier} — free, no authentication required.
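
As a rough illustration of how these two endpoints are addressed, the sketch below builds request URLs with Python's standard library. The parameter names `q`, `fl[]`, `rows`, `page`, and `output` belong to the public Advanced Search endpoint. The helper names and the default field list are assumptions made for this example, not the actor's actual code.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "https://archive.org/advancedsearch.php"
METADATA_URL = "https://archive.org/metadata/"


def build_search_url(query, rows=50, page=1, fields=("identifier", "title", "downloads")):
    """Return an Advanced Search URL that asks for JSON output (hypothetical helper)."""
    params = [("q", query)]
    # The endpoint takes one fl[] parameter per requested field.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return f"{SEARCH_URL}?{urlencode(params)}"


def build_metadata_url(identifier):
    """Return the Metadata API URL for a single item identifier (hypothetical helper)."""
    return METADATA_URL + quote(identifier, safe="")


print(build_search_url("shakespeare AND mediatype:texts", rows=25))
print(build_metadata_url("gutenberg-hamlet"))
```

Neither helper sends a request; the resulting URLs can be passed to any HTTP client.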

Input

| Field | Type | Description |
| --- | --- | --- |
| `mode` | Select | `search` (default) or `byIdentifiers` |
| `query` | String | Search keywords (e.g. "public domain books", "jazz music") |
| `mediaType` | Select | Filter by type: texts, audio, movies, software, image, etree, data, web, collection, account |
| `collection` | String | Filter by collection slug (e.g. "gutenbergbooks", "librivoxaudio", "prelinger") |
| `language` | String | Filter by language code (e.g. "eng", "fra", "spa") |
| `dateFrom` | String | Start date filter (YYYY or YYYY-MM-DD) |
| `dateTo` | String | End date filter (YYYY or YYYY-MM-DD) |
| `sortBy` | Select | Sort order: most downloaded, newest, oldest, or alphabetical |
| `identifiers` | Array | Specific Archive.org identifiers (for byIdentifiers mode) |
| `maxItems` | Integer | Max items to return (default: 50, max: 5000) |

Example Inputs

Search for classic literature texts:

{
  "mode": "search",
  "query": "shakespeare",
  "mediaType": "texts",
  "language": "eng",
  "maxItems": 25
}

Fetch specific items by identifier:

{
  "mode": "byIdentifiers",
  "identifiers": ["gutenberg-hamlet", "adventures_of_huckleberry_finn_librivox"],
  "maxItems": 10
}

Search for audio recordings in a date range:

{
  "mode": "search",
  "query": "blues music",
  "mediaType": "audio",
  "dateFrom": "1920",
  "dateTo": "1960",
  "sortBy": "-publicdate",
  "maxItems": 100
}
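
The page does not say how the actor turns these inputs into a search query. One plausible mapping, sketched here as an assumption, is to join the filters into a single Lucene-style query string using the Internet Archive field names `mediatype`, `collection`, `language`, and `date`. The functions `expand_date` and `build_query`, and the padding of a bare year to a full date, are hypothetical.

```python
def expand_date(value, end=False):
    """Pad a bare YYYY to a full date; pass YYYY-MM-DD through unchanged."""
    if len(value) == 4:
        return f"{value}-12-31" if end else f"{value}-01-01"
    return value


def build_query(run_input):
    """Combine keywords and optional filters into one query string (assumed mapping)."""
    clauses = []
    if run_input.get("query"):
        clauses.append(f"({run_input['query']})")
    # Input key on the left, assumed index field on the right.
    for key, field in (("mediaType", "mediatype"),
                       ("collection", "collection"),
                       ("language", "language")):
        if run_input.get(key):
            clauses.append(f"{field}:{run_input[key]}")
    date_from, date_to = run_input.get("dateFrom"), run_input.get("dateTo")
    if date_from or date_to:
        # "*" leaves one side of the range open when only one bound is given.
        start = expand_date(date_from) if date_from else "*"
        stop = expand_date(date_to, end=True) if date_to else "*"
        clauses.append(f"date:[{start} TO {stop}]")
    return " AND ".join(clauses)


example = {"mode": "search", "query": "blues music", "mediaType": "audio",
           "dateFrom": "1920", "dateTo": "1960"}
print(build_query(example))
```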

Output

Each item in the dataset contains:

| Field | Description |
| --- | --- |
| `identifier` | Unique Archive.org identifier |
| `url` | Direct URL to the item page (archive.org/details/{identifier}) |
| `title` | Item title |
| `description` | Item description |
| `creator` | Author or creator |
| `date` | Creation or publication date |
| `mediatype` | Type of media (texts, audio, movies, etc.) |
| `collection` | Collection it belongs to |
| `language` | Language code(s) |
| `subject` | Subject tags (up to 10) |
| `format` | File format(s) (up to 5) |
| `downloads` | Total download count |
| `files_count` | Number of files in the item (byIdentifiers mode) |
| `item_size` | Total size in bytes (byIdentifiers mode) |
| `server` | Serving server hostname (byIdentifiers mode) |
| `scrapedAt` | ISO 8601 timestamp of when data was scraped |

Example Output

{
  "identifier": "gutenberg-hamlet",
  "url": "https://archive.org/details/gutenberg-hamlet",
  "title": "Hamlet",
  "description": "A classic tragedy by William Shakespeare",
  "creator": "William Shakespeare",
  "date": "1603",
  "mediatype": "texts",
  "collection": "gutenbergbooks",
  "language": "eng",
  "subject": ["drama", "tragedy", "Shakespeare"],
  "format": ["PDF", "EPUB", "Plain Text"],
  "downloads": 85432,
  "scrapedAt": "2026-01-15T10:30:00+00:00"
}
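
To make the output shape concrete, here is a minimal sketch of how a raw search document might be mapped onto the fields listed above, including the caps of 10 subjects and 5 formats. The function names and the handling of missing values are assumptions for illustration; the actor's own normalization may differ.

```python
from datetime import datetime, timezone


def as_list(value):
    """Wrap a scalar in a list; treat a missing value as an empty list."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def to_output_item(doc, scraped_at=None):
    """Map one raw search document onto the documented output fields (hypothetical helper)."""
    scraped_at = scraped_at or datetime.now(timezone.utc)
    identifier = doc["identifier"]
    return {
        "identifier": identifier,
        "url": f"https://archive.org/details/{identifier}",
        "title": doc.get("title"),
        "description": doc.get("description"),
        "creator": doc.get("creator"),
        "date": doc.get("date"),
        "mediatype": doc.get("mediatype"),
        "collection": doc.get("collection"),
        "language": doc.get("language"),
        "subject": as_list(doc.get("subject"))[:10],  # keep at most 10 tags
        "format": as_list(doc.get("format"))[:5],     # keep at most 5 formats
        "downloads": int(doc.get("downloads") or 0),
        "scrapedAt": scraped_at.isoformat(timespec="seconds"),
    }


raw = {"identifier": "gutenberg-hamlet", "title": "Hamlet",
       "subject": "drama", "downloads": "85432"}
print(to_output_item(raw))
```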

Frequently Asked Questions

Is this free to use? The Internet Archive API that the actor queries is free and requires no authentication. The actor itself is priced from $3.00 per 1,000 results.

How many items can I retrieve? Up to 5,000 items per run using the maxItems parameter.

What media types are available? Texts (books), Audio, Movies/Video, Software, Images, Live Music (etree), Data sets, Web Archives, and Collections.

Can I filter by collection? Yes — use the collection field with a collection slug (e.g. "gutenbergbooks" for Project Gutenberg books, "librivoxaudio" for LibriVox audiobooks, "prelinger" for Prelinger Archives films).

Can I search in specific languages? Yes — use ISO 639-3 language codes like "eng" (English), "fra" (French), "spa" (Spanish), "deu" (German).

What are identifiers? Every Internet Archive item has a unique identifier (e.g. "gutenberg-hamlet"). You can find these in Archive.org URLs: archive.org/details/{identifier}.
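
For instance, an identifier can be pulled out of a details URL with a few lines of Python. This is an illustrative sketch: the function name is hypothetical, and it checks only the path, not the host name.

```python
from urllib.parse import urlparse


def identifier_from_url(url):
    """Return the identifier from an archive.org/details/... URL, or None."""
    if "://" not in url:
        url = "https://" + url  # tolerate URLs written without a scheme
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) >= 2 and parts[0] == "details":
        return parts[1]
    return None


print(identifier_from_url("https://archive.org/details/gutenberg-hamlet"))
```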

How are requests rate-limited? The actor waits 0.3 seconds between search pages and 0.5 seconds between metadata requests to respect the API's guidelines.
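
The delay between search pages can be pictured with the sketch below. It is not the actor's code: `collect_items`, the injected `fetch_page` callable, and the page size of 50 are assumptions, and the demo uses an in-memory stand-in instead of calling archive.org.

```python
import time


def collect_items(fetch_page, max_items=50, page_size=50, delay=0.3, sleep=time.sleep):
    """Gather up to max_items results, pausing between successive pages."""
    items, page = [], 1
    while len(items) < max_items:
        docs = fetch_page(page, page_size)
        if not docs:
            break  # no more results
        items.extend(docs[: max_items - len(items)])
        if len(docs) < page_size or len(items) >= max_items:
            break  # last page reached, or enough items collected
        sleep(delay)  # pause only when another page will be requested
        page += 1
    return items


# Stand-in data source so the sketch runs without network access.
fake_results = [{"identifier": f"item-{n}"} for n in range(120)]


def fake_fetch(page, page_size):
    start = (page - 1) * page_size
    return fake_results[start : start + page_size]


pauses = []  # records each requested delay instead of actually sleeping
items = collect_items(fake_fetch, max_items=100, sleep=pauses.append)
print(len(items), pauses)
```

Passing `sleep` as a parameter keeps the pacing logic testable; in real use the default `time.sleep` applies the pause.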

Use Cases

Building digital library catalogs
Research on public domain content
Finding historical audio/video recordings
Locating old software for preservation research
Downloading metadata for academic research
Tracking download statistics for archive items

Internet Archive Search Scraper

crawlergang/internet-archive-search-scraper

Searches and retrieves items from the Internet Archive (archive.org) - 44M+ books, videos, audio, software, and web archives. Free, no API key required.

Crawler Gang

5.0

Internet Archive Items Scraper - archive.org Search by Query

gio21/archive-org-items-scraper

Search Internet Archive (archive.org) items: books, movies, audio, software, images, web archives, data. Returns title, creator, date, description, downloads, identifier, URLs. Free, no key. For research, content discovery, digital preservation.

Gio

Internet Archive Scraper

truenorth/internet-archive-scraper

Search archive.org and export books, audio, video, software, images, metadata, and file lists as structured JSON or CSV.

TrueNorth

Internet Archive Scraper

fortuitous_pirate/internet-archive-scraper

Search the Internet Archive's 35+ million items: books, movies, audio, software, and web pages. Filter by media type, subject, creator, language, or date range. Free API.

Fortuitous Pirate

Internet Archive Search — Wayback Machine Advanced Query Tool

maged120/archive-org-advanced-search

Search the Internet Archive (archive.org) with full advanced filter support — date range, media type, language, subject, and more. Returns metadata from archived web pages, books, audio, and video.

Maged

Internet Archive Scraper

dami_studio/internet-archive-scraper

Searches the Internet Archive (archive.org) by keyword and returns structured items (title, creator, year, downloads, subjects, item URL); filter by media type and sort by downloads or upload date.

Dami's Studio

5.0

Internet Archive & Wayback Machine Scraper

mangudai/internet-archive-scraper

Search the Internet Archive's 40M+ items, pull full item metadata and file lists, and query the Wayback Machine for URL snapshots. Books, audio, video, software, and archived pages on official archive.org APIs. No API key.

Mangudäi

Internet Archive Metadata Scraper — Bulk archive.org Export

logiover/internet-archive-metadata-scraper

Bulk-export item metadata from the Internet Archive (archive.org) by full-text query, collection, media type, creator, subject and date range. Extract identifier, title, creator, date, downloads, format, subject and more. Millions of items. No API key, no login.

Logiover

Internet Archive Search Scraper

moving_beacon-owner1/internet-archive-search-scraper

Searches the Internet Archive using its advanced search API. Retrieves matching items with pagination, returning requested metadata fields and a link to each item's details page.