Pricing

$2.00 / 1,000 items returned

Internet Archive Scraper

Searches the Internet Archive (archive.org) by keyword and returns structured items (title, creator, year, downloads, subjects, item URL); filter by media type and sort by downloads or upload date.

Rating

5.0

(1)

Developer

Dami's Studio

Last modified

20 days ago

What you get per item

identifier, title, creator, year, date, mediaType, downloads, subjects (array), description (first ~500 chars), publicdate, and url (https://archive.org/details/{identifier}).

Fields that can be null

title, creator, year, date, description, publicdate — null when archive.org's metadata doesn't include that field for an item.
subjects — empty array when the item has no subject tags.
downloads — 0 when not reported.
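
The null rules above matter mostly on the consuming side. The following is a minimal sketch of reading a row defensively, assuming rows arrive as plain dictionaries with the field names listed above; the `summarize` helper and its placeholder strings are hypothetical and not part of the actor.

```python
from typing import Any


def summarize(row: dict[str, Any]) -> str:
    """Build a one-line summary that tolerates null or empty fields."""
    # title, creator and year may be null when archive.org has no metadata.
    title = row.get("title") or "(untitled)"
    creator = row.get("creator") or "unknown creator"
    year = row.get("year") or "n.d."
    # subjects is an empty array when the item has no subject tags.
    subjects = ", ".join(row.get("subjects") or []) or "no subjects"
    # downloads is 0 when not reported.
    downloads = row.get("downloads") or 0
    return f"{title} by {creator} ({year}), {downloads} downloads [{subjects}] {row['url']}"


if __name__ == "__main__":
    sparse_row = {
        "identifier": "example-item",  # hypothetical identifier
        "title": None,
        "creator": None,
        "year": None,
        "subjects": [],
        "downloads": 0,
        "url": "https://archive.org/details/example-item",
    }
    print(summarize(sparse_row))
```

Only `identifier` and `url` are treated as always present here, since they are not listed among the nullable fields.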

Input

Field	Notes
`query`	Required. Keywords, e.g. `nasa apollo`, `jazz`. Supports archive.org Lucene operators, e.g. `title:(grateful dead) AND year:[1977 TO 1980]`.
`mediaType`	Restrict to one type: `texts`, `audio`, `movies`, `software`, `image`, `web`, `data`, `collection`. Empty = any.
`sort`	`downloads` (default), `date`, `publicdate`, or `relevance`.
`maxItems`	Max unique items to return (default 100). Paginates 100 per request until reached or exhausted.
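
To avoid a BAD_INPUT diagnostic row, the input can be checked locally before starting a run. This is a rough sketch based only on the table above; the `build_input` helper is hypothetical, and rejecting an unknown `sort` or a non-positive `maxItems` is an assumption about sensible client-side behavior, not documented actor behavior.

```python
from typing import Any

# Allowed values copied from the input table above.
MEDIA_TYPES = {"texts", "audio", "movies", "software", "image", "web", "data", "collection"}
SORTS = {"downloads", "date", "publicdate", "relevance"}


def build_input(query: str, media_type: str = "", sort: str = "downloads",
                max_items: int = 100) -> dict[str, Any]:
    """Validate values locally and return an input object for the actor."""
    if not query.strip():
        raise ValueError("query must be non-empty")
    if media_type and media_type not in MEDIA_TYPES:
        raise ValueError(f"unknown mediaType: {media_type!r}")
    # Assumption: an unknown sort is rejected here rather than passed through.
    if sort not in SORTS:
        raise ValueError(f"unknown sort: {sort!r}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    return {"query": query, "mediaType": media_type, "sort": sort, "maxItems": max_items}


if __name__ == "__main__":
    print(build_input("title:(grateful dead) AND year:[1977 TO 1980]", "audio"))
```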

Output

One dataset row per item. Pricing is pay-per-result: you are only charged for genuine item rows (ok: true). Diagnostic rows are never charged — this includes:

empty/invalid input (errorCode: "BAD_INPUT" — empty query or an unknown mediaType),
no results for the query (NO_RESULTS),
rate limits or network errors (RATE_LIMITED / NETWORK / SERVER_ERROR).

Results are de-duplicated by identifier.
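
When post-processing a dataset, it can help to separate item rows from diagnostic rows and estimate the charge. The sketch below assumes item rows carry `ok: true` and an `identifier`, and that anything else is a diagnostic row; the helper names are hypothetical, and the price constant is taken from the pricing shown on this page.

```python
from typing import Any

PRICE_PER_1000_ITEMS_USD = 2.00  # from the listed pricing


def split_rows(rows: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return (unique item rows, diagnostic rows)."""
    items: dict[str, dict[str, Any]] = {}
    diagnostics: list[dict[str, Any]] = []
    for row in rows:
        if row.get("ok") is True:
            # Keep the first row seen for each identifier (defensive;
            # the actor already de-duplicates by identifier).
            items.setdefault(row["identifier"], row)
        else:
            diagnostics.append(row)
    return list(items.values()), diagnostics


def estimated_charge_usd(item_count: int) -> float:
    """Pay-per-result estimate: only item rows are counted."""
    return item_count * PRICE_PER_1000_ITEMS_USD / 1000


if __name__ == "__main__":
    rows = [
        {"ok": True, "identifier": "a"},
        {"ok": True, "identifier": "b"},
        {"ok": False, "errorCode": "RATE_LIMITED"},
    ]
    items, diagnostics = split_rows(rows)
    print(len(items), len(diagnostics), estimated_charge_usd(len(items)))
```

Under these assumptions, a run that returns 50 item rows would cost about $0.10, regardless of how many diagnostic rows it also produced.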

Proxy

The archive.org advancedsearch API is a public, no-auth JSON endpoint with no anti-bot protection, so no proxy is required and the actor runs without one by default (saving proxy credits). Enable Apify Proxy only if you hit IP rate limits at very high volume.

Troubleshooting

Getting a BAD_INPUT row? Provide a non-empty query, and if you set mediaType make sure it's one of the allowed values.
NO_RESULTS? The query matched nothing on archive.org — broaden the keywords or remove the media-type filter.
Want fewer/more results? Adjust maxItems. The archive can return very large result sets for broad queries.

Example

{ "query": "jazz", "mediaType": "audio", "sort": "downloads", "maxItems": 50 }

Notes

The actor calls advancedsearch.php with output=json, requesting identifier, title, creator, year, date, mediatype, downloads, description, subject, and publicdate, then maps each doc to a clean row. Pagination uses the page parameter with 100 rows per request until your maxItems is reached or the numFound total is exhausted.
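
The request and pagination described above could be sketched roughly as follows. This is not the actor's source code: the helper names are hypothetical, and the way `mediaType` is folded into the query and `sort` is translated into a `sort[]` parameter are assumptions about one reasonable approach. The code only builds URLs and does no network I/O.

```python
import math
from urllib.parse import urlencode

ENDPOINT = "https://archive.org/advancedsearch.php"
FIELDS = ["identifier", "title", "creator", "year", "date", "mediatype",
          "downloads", "description", "subject", "publicdate"]
ROWS_PER_PAGE = 100


def build_search_url(query: str, media_type: str = "", sort: str = "downloads",
                     page: int = 1) -> str:
    """Build one advancedsearch.php request URL for the given page."""
    # Assumption: the media-type filter is expressed as a Lucene clause.
    q = f"({query}) AND mediatype:{media_type}" if media_type else query
    params = [("q", q)]
    params += [("fl[]", field) for field in FIELDS]
    # Assumption: "relevance" means no explicit sort; others sort descending.
    if sort != "relevance":
        params.append(("sort[]", f"{sort} desc"))
    params += [("rows", str(ROWS_PER_PAGE)), ("page", str(page)), ("output", "json")]
    return f"{ENDPOINT}?{urlencode(params)}"


def pages_needed(num_found: int, max_items: int) -> int:
    """Number of 100-row pages to fetch before maxItems or numFound is hit."""
    return math.ceil(min(num_found, max_items) / ROWS_PER_PAGE)


if __name__ == "__main__":
    print(build_search_url("jazz", "audio", page=1))
    print(pages_needed(num_found=12345, max_items=250))
```

Because results are de-duplicated by identifier, a real run might need an extra page beyond this estimate if some pages contain repeats.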

Internet Archive Items Scraper - archive.org Search by Query

gio21/archive-org-items-scraper

Search Internet Archive (archive.org) items: books, movies, audio, software, images, web archives, data. Returns title, creator, date, description, downloads, identifier, URLs. Free, no key. For research, content discovery, digital preservation.

Gio

Internet Archive Search Scraper

crawlerbros/internet-archive-search-scraper

Searches and retrieves items from the Internet Archive (archive.org) - 44M+ books, videos, audio, software, and web archives. Free, no API key required.

Crawler Bros

Internet Archive Search Scraper

parseforge/internet-archive-search-scraper

Search the Internet Archive's 50M+ item catalog of texts, audio, movies, software, web pages, and images. Filter by collection, media type, creator, and date. Pull identifiers, titles, descriptions, downloads, and rich metadata.

ParseForge

Internet Archive Scraper

fortuitous_pirate/internet-archive-scraper

Search the Internet Archive's 35+ million items: books, movies, audio, software, and web pages. Filter by media type, subject, creator, language, or date range. Free API.

Fortuitous Pirate

Internet Archive Search — Wayback Machine Advanced Query Tool

maged120/archive-org-advanced-search

Search the Internet Archive (archive.org) with full advanced filter support — date range, media type, language, subject, and more. Returns metadata from archived web pages, books, audio, and video.

Maged

Internet Archive Scraper

automation-lab/internet-archive-scraper

Search and extract metadata from the Internet Archive. Find books, videos, audio, software, and more from 40M+ items.

Stas Persiianenko

Internet Archive & Wayback Machine Scraper

mangudai/internet-archive-scraper

Search the Internet Archive's 40M+ items, pull full item metadata and file lists, and query the Wayback Machine for URL snapshots. Books, audio, video, software, and archived pages on official archive.org APIs. No API key.

Mangudäi

Internet Archive Book Reviews Scraper

thescrapelab/internet-archive-book-reviews-scraper

Extract public Archive.org book metadata, ISBNs, ratings, and user reviews from public Internet Archive endpoints. Start from URLs, identifiers, ISBNs, creators, collections, subjects, or search queries. Output is always one dataset row per public review. No API key required.

Inus Grobler

Archive.org Scraper

lulzasaur/archive-org-scraper

Scrape the Internet Archive (archive.org). Search 50M+ texts, 13M+ audio, 16M+ movies, and 1.3M+ software items. Get metadata, download counts, file lists, and more via public APIs.