Pricing

from $16.00 / 1,000 result items

Internet Archive Search Scraper

Search the Internet Archive's 50M+ item catalog of texts, audio, movies, software, web pages, and images. Filter by collection, media type, creator, and date. Pull identifiers, titles, descriptions, downloads, and rich metadata.

Developer

ParseForge

Last modified

8 hours ago

📚 Internet Archive Search Scraper

🚀 Export the world's largest open library in seconds. Search 50M+ items across texts, audio, movies, software, web, images, and data. No login, no manual paging, no Lucene crash courses required.

The Internet Archive Search Scraper exports the open library catalog and returns 20 fields per record, including identifier, title, full description, creator, language, subject tags, collection memberships, publish date, lifetime, weekly, and monthly download counts, file count, total byte size, license URL, and direct links to the item details page and metadata feed. The underlying source is the world's largest publicly accessible digital library, maintained since 1996.

The catalog covers 50 million+ items across eight media types (texts, audio, movies, software, web captures, images, datasets, and collections). This Actor lets you slice the corpus with Lucene-style queries plus structured filters for collection, media type, creator, and date range, then download the result as CSV, Excel, JSON, or XML in under five minutes.

🎯 Target Audience: librarians, digital archivists, journalists, OSINT researchers, academic historians, ML dataset curators, documentary filmmakers.
💡 Primary Use Cases: citation discovery, training-corpus assembly, historical media research, source verification, public-domain media sourcing, archival preservation audits.

📋 What the Archive Search Scraper does

A single configurable workflow with four filter layers and one optional enrichment step:

🔎 Lucene query. Free-text or fielded queries like subject:photography AND mediatype:image.
📦 Collection filter. Restrict to one Internet Archive collection like nasa or librivoxaudio.
🎬 Media-type filter. Texts, audio, movies, software, web, image, data, or collection.
📅 Date range. Filter by item publish date with dateFrom and dateTo (YYYY-MM-DD).
📚 Per-item metadata. Optional deep fetch returns the full file list, rich subject tags, and license URL.
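
As a rough illustration of how the four filter layers could combine into one Lucene-style query, here is a minimal Python sketch. The function name and its parameters are hypothetical, and the listing does not document how the Actor translates its filters internally. The `subject:` and `mediatype:` field names come from the example query above; the `collection:` and `date:[... TO ...]` clauses, and the use of `*` for an open-ended range, are assumptions about the Archive's search syntax.

```python
from typing import Optional


def build_lucene_query(
    query: str = "",
    collection: Optional[str] = None,
    media_type: Optional[str] = None,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> str:
    """Combine the filter layers into one Lucene-style query string (illustrative only)."""
    clauses = []
    if query:
        # Parenthesise the free-text part so an OR inside it cannot absorb the filters.
        clauses.append(f"({query})")
    if collection:
        clauses.append(f"collection:{collection}")
    if media_type:
        clauses.append(f"mediatype:{media_type}")
    if date_from or date_to:
        # Assumption: "*" stands for an open end of the date range.
        clauses.append(f"date:[{date_from or '*'} TO {date_to or '*'}]")
    return " AND ".join(clauses)


print(build_lucene_query("subject:photography", collection="nasa", media_type="image"))
```

Wrapping the free-text part in parentheses keeps operator precedence predictable when the structured filters are appended with AND.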

Each record bundles identifiers (Archive ID, details URL, metadata URL), descriptive metadata (title, creator, language, description, subject tags), classification (media type, collection memberships), engagement (lifetime, weekly, and monthly download counts), file inventory (count and total byte size), and licensing.
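
The two link fields can be understood in terms of the Archive's public URL layout, where an item's details page and metadata feed both hang off its identifier. The sketch below derives them in Python. The helper name is hypothetical, and whether the Actor builds `detailsUrl` and `metadataUrl` exactly this way is an assumption.

```python
from urllib.parse import quote


def item_urls(identifier: str) -> dict:
    """Derive the details-page and metadata-feed URLs for an Archive identifier."""
    # Percent-encode defensively; typical identifiers are already URL-safe.
    safe_id = quote(identifier, safe="")
    return {
        "detailsUrl": f"https://archive.org/details/{safe_id}",
        "metadataUrl": f"https://archive.org/metadata/{safe_id}",
    }


# "example-item" is a placeholder identifier, not a real catalog entry.
print(item_urls("example-item"))
```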

💡 Why it matters: the Archive is the largest public corpus of cultural and reference material on Earth, but its native search interface assumes you already know Lucene. This Actor exposes that same query layer with structured filters and clean records, ready for analysis, ingestion, or archival back-up.

📊 Data fields

Each record includes: collection, creator, date, description, detailsUrl, downloads, filesCount, identifier, language, licenseUrl, mediaType, metadataUrl, month, publishDate, scrapedAt, subject, thumbnailUrl, title, totalSizeBytes, week. These field names come straight from the actor's dataset schema, so what you see here is what lands in your dataset.
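
Once a run is exported as JSON, the records can be post-processed with a few lines of Python. The sketch below ranks items by `downloads` and renders `totalSizeBytes` in readable units. The key names are taken from the field list above, but the sample values are invented, and the assumption that both fields arrive as integers (or null) is not confirmed by the listing.

```python
import json

# Invented sample data for illustration; only the key names come from the field list.
SAMPLE_EXPORT = json.dumps([
    {"identifier": "example-item-a", "downloads": 120, "totalSizeBytes": 5_242_880},
    {"identifier": "example-item-b", "downloads": 4500, "totalSizeBytes": 1_073_741_824},
    {"identifier": "example-item-c", "downloads": None, "totalSizeBytes": None},
])


def human_size(num_bytes):
    """Format a byte count with binary units; None becomes 'unknown'."""
    if num_bytes is None:
        return "unknown"
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def top_downloads(records, limit=10):
    """Return the most-downloaded records, treating missing counts as zero."""
    return sorted(records, key=lambda r: r.get("downloads") or 0, reverse=True)[:limit]


records = json.loads(SAMPLE_EXPORT)
for rec in top_downloads(records, limit=2):
    print(rec["identifier"], rec["downloads"], human_size(rec["totalSizeBytes"]))
```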

🚀 How to use

📝 Sign up. Create a free Apify account, which comes with $5 of credit (takes 2 minutes).
🌐 Open the Actor. Go to the Internet Archive Search Scraper page on the Apify Store.
🎯 Set input. Type a search query, optionally restrict to a collection or media type, and set maxItems.
🚀 Run it. Click Start and let the Actor collect your dataset.
📥 Download. Grab results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
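
For anyone who would rather prepare the "Set input" step in a script, the sketch below assembles and validates an input object in Python before it is pasted into the Actor's JSON input editor. The keys `dateFrom`, `dateTo`, and `maxItems` appear on this page; `query`, `collection`, and `mediaType` are hypothetical key names, and the default of 100 items is arbitrary, so check the Actor's input schema before relying on them.

```python
import json
import re
from datetime import date

# Media types as listed under the media-type filter above.
MEDIA_TYPES = {"texts", "audio", "movies", "software", "web", "image", "data", "collection"}


def parse_day(label, value):
    """Validate a strict YYYY-MM-DD string and return it as a date."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        raise ValueError(f"{label} must be YYYY-MM-DD, got {value!r}")
    return date.fromisoformat(value)  # also rejects impossible dates such as 2024-02-30


def make_input(query, collection=None, media_type=None,
               date_from=None, date_to=None, max_items=100):
    """Build an input dict; the 'query', 'collection', 'mediaType' keys are assumed names."""
    if media_type is not None and media_type not in MEDIA_TYPES:
        raise ValueError(f"unknown media type: {media_type!r}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    start = parse_day("dateFrom", date_from) if date_from else None
    end = parse_day("dateTo", date_to) if date_to else None
    if start and end and start > end:
        raise ValueError("dateFrom must not be later than dateTo")

    actor_input = {"query": query, "maxItems": max_items}
    if collection:
        actor_input["collection"] = collection
    if media_type:
        actor_input["mediaType"] = media_type
    if date_from:
        actor_input["dateFrom"] = date_from
    if date_to:
        actor_input["dateTo"] = date_to
    return actor_input


print(json.dumps(make_input("subject:photography", media_type="image",
                            date_from="2000-01-01", date_to="2010-12-31",
                            max_items=50), indent=2))
```

Validating dates and media types locally catches malformed input before any paid result items are consumed.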

🔗 Recommended Actors

📚 arXiv Scraper - Preprint papers across physics, math, and CS
📜 RFC Editor Index Scraper - IETF Internet standards catalog
🏛️ Met Museum Scraper - Metropolitan Museum of Art open-access objects
🔬 ClinicalTrials.gov Scraper - Registered medical trials with outcomes
🌍 REST Countries Info Scraper - 250+ countries with population, currencies, languages

💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the Internet Archive or any of its contributors. All trademarks mentioned are the property of their respective owners. Only publicly available open archival data is collected.

🆘 Need Help?

If you hit a bug, have questions about setup, or need a scraper we haven't built yet, open our contact form or write to parseforge@protonmail.com. We also take on paid custom data projects.

For faster answers, join our Discord. It's the best place to get support and suggest new actors.

Internet Archive Scraper

fortuitous_pirate/internet-archive-scraper

Search the Internet Archive's 35+ million items: books, movies, audio, software, and web pages. Filter by media type, subject, creator, language, or date range. Free API.

Fortuitous Pirate

Internet Archive Scraper

dami_studio/internet-archive-scraper

Searches the Internet Archive (archive.org) by keyword and returns structured items (title, creator, year, downloads, subjects, item URL); filter by media type and sort by downloads or upload date.

Dami's Studio

5.0

Internet Archive Items Scraper - Metadata Search

benthepythondev/internet-archive-items-scraper

Search Internet Archive items and export identifiers, titles, creators, dates, media types, downloads, descriptions, subjects and links.

Ben

Internet Archive Items Scraper - archive.org Search by Query

gio21/archive-org-items-scraper

Search Internet Archive (archive.org) items: books, movies, audio, software, images, web archives, data. Returns title, creator, date, description, downloads, identifier, URLs. Free, no key. For research, content discovery, digital preservation.

Gio

Archive.org Scraper

lulzasaur/archive-org-scraper

Scrape the Internet Archive (archive.org). Search 50M+ texts, 13M+ audio, 16M+ movies, and 1.3M+ software items. Get metadata, download counts, file lists, and more via public APIs.

lulz bot

Internet Archive Search — Wayback Machine Advanced Query Tool

maged120/archive-org-advanced-search

Search the Internet Archive (archive.org) with full advanced filter support — date range, media type, language, subject, and more. Returns metadata from archived web pages, books, audio, and video.

Maged

Internet Archive Search Scraper

crawlergang/internet-archive-search-scraper

Searches and retrieves items from the Internet Archive (archive.org) - 44M+ books, videos, audio, software, and web archives. Free, no API key required.

Crawler Gang

Internet Archive Search Scraper

crawlerbros/internet-archive-search-scraper

Searches and retrieves items from the Internet Archive (archive.org) - 44M+ books, videos, audio, software, and web archives. Free, no API key required.

Crawler Bros

Internet Archive & Wayback Machine Scraper

mangudai/internet-archive-scraper

Search the Internet Archive's 40M+ items, pull full item metadata and file lists, and query the Wayback Machine for URL snapshots. Books, audio, video, software, and archived pages on official archive.org APIs. No API key.

Mangudäi

Internet Archive Search Scraper

moving_beacon-owner1/internet-archive-search-scraper

Searches the Internet Archive using its advanced search API. Retrieves matching items with pagination, returning requested metadata fields and a link to each item's details page.