Internet Archive Scraper

Developer: Stas Persiianenko (maintained by Community)

Search and extract metadata from the Internet Archive — the world's largest digital library with 40M+ items. Find books, videos, audio, software, images, and web archives.

What does Internet Archive Scraper do?

Internet Archive Scraper searches the Internet Archive's vast collection and extracts structured metadata for each item. Get titles, creators, descriptions, download counts, file formats, subjects, and direct links. Supports filtering by media type (books, movies, audio, software, etc.) and sorting by popularity, date, or title.

The Internet Archive hosts over 40 million items including 28M+ books, 14M+ audio recordings, 7M+ videos, and millions of software titles, images, and web pages.

Why use Internet Archive Scraper?

  • 40M+ items — access the largest free digital library in the world
  • All media types — books, movies, audio, software, images, data, web archives
  • Download stats — see how popular each item is with download counts
  • Multiple formats — items often have PDF, EPUB, MOBI, MP3, MP4, and more
  • Pagination — extract up to 500 items per search query
  • Sorting — sort by relevance, most downloaded, newest, oldest, or title

Use cases

  • Research — find public domain books, papers, and historical documents
  • Media analysis — track download trends for audio, video, and software
  • Content curation — discover popular public domain content for projects
  • Digital preservation — catalog archived websites and historical software
  • Education — find open educational resources and textbooks
  • Historical research — access vintage software, old magazines, and rare recordings

How to use Internet Archive Scraper

  1. Go to the Internet Archive Scraper input page.
  2. Add search terms to the Search queries list.
  3. Optionally filter by Media type and choose a Sort by order.
  4. Click Start and wait for the run to finish.
  5. Download your data in JSON, CSV, or Excel format.

Input parameters

Parameter | Type | Required | Default | Description
searchQueries | array | Yes | (none) | Search terms to find items
mediaType | string | No | all | Filter by media type: texts, movies, audio, software, image, data, web, collection, etree
sortBy | string | No | relevance | Sort order: downloads desc, date desc, date asc, titleSorter asc, titleSorter desc
maxResults | integer | No | 50 | Maximum results per query (1–500)
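
The constraints in the table above can be checked before a run is started. The following is a minimal sketch, not part of the actor itself: the helper name build_input is hypothetical, and it assumes that the defaults "all" and "relevance" are accepted as literal values and that "titleSorter asc/desc" stands for two separate sort values.

```python
# Allowed values copied from the input parameters table.
# Treating "all" and "relevance" as valid literals is an assumption.
ALLOWED_MEDIA_TYPES = {
    "all", "texts", "movies", "audio", "software",
    "image", "data", "web", "collection", "etree",
}
ALLOWED_SORTS = {
    "relevance", "downloads desc", "date desc", "date asc",
    "titleSorter asc", "titleSorter desc",
}


def build_input(search_queries, media_type="all", sort_by="relevance", max_results=50):
    """Return a run-input dict, raising ValueError for values outside the table."""
    # Drop empty or whitespace-only search terms.
    queries = [q.strip() for q in search_queries if q and q.strip()]
    if not queries:
        raise ValueError("searchQueries must contain at least one non-empty term")
    if media_type not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"unsupported mediaType: {media_type!r}")
    if sort_by not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sortBy: {sort_by!r}")
    if not 1 <= max_results <= 500:
        raise ValueError("maxResults must be between 1 and 500")
    return {
        "searchQueries": queries,
        "mediaType": media_type,
        "sortBy": sort_by,
        "maxResults": max_results,
    }
```

The returned dict can be passed as the run input in the API examples further down.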

Example input

{
    "searchQueries": ["machine learning", "public domain films"],
    "sortBy": "downloads desc",
    "maxResults": 50
}

Output example

Each item returns structured metadata:

{
    "identifier": "deep-learning-collection-pdf",
    "title": "Deep Learning Collection PDF",
    "creator": "",
    "description": "A collection of deep learning resources...",
    "mediaType": "texts",
    "collection": "opensource",
    "date": "2019-01-15",
    "year": "2019",
    "language": "English",
    "subject": ["deep learning", "machine learning", "neural networks"],
    "downloads": 89271,
    "itemSize": 524288000,
    "filesCount": 12,
    "format": ["Archive BitTorrent", "PDF", "Text"],
    "licenseUrl": "",
    "detailsUrl": "https://archive.org/details/deep-learning-collection-pdf",
    "downloadUrl": "https://archive.org/download/deep-learning-collection-pdf",
    "thumbnailUrl": "https://archive.org/services/img/deep-learning-collection-pdf",
    "searchQuery": "machine learning",
    "scrapedAt": "2026-03-03T05:42:00.000Z"
}
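
In the example above, the three URL fields all appear to be built from the identifier in a fixed way. If only identifiers have been stored, the links can be rebuilt along these lines. This is a sketch that assumes the pattern seen in this single example holds for every item; the helper name item_urls is hypothetical.

```python
from urllib.parse import quote

ARCHIVE_BASE = "https://archive.org"


def item_urls(identifier: str) -> dict:
    """Rebuild the details, download, and thumbnail URLs for one item.

    The path patterns are inferred from the output example, not from
    any documented guarantee.
    """
    # Percent-encode anything that is not safe inside a URL path segment.
    safe_id = quote(identifier, safe="")
    return {
        "detailsUrl": f"{ARCHIVE_BASE}/details/{safe_id}",
        "downloadUrl": f"{ARCHIVE_BASE}/download/{safe_id}",
        "thumbnailUrl": f"{ARCHIVE_BASE}/services/img/{safe_id}",
    }
```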

Output fields

Field | Type | Description
identifier | string | Unique Archive.org item identifier
title | string | Item title
creator | string | Author, artist, or uploader
description | string | Item description
mediaType | string | Media type (texts, movies, audio, software, etc.)
collection | string | Collection(s) the item belongs to
date | string | Publication or upload date
year | string | Year of publication
language | string | Content language
subject | array | Subject tags and categories
downloads | number | Total download count
itemSize | number | Total size in bytes
filesCount | number | Number of files in the item
format | array | Available file formats
licenseUrl | string | License URL if specified
detailsUrl | string | Link to item details page
downloadUrl | string | Direct download link
thumbnailUrl | string | Thumbnail image URL
searchQuery | string | The search query that found this item
scrapedAt | string | ISO 8601 timestamp of extraction
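
As an illustration of working with these fields, the sketch below aggregates item counts, downloads, and total size per mediaType. The function name summarize_by_media_type is hypothetical, and treating missing or null numeric fields as zero is an assumption about how sparse metadata may appear in the dataset.

```python
from collections import defaultdict


def summarize_by_media_type(items):
    """Aggregate item count, downloads, and bytes for each mediaType."""
    summary = defaultdict(lambda: {"items": 0, "downloads": 0, "bytes": 0})
    for item in items:
        # Items without a media type are grouped under "unknown".
        bucket = summary[item.get("mediaType") or "unknown"]
        bucket["items"] += 1
        # Assumption: absent or null numeric fields count as zero.
        bucket["downloads"] += item.get("downloads") or 0
        bucket["bytes"] += item.get("itemSize") or 0
    return dict(summary)
```

The input is any iterable of item dicts, such as the items read from a run's dataset.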

Pricing

Internet Archive Scraper uses pay-per-event pricing:

Event | Price
Run started | $0.001
Item extracted | $0.001 per item

Cost examples

Items | Cost
50 items (1 search) | $0.051
200 items (2 searches) | $0.201
500 items (5 searches) | $0.501
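
The figures above follow from one run-start event plus one event per extracted item, with all searches submitted in a single run. A minimal sketch of that arithmetic, using the two prices listed under Pricing (the function name estimate_cost is hypothetical):

```python
from decimal import Decimal

# Prices taken from the pricing table above.
RUN_START_PRICE = Decimal("0.001")
ITEM_PRICE = Decimal("0.001")


def estimate_cost(items: int, runs: int = 1) -> Decimal:
    """Estimated cost in USD for a number of extracted items and runs."""
    if items < 0 or runs < 1:
        raise ValueError("items must be >= 0 and runs must be >= 1")
    # Decimal avoids binary floating-point rounding in the sum.
    return RUN_START_PRICE * runs + ITEM_PRICE * items


print(estimate_cost(50))   # 0.051
print(estimate_cost(200))  # 0.201
print(estimate_cost(500))  # 0.501
```

Splitting the same searches across several runs adds one run-start fee per run, so estimate_cost(500, runs=5) gives 0.505.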

API usage

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("YOUR_USERNAME/internet-archive-scraper").call(
    run_input={
        "searchQueries": ["artificial intelligence"],
        "mediaType": "texts",
        "sortBy": "downloads desc",
        "maxResults": 25,
    }
)

# Print one line per extracted item from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']}: {item['downloads']:,} downloads ({item['detailsUrl']})")

Node.js

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('YOUR_USERNAME/internet-archive-scraper').call({
    searchQueries: ['artificial intelligence'],
    mediaType: 'texts',
    sortBy: 'downloads desc',
    maxResults: 25,
});

// Fetch the extracted items from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.log(`${item.title}: ${item.downloads.toLocaleString()} downloads`);
});

Integrations

Connect Internet Archive Scraper to your workflow with Apify integrations:

  • Webhooks — trigger actions when extraction completes
  • Google Sheets — export archive data to spreadsheets
  • Slack — get notified about new uploads matching your criteria
  • Zapier / Make — connect to 5,000+ apps and services
  • REST API — call the actor programmatically from any language
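
For the REST API route, a request can be assembled with the Python standard library alone. The sketch below only builds the request. It assumes the Apify API v2 endpoint run-sync-get-dataset-items, the "username~actor-name" form of the actor ID in the URL, and bearer-token authentication; check all three against the current Apify API reference before relying on them. The helper name build_run_request is hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(token, actor_id, run_input):
    """Build a POST request that runs an actor and returns its dataset items.

    actor_id is expected in the "username~actor-name" form (assumption).
    """
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    "YOUR_API_TOKEN",
    "YOUR_USERNAME~internet-archive-scraper",
    {"searchQueries": ["artificial intelligence"], "maxResults": 25},
)
# To send it (network access and a valid token required):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```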

Tips and best practices

  • Use specific search terms for focused results; broad queries can match millions of items, and only the first maxResults (at most 500) are returned
  • Filter by mediaType to narrow results (e.g., "texts" for books, "software" for games)
  • Sort by "downloads desc" to find the most popular items
  • downloadUrl points to the item's download directory, which lists every file in the item
  • thumbnailUrl works for most items and is useful for building visual catalogs
  • Use subject tags for categorization — they're user-contributed and can be very detailed

Changelog

  • v0.1 — Initial release with full-text search, media type filtering, and sorting