Pricing

from $1.00 / 1,000 results

Internet Archive Book Reviews Scraper

Extract book metadata, ISBNs, star ratings, and user reviews from public Internet Archive (Archive.org) endpoints. Start from URLs, identifiers, ISBNs, creators, collections, subjects, or search queries. Output is always one dataset row per public review. No API key is required.

Developer: Inus Grobler

Last modified: 8 days ago

What You Can Use It For

Build datasets of public Archive.org book reviews and star ratings
Enrich book records with title, creator, ISBN, publisher, subject, language, collection, and cover URL
Monitor public reviews for selected Internet Archive books
Research library, public-domain, and scanned-book collections
Export review data to spreadsheets, dashboards, databases, or AI workflows
Find reviewed books by ISBN, creator, collection, subject, or Archive.org search query

What It Extracts

Each dataset row is one public review enriched with source item metadata.

Field group	Examples
Item identity	`identifier`, `itemUrl`, `metadataUrl`, `coverUrl`
Book metadata	`title`, `creators`, `publisher`, `publishedDate`, `language`, `subjects`, `collections`, `mediatype`
Book identifiers	`isbn10`, `isbn13`
Review data	`reviewTitle`, `reviewText`, `stars`, `rating`, `ratingScale`, `reviewerName`, `createdAt`, `reviewUpdatedAt`
Run metadata	`reviewSource`, `reviewHash`, `scrapedAt`

Simple Input

For the easiest run, add known Archive.org item URLs, item identifiers, or ISBNs to sources.

{
  "sources": ["https://archive.org/details/goodytwoshoes00newyiala"],
  "maxItems": 1,
  "maxReviewsPerItem": 10
}

You can also paste multiple values into sources, one per line:

https://archive.org/details/goodytwoshoes00newyiala
goodytwoshoes00newyiala
9780140449136
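
Because sources mixes three kinds of values, it can help to check a list before starting a run. The sketch below is a hypothetical pre-flight helper, not the Actor's own parsing logic: the function names are invented for illustration, and it assumes that anything that is neither a URL nor a checksum-valid ISBN should be treated as an Archive.org identifier.

```python
import re


def is_valid_isbn13(value: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    digits = value.replace("-", "").replace(" ", "")
    if not re.fullmatch(r"\d{13}", digits):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def is_valid_isbn10(value: str) -> bool:
    """Check the ISBN-10 check digit (weights 10 down to 1, 'X' means 10)."""
    digits = value.replace("-", "").replace(" ", "").upper()
    if not re.fullmatch(r"\d{9}[\dX]", digits):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(digits))
    return total % 11 == 0


def classify_source(value: str) -> str:
    """Return 'url', 'isbn', or 'identifier' for one sources entry (assumed rules)."""
    value = value.strip()
    if value.lower().startswith(("http://", "https://")):
        return "url"
    if is_valid_isbn13(value) or is_valid_isbn10(value):
        return "isbn"
    return "identifier"


for entry in [
    "https://archive.org/details/goodytwoshoes00newyiala",
    "goodytwoshoes00newyiala",
    "9780140449136",
]:
    print(entry, "->", classify_source(entry))
```

A mistyped ISBN fails the checksum and falls through to "identifier", which is a useful signal to recheck the value before spending a run on it.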

Discovery Inputs

Use these when you want the Actor to find matching Archive.org text items:

isbns: ISBN-10 or ISBN-13 values
creators: author or creator names
collections: Archive.org collection identifiers, such as internetarchivebooks
subjects: subject terms, such as fiction
searchQueries: raw Archive.org search queries

For review-focused discovery, use:

{
  "searchQueries": ["mediatype:texts AND reviewdate:*"],
  "maxItems": 25,
  "maxReviewsPerItem": 20
}

General searches can match books that have no public review rows. Keep onlyItemsWithReviews enabled if you only want review output.
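
If you want review-focused discovery narrowed to a creator, collection, or subject, one option is to compose the raw query yourself and pass it through searchQueries. This is a minimal sketch: build_review_query is a hypothetical helper, the creator, collection, and subject field names are standard Archive.org search fields, and how the Actor combines its own discovery inputs internally is not documented here.

```python
def build_review_query(creator=None, collection=None, subject=None):
    """Compose a raw Archive.org query limited to reviewed text items."""
    clauses = ["mediatype:texts", "reviewdate:*"]
    for field, value in (("creator", creator), ("collection", collection), ("subject", subject)):
        if value:
            # Drop embedded double quotes so the quoted phrase stays well-formed.
            cleaned = value.replace('"', "").strip()
            clauses.append(f'{field}:"{cleaned}"')
    return " AND ".join(clauses)


run_input = {
    "searchQueries": [build_review_query(collection="internetarchivebooks", subject="fiction")],
    "maxItems": 25,
    "maxReviewsPerItem": 20,
}
print(run_input["searchQueries"][0])
```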

Advanced Options

Most users do not need advanced settings. They are available for large or filtered runs:

minStars and maxStars: keep only reviews whose star rating falls within the given range
reviewTextContains: keep only reviews whose title or body contains the given text
languageFilter: keep only items in the selected languages
mediatypes: Archive.org media types to include; defaults to texts
globalConcurrency, perHostConcurrency, requestDelayMs, requestTimeoutSecs, maxRetries: HTTP concurrency, delay, timeout, and retry controls
includeRawMetadata and includeRawReviews: debugging fields for advanced users
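
The same kind of filtering can also be applied to rows after a run, for example when you want to try several rating ranges against one dataset. The sketch below only approximates minStars, maxStars, and reviewTextContains on the client side. It assumes case-insensitive text matching and drops reviews without a star value whenever a rating range is set; the Actor's exact behavior in those cases may differ.

```python
def keep_review(row, min_stars=None, max_stars=None, text_contains=None):
    """Return True if a dataset row passes the given (assumed) filter rules."""
    stars = row.get("stars")
    if min_stars is not None or max_stars is not None:
        if stars is None:
            return False  # assumption: unrated reviews fail any rating filter
        if min_stars is not None and stars < min_stars:
            return False
        if max_stars is not None and stars > max_stars:
            return False
    if text_contains:
        haystack = f"{row.get('reviewTitle') or ''} {row.get('reviewText') or ''}".lower()
        if text_contains.lower() not in haystack:
            return False
    return True


rows = [
    {"reviewTitle": "Fun", "reviewText": "This is an enjoyable read.", "stars": 4},
    {"reviewTitle": "Meh", "reviewText": "Not for me.", "stars": 2},
    {"reviewTitle": None, "reviewText": "Enjoyable but unrated."},
]
kept = [r for r in rows if keep_review(r, min_stars=3, text_contains="enjoyable")]
print(len(kept))
```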

Backward-compatible fields such as startUrls, identifiers, includeMetadata, includeReviews, includeFiles, and outputMode are still accepted. The dataset output remains one row per public review.

Example Output

{
  "entityType": "review",
  "source": "internet_archive",
  "identifier": "goodytwoshoes00newyiala",
  "itemUrl": "https://archive.org/details/goodytwoshoes00newyiala",
  "metadataUrl": "https://archive.org/metadata/goodytwoshoes00newyiala",
  "title": "Goody Two-Shoes",
  "creators": [],
  "isbn10": null,
  "isbn13": null,
  "publisher": "New-York : McLoughlin Bro's",
  "publishedDate": "c1888",
  "language": ["eng"],
  "subjects": ["Brothers and sisters", "Orphans", "Conduct of life", "Education"],
  "collections": ["cdl", "yrlsc", "iacl", "americana"],
  "mediatype": "texts",
  "coverUrl": "https://archive.org/services/img/goodytwoshoes00newyiala",
  "reviewTitle": "Fun",
  "reviewText": "This is an enjoyable read.",
  "stars": 4,
  "rating": 4,
  "ratingScale": 5,
  "reviewerName": "ErniePye",
  "reviewerItemName": null,
  "createdAt": "2007-09-06 03:26:39",
  "reviewUpdatedAt": "2007-09-06 03:26:39",
  "reviewSource": "metadata_reviews_branch",
  "reviewHash": "ce6e1ad425a071ae6e6cccc8ec08d8e73c216f5ee1d3de825be0b5e82c550dfd",
  "scrapedAt": "2026-06-14T18:01:23.349Z"
}
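
For monitoring use cases, the reviewHash field is a natural key for spotting reviews you have not seen before. The sketch below assumes that reviewHash stays the same across runs for an unchanged review, which this page does not state explicitly; new_reviews and the in-memory seen set are hypothetical and would normally be backed by a file or database.

```python
def new_reviews(rows, seen_hashes):
    """Return rows whose reviewHash is not in seen_hashes, and record them."""
    fresh = []
    for row in rows:
        review_hash = row.get("reviewHash")
        if review_hash is None or review_hash in seen_hashes:
            continue
        seen_hashes.add(review_hash)
        fresh.append(row)
    return fresh


seen = set()
first_run = [{"reviewHash": "aaa", "stars": 4}, {"reviewHash": "bbb", "stars": 5}]
second_run = [{"reviewHash": "bbb", "stars": 5}, {"reviewHash": "ccc", "stars": 3}]

print(len(new_reviews(first_run, seen)))   # both rows are new on the first pass
print(len(new_reviews(second_run, seen)))  # only the row with hash "ccc" is new
```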

How To Run On Apify

Open the Actor on Apify.
Add one or more values to sources, or use discovery fields such as creators, collections, or searchQueries.
Set maxItems and maxReviewsPerItem to control run size.
Click Start.
Download results from the Dataset tab as JSON, CSV, Excel, XML, or HTML.

Python API Example

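from apify_client import ApifyClient

# Replace the placeholder with your own Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

run_input = {
    "sources": ["https://archive.org/details/goodytwoshoes00newyiala"],
    "maxItems": 1,
    "maxReviewsPerItem": 10,
}

# Start the Actor and wait for the run to finish.
run = client.actor("TheScrapeLab/internet-archive-book-reviews-scraper").call(run_input=run_input)

# Each dataset item is one public review. Some reviews omit stars or other fields,
# so read them with .get() instead of indexing.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("identifier"), item.get("stars"), item.get("reviewText"))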
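
Once the rows are in memory, a common next step is a per-book summary. The sketch below is one possible aggregation, not part of the Actor: average_stars is a hypothetical helper that groups rows by identifier and skips reviews with no star value.

```python
from collections import defaultdict


def average_stars(rows):
    """Return {identifier: mean star rating}, ignoring rows without stars."""
    totals = defaultdict(lambda: [0, 0])  # identifier -> [sum of stars, count]
    for row in rows:
        stars = row.get("stars")
        if stars is None:
            continue
        bucket = totals[row["identifier"]]
        bucket[0] += stars
        bucket[1] += 1
    return {identifier: total / count for identifier, (total, count) in totals.items()}


sample_rows = [
    {"identifier": "goodytwoshoes00newyiala", "stars": 4},
    {"identifier": "goodytwoshoes00newyiala", "stars": 5},
    {"identifier": "goodytwoshoes00newyiala", "stars": None},
]
print(average_stars(sample_rows))
```

With the client from the example above, the rows could come from list(client.dataset(run["defaultDatasetId"]).iterate_items()).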

Limits And Caveats

The Actor extracts public Archive.org metadata and public review objects only.
Items without public reviews do not produce dataset rows when onlyItemsWithReviews is enabled.
Some public review objects may omit stars, reviewer names, or timestamps.
Archive.org endpoint latency and availability can vary.
The Actor is read-only. It does not log in, modify Archive.org data, solve captchas, or bypass access controls.
Optional Internet Archive credentials are accepted for compatibility, but they are not required or used for public review scraping.
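
Since stars, reviewer names, and timestamps can be missing, downstream code is safer if it normalizes each row first. This is a minimal sketch under stated assumptions: normalize_review and the "(unknown)" placeholder are hypothetical, and the timestamp format is inferred from the example output (2007-09-06 03:26:39). The time zone is not stated on this page, so the parsed value is left naive.

```python
from datetime import datetime


def normalize_review(row: dict) -> dict:
    """Return a small, predictable view of one review row."""
    created = None
    created_raw = row.get("createdAt")
    if created_raw:
        try:
            created = datetime.strptime(created_raw, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            created = None  # unexpected format: keep the row, drop the timestamp
    return {
        "identifier": row.get("identifier"),
        "stars": row.get("stars"),  # stays None when the review has no rating
        "reviewerName": row.get("reviewerName") or "(unknown)",
        "createdAt": created,
    }


print(normalize_review({"identifier": "goodytwoshoes00newyiala", "createdAt": "2007-09-06 03:26:39"}))
```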

Troubleshooting

No rows in the dataset: The item may have no public reviews, or your filters may have removed all reviews.

Search finds books but no reviews: Use mediatype:texts AND reviewdate:* in searchQueries to focus on reviewed text items.

Invalid URL error: Use an Archive.org /details/{identifier}, /metadata/{identifier}, or /metadata/{identifier}/reviews URL.

Missing identifier: The item may be unavailable through the public metadata endpoint. The Actor records a warning and continues when possible.

Slow broad searches: Reduce maxItems, keep maxReviewsPerItem close to what you need, and use specific creators, collections, subjects, or queries.
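
To catch the Invalid URL error before a run, you can check URLs against the three accepted shapes listed above. The sketch below is a hypothetical validator, not the Actor's own code, and it is deliberately strict: it rejects any extra path segments, which the Actor itself may or may not tolerate.

```python
import re
from urllib.parse import urlparse

# /details/{identifier}, /metadata/{identifier}, or /metadata/{identifier}/reviews
_PATH_RE = re.compile(r"^/(details|metadata)/([^/]+)(/reviews)?/?$")


def extract_identifier(url: str):
    """Return the item identifier for an accepted Archive.org URL, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host not in ("archive.org", "www.archive.org"):
        return None
    match = _PATH_RE.match(parsed.path)
    if not match:
        return None
    kind, identifier, reviews_suffix = match.groups()
    if kind == "details" and reviews_suffix:
        return None  # /reviews is only listed for /metadata URLs
    return identifier


print(extract_identifier("https://archive.org/metadata/goodytwoshoes00newyiala/reviews"))
```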

Pricing

Recommended pricing model: pay per result.

Each useful dataset row is one public review enriched with item metadata. A simple per-result price is easiest for users to understand and keeps small tests affordable. Based on measured 256 MB runs, the recommended starting price is $0.001 per dataset item (equivalent to $1.00 per 1,000 results), with platform usage paid by the user. Large-volume users can be offered lower private pricing after enough production usage is measured.
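
As a rough worked example of the per-result price, the sketch below computes an upper bound on result charges for a run. It assumes every row is billed at the $0.001 starting price and that a run returns at most maxItems × maxReviewsPerItem rows; platform usage is excluded, and max_result_cost is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_ROW = Decimal("0.001")  # starting price per dataset item, in USD


def max_result_cost(max_items: int, max_reviews_per_item: int, price_per_row=PRICE_PER_ROW):
    """Upper bound on per-result charges, excluding platform usage."""
    max_rows = max_items * max_reviews_per_item
    return max_rows * price_per_row


# 25 items with up to 20 reviews each is at most 500 rows.
print(max_result_cost(25, 20))  # 0.500
```

Actual charges are usually lower, because many items have fewer reviews than maxReviewsPerItem and items without public reviews produce no rows.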

FAQ

Can I scrape Internet Archive book reviews by URL?

Yes. Add Archive.org item URLs to sources, such as https://archive.org/details/goodytwoshoes00newyiala.

Can I search Internet Archive reviews by author?

Yes. Add author names to creators, or use a raw Archive.org query in searchQueries.

Does this download books or files?

No. It extracts public metadata and public reviews. It does not download book files.

Does it need an Internet Archive account?

No. Public item metadata and reviews do not require an Internet Archive account.

Why do some books have no output?

The dataset is review-focused. If a book has no public reviews, it normally produces no dataset rows.

Can I export to CSV or Excel?

Yes. Use the Dataset tab in Apify Console and choose CSV, Excel, JSON, XML, or HTML.
