OpenLibrary Books — Metadata, ISBNs, Authors, CSV, No API Key


Pricing: Pay per usage


19 runs. OpenLibrary metadata as CSV/JSON — titles, authors, ISBNs, subjects, languages, pageCount, coverUrl, ebookAccess, ratings. By query/ISBN/subject/author. For library cataloguing + book-rec engines + academic research. No API key. Backed by 951-run Trustpilot flagship + 31-actor portfolio.


Rating: 0.0 (0)

Developer: Alex (Maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 0 monthly active users · last modified 6 days ago


OpenLibrary Book Scraper — Metadata, ISBNs, Authors, Subjects

Scrape book metadata from the free OpenLibrary API. No API key, no rate-limit token, no auth wall. Four input modes: search queries, ISBN lookups, subject browse, author works. Output JSON or CSV.

Built for: library data builds, reading-list automation, ISBN enrichment, book recommendation datasets, academic citation enrichment.


What this actor does (honest scope, verified against src/main.js)

Calls these public OpenLibrary endpoints under the hood:

| Input field | Endpoint hit | Returns |
|---|---|---|
| searchQueries | /search.json?q=…&page=N&limit=50 | 50 docs/page, paginated until maxBooksPerSource reached or numFound <= collected |
| isbns | /isbn/{isbn}.json | One book per ISBN |
| subjects | /subjects/{slug}.json?limit=min(maxBooksPerSource, 50) | Hard-capped at 50 — even if you set maxBooksPerSource=200, subject browse returns at most 50 |
| authors | /search/authors.json?limit=1 + /authors/{key}/works.json?limit=min(maxBooksPerSource, 50) | First author match only (no disambiguation), then up to 50 works |

Sets User-Agent: ApifyOpenLibraryScraper/1.0 and inserts polite delays between requests:

  • 200ms after each work-description fetch
  • 300ms before each ISBN/subject/author lookup
  • 500ms between search-mode pages

If includeDescription=true (default), search-mode and isbn-mode fire one extra /works/{key}.json request per book to pull the description text — slower but richer. Subject-mode and author-mode never fetch the work-description endpoint; they read whatever description is already present in the listing payload.
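The search-mode pagination described above can be sketched roughly as follows. collectSearchDocs and fetchPage are hypothetical names for illustration, not the actor's actual functions; in a real run, fetchPage would wrap fetch() against /search.json with the 500ms inter-page delay:

```javascript
// Paginate until the per-source cap is hit or OpenLibrary reports no more
// results. `fetchPage(page)` is injected so the loop is testable offline;
// it must resolve to { docs, numFound } like the /search.json payload.
async function collectSearchDocs(fetchPage, maxBooks) {
  const docs = [];
  let page = 1;
  while (docs.length < maxBooks) {
    const { docs: pageDocs, numFound } = await fetchPage(page);
    docs.push(...pageDocs.slice(0, maxBooks - docs.length));
    // Stop when we have exhausted the result set.
    if (docs.length >= numFound || pageDocs.length === 0) break;
    page += 1;
  }
  return docs;
}
```

With maxBooks=75 and 50-doc pages, this fetches two pages and keeps 25 docs from the second one.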


Input parameters

| Field | Type | Default | Description |
|---|---|---|---|
| searchQueries | array of strings | [] | Free-text search (title/keyword/phrase) |
| isbns | array of strings | [] | ISBN-10 or ISBN-13 lookups |
| subjects | array of strings | [] | Subject names — auto-lowercased with spaces replaced by underscores (e.g. "Science Fiction" → slug science_fiction). Special characters NOT escaped beyond URL-encoding — exotic subject names may 404. |
| authors | array of strings | [] | Author names (e.g. "Isaac Asimov"). Only the first match is taken (limit=1) — common-name authors may resolve to a different person than expected. Use the OpenLibrary author key directly via a custom build if disambiguation matters. |
| maxBooksPerSource | integer | 50 | Cap per query/ISBN/subject/author (schema allows 1-200, but subjects and authors are server-capped at 50 regardless) |
| includeDescription | boolean | true | Fetch full description (extra API call per book in search-mode and isbn-mode only) |
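A sample input exercising all four modes at once (values are illustrative; field names match the table above):

```json
{
  "searchQueries": ["foundation asimov"],
  "isbns": ["9780553293357"],
  "subjects": ["Science Fiction"],
  "authors": ["Isaac Asimov"],
  "maxBooksPerSource": 50,
  "includeDescription": true
}
```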

You can mix all four modes in a single run. Each output record carries a source field telling you which mode produced it (search:<query>, isbn:<n>, subject:<s>, author:<a>).
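When post-processing a mixed run, the source field splits cleanly back into mode and argument. A minimal sketch (parseSource is a hypothetical helper, not part of the actor):

```javascript
// "search:foundation" → { mode: "search", arg: "foundation" }
// Split on the FIRST ":" only, in case the argument itself contains one.
function parseSource(source) {
  const i = source.indexOf(":");
  return { mode: source.slice(0, i), arg: source.slice(i + 1) };
}
```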


Output schema (varies by source mode — fields differ deliberately)

Records from different modes carry different field sets. This is by design — OpenLibrary returns richer metadata for search results than for ISBN / subject / author endpoints.

search: mode (22 base fields, +description with includeDescription, +2 metadata = up to 25)

{
"title": "Foundation",
"authors": ["Isaac Asimov"],
"authorKeys": ["OL26320A"],
"firstPublishYear": 1951,
"publishYears": [1951, 1952, 1955, 1962, 1974],
"isbn": "9780553293357",
"allIsbns": ["9780553293357", "9780553382570", "..."],
"subjects": ["Science fiction", "Galactic empire", "..."],
"publishers": ["Bantam Spectra", "Doubleday", "..."],
"languages": ["eng"],
"pageCount": 244,
"editionCount": 142,
"coverUrl": "https://covers.openlibrary.org/b/id/9261361-L.jpg",
"openLibraryKey": "/works/OL46828W",
"openLibraryUrl": "https://openlibrary.org/works/OL46828W",
"ebookAccess": "borrowable",
"hasFulltext": true,
"ratingsAverage": 4.12,
"ratingsCount": 1284,
"wantToRead": 8421,
"currentlyReading": 412,
"alreadyRead": 6203,
"description": "In the waning days of a future Galactic Empire...",
"source": "search:foundation",
"scrapedAt": "2026-04-29T12:00:00.000Z"
}

Field caps in search-mode: allIsbns truncated to first 10, subjects truncated to first 20, publishers truncated to first 5. pageCount is number_of_pages_median (median across editions, not the specific-edition page count).

isbn: mode (10 base fields, +description+subjects if includeDescription=true)

title, isbn, publishers (uncapped), publishDate, pageCount (specific-edition number_of_pages, NOT median), coverUrl, openLibraryKey, openLibraryUrl, source, scrapedAt. With includeDescription=true, adds description and subjects (uncapped). Description fetch is wrapped in a silent try/catch — on failure, both description and subjects are simply absent (no error field, no retry).

subject: mode (10 fields)

title, authors (array of names — different shape than search-mode's authorKeys), coverUrl, openLibraryKey, openLibraryUrl, editionCount, firstPublishYear, subject, source, scrapedAt. No ratings, no ISBN, no description in this mode — that's an OpenLibrary /subjects/ endpoint limitation, not ours. Server hard-caps to 50 records regardless of maxBooksPerSource.

author: mode (7 base fields, +description if includeDescription=true)

title, authors (1-element array with the resolved author name), authorKey (singular — different from search-mode's plural authorKeys), openLibraryKey, openLibraryUrl, covers (capped to first 3 cover URLs), source, scrapedAt. With includeDescription=true, adds description IF the author-works payload already contains it (no extra API call — purely best-effort). Server hard-caps to 50 works regardless of maxBooksPerSource.

Field-name asymmetry across modes: search-mode emits authorKeys (plural array) + coverUrl (single URL); author-mode emits authorKey (singular string) + covers (array of up to 3); subject-mode and isbn-mode emit neither. If you join across modes, normalize these explicitly.
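If you do join across modes, the normalization could look like this sketch (normalizeRecord is a hypothetical post-run helper, not something the actor ships):

```javascript
// Smooth the mode-dependent shapes into one form: authorKeys is always an
// array, coverUrls is always an array. Other fields pass through untouched.
function normalizeRecord(record) {
  const authorKeys =
    record.authorKeys ??                               // search-mode: plural array
    (record.authorKey ? [record.authorKey] : []);      // author-mode: singular string
  const coverUrls =
    record.covers ??                                   // author-mode: up to 3 URLs
    (record.coverUrl ? [record.coverUrl] : []);        // other modes: single URL
  return { ...record, authorKeys, coverUrls };
}
```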


Operational caveats

  • ⚠️ Outer try/catch wraps the entire 4-mode for-loop (src/main.js lines 57-222). ISBN, subject, and author loops have inner try/catch so individual lookup failures don't halt their batch. BUT search-mode does NOT have inner protection — a single search-API failure (e.g. transient HTTP 500, network blip) kills the run mid-stream and skips ALL remaining search queries, ISBN lookups, subject browses, and author lookups. Run problematic queries in isolation if dropout matters.
  • No retry / no proxy. Single fetch() per URL. Heavy bursts may eventually trigger OpenLibrary's polite-use ceiling (~100 req/min unofficial); the actor will surface that as a thrown HTTP error.
  • Description-fetch silent-empty. When includeDescription=true and the work-page fetch fails, description is set to empty string (search-mode) or absent (isbn-mode) — no error is logged per book.
  • Subject slug transform is naive. Input "Science Fiction" → slug "science_fiction". Special characters beyond letters/spaces are URL-encoded but not slug-normalized; subjects like "René Magritte's books" will likely 404.
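The naive slug transform in the last caveat behaves roughly like this sketch (toSubjectSlug is a hypothetical name mirroring the described behaviour, not the actor's actual function):

```javascript
// Lowercase, replace spaces with underscores, then URL-encode whatever is
// left — no further normalization, which is why accented or punctuated
// subject names can produce slugs OpenLibrary doesn't recognize.
function toSubjectSlug(subject) {
  return encodeURIComponent(subject.toLowerCase().replace(/ /g, "_"));
}
```

"Science Fiction" becomes science_fiction, but "René" ends up with a percent-encoded é in the slug, which is where the 404s come from.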

What this actor does NOT do

  • No reading-progress / personal-list scraping — OpenLibrary doesn't expose individual users' lists.
  • No full-text book content — only metadata + descriptions. Read free books at openlibrary.org or via Internet Archive.
  • No price comparison — OpenLibrary is metadata-only, not a bookstore.
  • No deduplication across modes — if you search "Foundation" and lookup ISBN 9780553293357, you'll get 2 records. Dedupe by openLibraryKey post-run if needed.
  • No incremental crawl / cursor state — each run starts fresh from page 1.
  • No author disambiguation — first match wins.
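Post-run deduplication by openLibraryKey, as suggested above, is a few lines once you've pulled the dataset (sketch with a hypothetical helper name):

```javascript
// Keep the first record seen for each openLibraryKey; records without a
// key are kept as-is rather than being collapsed together.
function dedupeByWorkKey(records) {
  const seen = new Set();
  return records.filter((r) => {
    if (!r.openLibraryKey) return true;
    if (seen.has(r.openLibraryKey)) return false;
    seen.add(r.openLibraryKey);
    return true;
  });
}
```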

When this stops being enough

If you need book full-text → use Internet Archive. If you need real-time bookstore prices → write a separate Amazon/Bookshop scraper. If you need annotated bibliographies → look at Goodreads (no public API since 2020, harder).


Custom builds — pilot tiers

This actor runs on Apify's standard compute. If you need a custom variant — search-mode-only with retry+backoff, ISBN-bulk with deduplication, subject browse paginated past the 50-cap (via search workaround), author-key direct lookup, hourly cron, Slack alerts on new releases — three tiers:

  • Pilot — $97 · 1 actor, basic config, 7-day support. Good for one-off "top 200 books in subject X" via search + subject hybrid.
  • Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most reading-list / catalog-enrichment projects fit here.
  • Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (weekly new-release feed, ISBN-stream enrichment, author-tracking dashboards).

Email: spinov001@gmail.com — drop the input shape and the schema you need; quote within 48h.

Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).

More tips: t.me/scraping_ai · blog.spinov.online


| Source | Actor | Data |
|---|---|---|
| OpenLibrary (this) | Book metadata + ISBN/subject/author | Bibliographic |
| Wikipedia Scraper | Article + sections + references | Encyclopedic |
| arXiv Paper Scraper | Academic preprints | Research |
[Google Books style — request a custom build via email]

All 31 published actors free to inspect on Apify Store.


Disclaimer

Scrapes the publicly accessible OpenLibrary API endpoints. Respects polite delays (200-500ms between requests). Not affiliated with the Internet Archive or OpenLibrary.

Honest disclosure: search-mode 22 base fields (up to 25 with description + 2 metadata fields), isbn-mode 10 base, subject-mode 10 fields, author-mode 7 base. Subject and author endpoints server-capped at 50 records regardless of maxBooksPerSource. Outer try/catch — single search-API failure halts the entire run. Single-attempt fetch, no retry/no proxy. Author-mode uses limit=1 for disambiguation — first match wins.