OpenLibrary Books — Metadata, ISBNs, Authors, CSV, No API Key
Pricing
Pay per usage
19 runs. OpenLibrary metadata as CSV/JSON — titles, authors, ISBNs, subjects, languages, pageCount, coverUrl, ebookAccess, ratings. By query/ISBN/subject/author. For library cataloguing + book-rec engines + academic research. No API key. Backed by 951-run Trustpilot flagship + 31-actor portfolio.
Developer
Alex
OpenLibrary Book Scraper — Metadata, ISBNs, Authors, Subjects
Scrape book metadata from the free OpenLibrary API. No API key, no rate-limit token, no auth wall. Four input modes: search queries, ISBN lookups, subject browse, author works. Output JSON or CSV.
Built for: library data builds, reading-list automation, ISBN enrichment, book recommendation datasets, academic citation enrichment.
What this actor does (honest scope, verified against src/main.js)
Calls these public OpenLibrary endpoints under the hood:
| Input field | Endpoint hit | Returns |
|---|---|---|
| searchQueries | /search.json?q=…&page=N&limit=50 | 50 docs/page, paginated until maxBooksPerSource reached or numFound <= collected |
| isbns | /isbn/{isbn}.json | One book per ISBN |
| subjects | /subjects/{slug}.json?limit=min(maxBooksPerSource, 50) | Hard-capped at 50: even if you set maxBooksPerSource=200, subject browse returns at most 50 |
| authors | /search/authors.json?limit=1 + /authors/{key}/works.json?limit=min(maxBooksPerSource, 50) | First author match only (no disambiguation), then up to 50 works |
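To make the table concrete, here is a minimal Python sketch of how those URLs can be assembled. It is an illustration, not the actor's code: the function names are hypothetical, and the author-search URL is left out because the table does not show how the author name is passed.

```python
from urllib.parse import quote, urlencode

BASE = "https://openlibrary.org"
PAGE_SIZE = 50  # search page size, and the subject/author cap from the table


def search_url(query: str, page: int = 1) -> str:
    """One page of free-text search results."""
    params = {"q": query, "page": page, "limit": PAGE_SIZE}
    return f"{BASE}/search.json?" + urlencode(params)


def isbn_url(isbn: str) -> str:
    """Single-edition lookup by ISBN-10 or ISBN-13."""
    return f"{BASE}/isbn/{quote(isbn)}.json"


def subject_url(slug: str, max_books_per_source: int) -> str:
    """Subject browse; the limit never exceeds 50."""
    limit = min(max_books_per_source, PAGE_SIZE)
    return f"{BASE}/subjects/{quote(slug)}.json?limit={limit}"


def author_works_url(author_key: str, max_books_per_source: int) -> str:
    """Works of an already-resolved author key; the limit never exceeds 50."""
    limit = min(max_books_per_source, PAGE_SIZE)
    return f"{BASE}/authors/{quote(author_key)}/works.json?limit={limit}"


print(search_url("foundation", page=2))
print(subject_url("science_fiction", 200))
```

Opening the printed URLs in a browser is a quick way to see the raw payload a given mode works from.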
Sets User-Agent: ApifyOpenLibraryScraper/1.0. Inserts polite delays between requests: 200ms after each work-description fetch, 300ms before each ISBN/subject/author lookup, 500ms between search-mode pages. If includeDescription=true (default), search-mode and isbn-mode fire one extra /works/{key}.json per book to pull the description text — slower but richer. Subject-mode and author-mode never fetch the work-description endpoint — they read whatever description is already in the listing payload.
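For run-time planning, the delays add up to a floor you can estimate before starting. The sketch below is a rough Python calculation under stated assumptions: one description fetch per book, pauses only between search pages (not after the last one), and one 300ms pause per ISBN, subject, or author input. Network latency is excluded, and the function name is hypothetical.

```python
import math
from typing import Sequence

SEARCH_PAGE_DELAY_MS = 500   # between search-mode pages
LOOKUP_DELAY_MS = 300        # before each ISBN/subject/author lookup
DESCRIPTION_DELAY_MS = 200   # after each work-description fetch
PAGE_SIZE = 50               # docs per search page


def min_delay_ms(
    search_results: Sequence[int] = (),
    isbn_count: int = 0,
    subject_count: int = 0,
    author_count: int = 0,
    include_description: bool = True,
) -> int:
    """Lower bound on total sleep time, in milliseconds.

    search_results holds the number of books collected per search query.
    """
    total = 0
    for books in search_results:
        pages = math.ceil(books / PAGE_SIZE)
        total += max(pages - 1, 0) * SEARCH_PAGE_DELAY_MS
        if include_description:
            total += books * DESCRIPTION_DELAY_MS
    total += (isbn_count + subject_count + author_count) * LOOKUP_DELAY_MS
    if include_description:
        # Only isbn-mode adds a description fetch among the lookup modes.
        total += isbn_count * DESCRIPTION_DELAY_MS
    return total


# One search query collecting 200 books with descriptions:
# 3 page gaps * 500 ms + 200 books * 200 ms
print(min_delay_ms(search_results=[200]))  # 41500
```

With includeDescription=false the same query drops to 1500 ms of delay, which is why turning descriptions off is the main speed lever.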
Input parameters
| Field | Type | Default | Description |
|---|---|---|---|
| searchQueries | array of strings | [] | Free-text search (title/keyword/phrase) |
| isbns | array of strings | [] | ISBN-10 or ISBN-13 lookups |
| subjects | array of strings | [] | Subject names, auto-lowercased with spaces replaced by underscores (e.g. "Science Fiction" → slug science_fiction). Special characters NOT escaped beyond URL-encoding; exotic subject names may 404. |
| authors | array of strings | [] | Author names (e.g. "Isaac Asimov"). Only the first match is taken (limit=1); common-name authors may resolve to a different person than expected. Use the OpenLibrary author-key directly via a custom build if disambiguation matters. |
| maxBooksPerSource | integer | 50 | Cap per query/ISBN/subject/author (schema allows 1-200, but subjects and authors are server-capped at 50 regardless) |
| includeDescription | boolean | true | Fetch full description (extra API call per book in search-mode and isbn-mode only) |
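To check a subject name before a run, the slug rule can be approximated in a few lines of Python. This is a sketch of the documented rule (lowercase, spaces to underscores, then URL-encoding), not the actor's code; Python's `quote` may escape a slightly different character set than the actor does, and `subject_slug` is a hypothetical name.

```python
from urllib.parse import quote


def subject_slug(name: str) -> str:
    """Lowercase, turn spaces into underscores, then percent-encode the rest."""
    return quote(name.lower().replace(" ", "_"))


print(subject_slug("Science Fiction"))        # science_fiction
print(subject_slug("René Magritte's books"))  # ren%C3%A9_magritte%27s_books
```

A slug that comes out full of percent-escapes is a good hint that the subject browse may 404 and that a search query is the safer input.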
You can mix all four modes in a single run. Each output record carries a source field telling you which mode produced it (search:<query>, isbn:<n>, subject:<s>, author:<a>).
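As an illustration, a mixed-mode input and the matching source parsing might look like the Python sketch below. The field names and the 1-200 range come from the input table; `build_input` and `split_source` are hypothetical helpers, not part of the actor.

```python
import json
from typing import Iterable, Tuple


def build_input(
    search_queries: Iterable[str] = (),
    isbns: Iterable[str] = (),
    subjects: Iterable[str] = (),
    authors: Iterable[str] = (),
    max_books_per_source: int = 50,
    include_description: bool = True,
) -> dict:
    """Assemble an actor input dict, rejecting caps outside the schema range."""
    if not 1 <= max_books_per_source <= 200:
        raise ValueError("maxBooksPerSource must be between 1 and 200")
    return {
        "searchQueries": list(search_queries),
        "isbns": list(isbns),
        "subjects": list(subjects),
        "authors": list(authors),
        "maxBooksPerSource": max_books_per_source,
        "includeDescription": include_description,
    }


def split_source(source: str) -> Tuple[str, str]:
    """Split 'search:foundation' into ('search', 'foundation')."""
    mode, _, value = source.partition(":")
    return mode, value


run_input = build_input(
    search_queries=["foundation"],
    isbns=["9780553293357"],
    subjects=["Science Fiction"],
    authors=["Isaac Asimov"],
)
print(json.dumps(run_input, indent=2))
print(split_source("search:foundation"))
```

Splitting on the first colon only keeps queries that themselves contain a colon intact.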
Output schema (varies by source mode — fields differ deliberately)
Records from different modes carry different field sets. This is by design — OpenLibrary returns richer metadata for search results than for ISBN / subject / author endpoints.
search: mode (22 base fields, +description with includeDescription, +2 metadata = up to 25)

    {
      "title": "Foundation",
      "authors": ["Isaac Asimov"],
      "authorKeys": ["OL26320A"],
      "firstPublishYear": 1951,
      "publishYears": [1951, 1952, 1955, 1962, 1974],
      "isbn": "9780553293357",
      "allIsbns": ["9780553293357", "9780553382570", "..."],
      "subjects": ["Science fiction", "Galactic empire", "..."],
      "publishers": ["Bantam Spectra", "Doubleday", "..."],
      "languages": ["eng"],
      "pageCount": 244,
      "editionCount": 142,
      "coverUrl": "https://covers.openlibrary.org/b/id/9261361-L.jpg",
      "openLibraryKey": "/works/OL46828W",
      "openLibraryUrl": "https://openlibrary.org/works/OL46828W",
      "ebookAccess": "borrowable",
      "hasFulltext": true,
      "ratingsAverage": 4.12,
      "ratingsCount": 1284,
      "wantToRead": 8421,
      "currentlyReading": 412,
      "alreadyRead": 6203,
      "description": "In the waning days of a future Galactic Empire...",
      "source": "search:foundation",
      "scrapedAt": "2026-04-29T12:00:00.000Z"
    }

Field caps in search-mode: allIsbns truncated to first 10, subjects truncated to first 20, publishers truncated to first 5. pageCount is number_of_pages_median (median across editions, not the specific-edition page count).
isbn: mode (10 base fields, +description+subjects if includeDescription=true)
title, isbn, publishers (uncapped), publishDate, pageCount (specific-edition number_of_pages, NOT median), coverUrl, openLibraryKey, openLibraryUrl, source, scrapedAt. With includeDescription=true, adds description and subjects (uncapped). Description fetch is wrapped in a silent try/catch — on failure, both description and subjects are simply absent (no error field, no retry).
subject: mode (10 fields)
title, authors (array of names — different shape than search-mode's authorKeys), coverUrl, openLibraryKey, openLibraryUrl, editionCount, firstPublishYear, subject, source, scrapedAt. No ratings, no ISBN, no description in this mode — that's an OpenLibrary /subjects/ endpoint limitation, not ours. Server hard-caps to 50 records regardless of maxBooksPerSource.
author: mode (7 base fields, +description if includeDescription=true)
title, authors (1-element array with the resolved author name), authorKey (singular — different from search-mode's plural authorKeys), openLibraryKey, openLibraryUrl, covers (capped to first 3 cover URLs), source, scrapedAt. With includeDescription=true, adds description IF the author-works payload already contains it (no extra API call — purely best-effort). Server hard-caps to 50 works regardless of maxBooksPerSource.
Field-name asymmetry across modes: search-mode emits authorKeys (plural array) + coverUrl (single URL); author-mode emits authorKey (singular string) + covers (array of up to 3 URLs); subject-mode and isbn-mode emit coverUrl but no author-key field. If you join across modes, normalize these explicitly.
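One way to do that normalization after a run is sketched below in Python. It assumes the field names listed in the mode descriptions above and settles on the search-mode shape (an authorKeys list plus a single coverUrl); `normalize` is a hypothetical helper, not something the actor ships.

```python
from typing import Any, Dict


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with authorKeys (list) and coverUrl (str or None) always present."""
    out = dict(record)

    # authorKey (author-mode, singular) becomes a one-element authorKeys list.
    keys = record.get("authorKeys")
    if keys is None:
        single = record.get("authorKey")
        keys = [single] if single else []
    out["authorKeys"] = list(keys)
    out.pop("authorKey", None)

    # covers (author-mode, list) collapses to its first URL.
    cover = record.get("coverUrl")
    if cover is None:
        covers = record.get("covers") or []
        cover = covers[0] if covers else None
    out["coverUrl"] = cover
    out.pop("covers", None)
    return out


author_record = {
    "title": "Foundation",
    "authorKey": "OL26320A",
    "covers": ["https://covers.openlibrary.org/b/id/9261361-L.jpg"],
    "source": "author:Isaac Asimov",
}
print(normalize(author_record))
```

Records that carry neither author field come out with an empty authorKeys list, so downstream joins can rely on the key existing.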
Operational caveats
- ⚠️ Outer try/catch wraps the entire 4-mode for-loop (src/main.js lines 57-222). ISBN, subject, and author loops have inner try/catch so individual lookup failures don't halt their batch. BUT search-mode does NOT have inner protection: a single search-API failure (e.g. transient HTTP 500, network blip) kills the run mid-stream and skips ALL remaining search queries, ISBN lookups, subject browses, and author lookups. Run problematic queries in isolation if dropout matters.
- No retry / no proxy. Single fetch() per URL. Heavy bursts may eventually trigger OpenLibrary's polite-use ceiling (~100 req/min unofficial); the actor will surface that as a thrown HTTP error.
- Description-fetch silent-empty. When includeDescription=true and the work-page fetch fails, description is set to empty string (search-mode) or absent (isbn-mode); no error is logged per book.
- Subject slug transform is naive. Input "Science Fiction" → slug "science_fiction". Special characters beyond letters/spaces are URL-encoded but not slug-normalized; subjects like "René Magritte's books" will likely 404.
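Since the actor makes a single attempt per URL, any retrying has to happen on your side, for example when you re-fetch a missing description yourself. The Python sketch below shows a generic retry with exponential backoff; it is not part of the actor, and `with_retry` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A typical call wraps a zero-argument function, such as `with_retry(lambda: urllib.request.urlopen(url, timeout=30).read())`. The `sleep` parameter exists only so the waits can be stubbed out in tests.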
What this actor does NOT do
- No reading-progress / personal-list scraping — OpenLibrary doesn't expose individual users' lists.
- No full-text book content — only metadata + descriptions. Read free books at openlibrary.org or via Internet Archive.
- No price comparison — OpenLibrary is metadata-only, not a bookstore.
- No deduplication across modes: if you search "Foundation" and look up ISBN 9780553293357, you'll get 2 records. Dedupe by openLibraryKey post-run if needed.
- No incremental crawl / cursor state: each run starts fresh from page 1.
- No author disambiguation — first match wins.
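The post-run dedupe mentioned in the list can be as small as the Python sketch below. It keeps the first record seen for each openLibraryKey and passes through records that lack the key; `dedupe` is a hypothetical helper, and it only merges records whose key values match exactly.

```python
from typing import Any, Dict, Iterable, List


def dedupe(records: Iterable[Dict[str, Any]], key: str = "openLibraryKey") -> List[Dict[str, Any]]:
    """Keep the first record per key value, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        value = record.get(key)
        if value is not None:
            if value in seen:
                continue  # a record with this key was already kept
            seen.add(value)
        unique.append(record)
    return unique


rows = [
    {"title": "Foundation", "openLibraryKey": "/works/OL46828W", "source": "search:foundation"},
    {"title": "Foundation", "openLibraryKey": "/works/OL46828W", "source": "subject:science_fiction"},
    {"title": "No key here", "source": "search:foundation"},
]
print(len(dedupe(rows)))  # 2
```

Because the first record wins, putting search-mode results first in the input keeps the richest field set.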
When this stops being enough
If you need book full-text → use Internet Archive. If you need real-time bookstore prices → write a separate Amazon/Bookshop scraper. If you need annotated bibliographies → look at Goodreads (no public API since 2020, harder).
Custom builds — pilot tiers
This actor runs on Apify's standard compute. If you need a custom variant — search-mode-only with retry+backoff, ISBN-bulk with deduplication, subject browse paginated past the 50-cap (via search workaround), author-key direct lookup, hourly cron, Slack alerts on new releases — three tiers:
- Pilot — $97 · 1 actor, basic config, 7-day support. Good for one-off "top 200 books in subject X" via search + subject hybrid.
- Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most reading-list / catalog-enrichment projects fit here.
- Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (weekly new-release feed, ISBN-stream enrichment, author-tracking dashboards).
Email: spinov001@gmail.com — drop the input shape and the schema you need; quote within 48h.
Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
Related scrapers
| Source | Actor | Data |
|---|---|---|
| OpenLibrary (this) | Book metadata + ISBN/subject/author | Bibliographic |
| Wikipedia Scraper | Article + sections + references | Encyclopedic |
| arXiv Paper Scraper | Academic preprints | Research |
| [Google Books style — request a custom build via email] | — | — |
All 31 published actors free to inspect on Apify Store.
Disclaimer
Scrapes the publicly accessible OpenLibrary API endpoints. Respects polite delays (200-500ms between requests). Not affiliated with the Internet Archive or OpenLibrary.
Honest disclosure: search-mode 22 base fields (up to 25 with description + 2 metadata fields), isbn-mode 10 base, subject-mode 10 fields, author-mode 7 base. Subject and author endpoints server-capped at 50 records regardless of maxBooksPerSource. Search-mode is covered only by the outer try/catch, so a single search-API failure halts the entire run. Single-attempt fetch, no retry/no proxy. Author-mode requests limit=1 and does no disambiguation: first match wins.