Open Library Scraper — Book Metadata in Bulk
Pricing
Pay per event
Search the Open Library API (the Internet Archive's open book catalogue) and export structured book metadata — title, authors, ISBNs, subjects, publish year, cover URL, edition count, OpenLibrary ID — to JSON or CSV. We handle pagination and retries across 30M+ works.
Developer
DevilScrapes
Maintained by Community
🎯 What this scrapes
Open Library is the Internet Archive's catalogue of 30M+ works — the open, canonical bibliographic source that most reliable book-metadata pipelines lean on. When Goodreads shut their developer API in 2020, they left a gap that five years later developers are still Googling around. Open Library fills it: no licensing hurdles, no API key friction, free bulk export — if you can navigate the pagination and handle the upstream's occasional rate-limiting.
This Actor turns a free-form query (title, author, ISBN, subject) into typed dataset rows with cover URL, subjects, edition count, and the canonical Open Library key. We pace requests against the upstream, retry on transient errors, and surface partial successes loudly — so your library, recommender, or research dataset gets the rows it expects.
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
- 🌐 Optional residential proxy rotation via Apify Proxy (off by default): fresh session and exit IP on every block.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per page, `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down instead of getting banned.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: you only pay for results that hit your dataset. No data, no charge.
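The retry policy in the list above can be sketched roughly as follows. This is an illustration only, not the Actor's actual code: the names `should_retry` and `backoff_delay`, the 1-second base delay, the 60-second cap, and the 10% jitter are all assumptions. Only the status codes, the 5-attempt limit, and honouring `Retry-After` come from the list.

```python
import random
from typing import Optional

RETRYABLE = {408, 429}   # any 5xx is also treated as transient
MAX_ATTEMPTS = 5         # "up to 5 attempts per page"

def should_retry(status: int, attempt: int) -> bool:
    """True if the status is transient and attempts remain (attempt is 1-based)."""
    transient = status in RETRYABLE or 500 <= status <= 599
    return transient and attempt < MAX_ATTEMPTS

def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0,
                  jitter: float = 0.1) -> float:
    """Seconds to wait before the next attempt.

    A numeric Retry-After header wins over the computed delay.
    (HTTP-date values of Retry-After are not handled in this sketch.)
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)
    delay = min(cap, base * 2 ** (attempt - 1))   # 1s, 2s, 4s, 8s, ...
    return delay + random.uniform(0, jitter * delay)
```

The jitter term spreads out retries from parallel workers so they do not all hit the upstream at the same instant.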
💡 Use cases
- Goodreads alternative API — rebuild the bibliographic data layer Goodreads took away in 2020. Title, authors, ISBNs, subjects, cover URL, edition count — the fields every book-rec app needs.
- ISBN lookup at bulk scale — enrich a CSV of book titles with ISBNs, authors, and covers in one run. Better unit economics than a per-request ISBN lookup API.
- Free book metadata API — feed a reading-list dashboard, a library catalogue app, or a fiction-RAG backend with structured Open Library data. No licensing restrictions on bibliographic metadata.
- Discovery pipelines — list every Asimov novel + edition count for a fan-site backend, or enumerate every title tagged "machine learning" for a curated reading list.
- Digital humanities — seed subject-tag corpora for distant-reading research, cultural-analytics, or AI-tutor curriculum ingestion.
⚙️ How to use it
- Click Try for free at the top of the page.
- Fill in the input form — most fields have sensible defaults.
- Click Start. Output streams into the run's dataset.
- Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
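For the "fetch via the API" route, a minimal sketch using the official `apify-client` Python package might look like this. The `ACTOR_ID` value is a hypothetical placeholder (copy the real ID from the Actor page), and `build_input` and `run_and_fetch` are illustrative helper names, not part of the Actor.

```python
from typing import Any

# Hypothetical placeholder: replace with the real ID shown on the Actor page.
ACTOR_ID = "<username>/<actor-name>"

def build_input(query: str, max_results: int = 30,
                field: str = "all", language: str = "") -> dict[str, Any]:
    """Assemble a run input using the field names from the Input table."""
    if not query.strip():
        raise ValueError("searchQuery must not be empty")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")
    return {
        "searchQuery": query,
        "searchField": field,
        "maxResults": max_results,
        "language": language,
        "proxyConfiguration": {"useApifyProxy": False},
    }

def run_and_fetch(token: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return all dataset items."""
    from apify_client import ApifyClient  # third-party: pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())

# Example (needs a real token and Actor ID):
# rows = run_and_fetch("<APIFY_TOKEN>", build_input("foundation asimov", max_results=3))
```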
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `searchQuery` | string | yes | `"isaac asimov foundation"` | Free-text search. Open Library matches across title, author, subject, ISBN. |
| `searchField` | string | no | `"all"` | Narrow which field your query targets. `all` matches everywhere. |
| `maxResults` | integer | no | `30` | Max books to return. API caps per page at 100; we paginate. |
| `language` | string | no | `""` | 3-letter ISO-639-2 code, e.g. `eng`, `spa`, `fre`. Leave empty for all. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": false}` | Open Library is open. Proxy optional. |
Example input
{
  "searchQuery": "foundation asimov",
  "searchField": "all",
  "maxResults": 3,
  "proxyConfiguration": {"useApifyProxy": false}
}
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
| `openlibrary_key` | string | Open Library work key (e.g. `/works/OL12345W`). |
| `title` | string | Work title. |
| `subtitle` | string or null | Subtitle, when present. |
| `authors` | array | Author names. |
| `first_publish_year` | integer or null | Earliest publication year recorded. |
| `edition_count` | integer | Number of editions Open Library tracks. |
| `languages` | array | Language codes detected across editions. |
| `subjects` | array | Subject tags (truncated to at most 30). |
| `isbns` | array | ISBNs detected (10 and 13). |
| `publishers` | array | Publishers across editions (deduped). |
| `cover_id` | integer or null | Open Library cover image ID. |
| `cover_url_l` | string or null | Large cover image URL. |
| `ratings_average` | number or null | Average rating where Open Library has one. |
| `ratings_count` | integer or null | Rating count. |
| `ebook_access` | string or null | Open Library's e-book availability: `public`, `borrowable`, `no_ebook`, `printdisabled`. |
| `work_url` | string | Canonical Open Library URL. |
| `scraped_at` | string | ISO-8601 timestamp of when this row was recorded. |
Example output
{
  "openlibrary_key": "/works/OL471576W",
  "title": "Foundation",
  "authors": ["Isaac Asimov"],
  "first_publish_year": 1951,
  "edition_count": 142,
  "work_url": "https://openlibrary.org/works/OL471576W"
}
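On the consuming side, rows like the one above can be loaded into a small typed structure. The sketch below is one possible approach, not the Actor's own Pydantic model: `BookRow` and `cover_url` are hypothetical names, and only a subset of the output fields is modelled. The cover URL pattern is the one used by Open Library's public Covers API; whether `cover_url_l` is built in exactly this way is an assumption.

```python
from dataclasses import dataclass, field, fields
from typing import Optional

@dataclass
class BookRow:
    """Subset of the output fields; unknown keys are ignored on load."""
    openlibrary_key: str
    title: str
    work_url: str
    edition_count: int = 0
    authors: list[str] = field(default_factory=list)
    isbns: list[str] = field(default_factory=list)
    first_publish_year: Optional[int] = None
    cover_id: Optional[int] = None

    @classmethod
    def from_item(cls, item: dict) -> "BookRow":
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in item.items() if k in known})

def cover_url(cover_id: Optional[int], size: str = "L") -> Optional[str]:
    """Build a cover image URL for size S, M, or L; None when there is no cover."""
    if cover_id is None:
        return None
    if size not in {"S", "M", "L"}:
        raise ValueError("size must be one of S, M, L")
    return f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
```

Nullable fields default to `None` and list fields to empty lists, so a sparse row such as the example above still loads cleanly.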
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
| `actor-start` | $0.005 | One-off warm-up charge per run |
| `result` | $0.0015 | Per dataset item |
Example: 1 000 results in a single run cost 1 000 × $0.0015 + $0.005 = $1.505, or about $1.51. No subscription, no minimum, no card to start; Apify gives every new account $5 of free credit.
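To budget a job before running it, the two event prices in the table can be combined in a few lines. This is a sketch based only on the rates listed above; `estimate_cost` is a hypothetical helper name, and platform-level charges outside these two events are not modelled.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.005")   # per run, from the pricing table
PER_RESULT = Decimal("0.0015")   # per dataset item, from the pricing table

def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Estimated USD cost for `results` dataset items spread over `runs` runs."""
    if results < 0 or runs < 1:
        raise ValueError("results must be >= 0 and runs must be >= 1")
    return runs * ACTOR_START + results * PER_RESULT

print(estimate_cost(1000))            # one run of 1 000 results
print(estimate_cost(1000, runs=100))  # the same results split over 100 small runs
```

The start charge is paid once per run, so splitting the same 1 000 results across 100 runs raises the total from $1.505 to $2.00.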
🚧 Limitations
- Search uses Open Library's relevance ranking — for canonical bibliographic data (LCSH/Dewey), use a dedicated MARC source. Subjects are tags, not curated taxonomies.
- This Actor exports metadata only (titles, ISBNs, authors, subjects, cover URLs, publish years). It does not download book text or full-text content. For public-domain full text, follow `work_url` to the Internet Archive reader.
- Open Library has thinner rating data than Goodreads. Treat `ratings_average` with caution for niche or older works.
❓ FAQ
Is this a Goodreads alternative API?
Yes, for the bibliographic data layer. Goodreads shut their developer API in December 2020. Open Library provides the same core fields — title, authors, ISBNs, subjects, cover URL, edition count — under a fully open licence. This Actor is the managed bulk-export layer on top of that catalogue.
Can I do ISBN lookup in bulk?
Yes. Pass `searchField: "isbn"` with a specific ISBN-10 or ISBN-13 as `searchQuery`, or use `searchField: "all"` with a title + author combination to retrieve ISBNs at scale. Each result row returns the full `isbns` array for all editions of a work.
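When the ISBNs come from a messy CSV, it can help to validate and normalise them before building inputs. The sketch below uses the standard ISBN-10 and ISBN-13 checksum rules; `normalize_isbn` and `isbn_input` are hypothetical helper names, and `maxResults: 1` is an assumption about what a single-ISBN lookup needs.

```python
def isbn13_check_digit(first12: str) -> str:
    """ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def normalize_isbn(raw: str) -> str:
    """Return a bare, validated ISBN-13; raise ValueError on malformed input."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        # ISBN-10 checksum: weights 10 down to 1, 'X' stands for 10.
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
        if total % 11 != 0:
            raise ValueError(f"bad ISBN-10 checksum: {raw!r}")
        core = "978" + s[:9]
        return core + isbn13_check_digit(core)
    if len(s) == 13 and s.isdigit() and isbn13_check_digit(s[:12]) == s[12]:
        return s
    raise ValueError(f"not a valid ISBN: {raw!r}")

def isbn_input(raw: str) -> dict:
    """Run input for a single-ISBN lookup, using the documented field names."""
    return {"searchQuery": normalize_isbn(raw), "searchField": "isbn", "maxResults": 1}
```

Rejecting malformed ISBNs up front avoids spending an `actor-start` charge on a query that cannot match.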
Where's the book description / blurb?
The search API doesn't include long descriptions; for those, follow up with the work record at `https://openlibrary.org/works/{work_id}.json` (that is, `work_url` with `.json` appended). We surface enough to enrich a catalogue or recommendation engine.
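A follow-up lookup for the description could be sketched like this, using only the standard library. The helper names are hypothetical. In Open Library work records the `description` field may be either a plain string or an object with a `value` key, so the sketch handles both; pacing and retries are left out for brevity.

```python
import json
from typing import Optional
from urllib.request import urlopen

def work_json_url(openlibrary_key: str) -> str:
    """Turn '/works/OL471576W' (or a bare 'OL471576W') into the JSON record URL."""
    key = openlibrary_key.strip()
    if not key.startswith("/works/"):
        key = "/works/" + key.lstrip("/")
    return f"https://openlibrary.org{key}.json"

def extract_description(work: dict) -> Optional[str]:
    """Return the description text, or None when the record has none."""
    desc = work.get("description")
    if isinstance(desc, dict):          # {"type": "/type/text", "value": "..."}
        desc = desc.get("value")
    if isinstance(desc, str) and desc.strip():
        return desc.strip()
    return None

def fetch_description(openlibrary_key: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch one work record over HTTP and extract its description."""
    with urlopen(work_json_url(openlibrary_key), timeout=timeout) as resp:
        return extract_description(json.load(resp))
```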
Why are some ISBNs missing?
Older works weren't always catalogued with ISBNs. We return what Open Library has.
Can I download the book text?
Not via this Actor; we export metadata only. Visit `work_url` and follow Open Library's reader flow for public-domain full text.
What about the Open Library API directly?
Open Library's `/search.json` endpoint is public, but handling pagination, rate-limit pacing, retries, and clean typed output at scale is the work this Actor absorbs. We handle the blocks so you get consistent rows.
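For a sense of what calling the endpoint directly involves, here is a rough sketch of the pagination planning alone. It assumes the `q`, `page`, and `limit` query parameters of `/search.json` and the 100-per-page cap mentioned in the Input table. `page_urls` is a hypothetical helper; fetching, pacing, retries, and trimming the last page to `max_results` would still need to be added.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://openlibrary.org/search.json"
PAGE_CAP = 100  # per-page cap, as stated in the Input table

def page_urls(query: str, max_results: int) -> list[str]:
    """URLs for the pages needed to cover `max_results` hits.

    The page size stays constant so page offsets line up; the caller
    trims any surplus rows from the last page.
    """
    if max_results < 1:
        return []
    pages = -(-max_results // PAGE_CAP)  # ceiling division
    return [
        f"{SEARCH_URL}?{urlencode({'q': query, 'page': page, 'limit': PAGE_CAP})}"
        for page in range(1, pages + 1)
    ]
```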
Is the data licensed for commercial use?
Open Library's bibliographic metadata is released under CC0 (public domain). Always verify the licence terms for your specific use case at openlibrary.org.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.