Goodreads Book Scraper
Pricing
from $1.00 / 1,000 results
Extract book data from Goodreads: titles, authors, ratings, reviews, genres, ISBN, publisher, and more. HTTP-based, no proxy required.
Developer
Crawler Bros
Goodreads Scraper
Scrape Goodreads — books, authors, series, Listopia lists, popular shelves, and genres. Look up books by direct URL, search query, or ISBN. Get titles, authors, ratings, reviews, genres, ISBN-10/13, publisher, page count, language, format, cover image, and more. HTTP-based via the public goodreads.com pages. No proxy required, no authentication.
New in v1.1: ISBN lookup, author/series/list/shelf/genre modes, omit-empty output (no null fields), retry layer, optional metadata records.
What this actor does
- 9 modes: `auto` (default), `books`, `search`, `isbns`, `authors`, `series`, `lists`, `shelves`, `genres`.
- ISBN lookup: paste ISBN-10 or ISBN-13; Goodreads' search redirects to the matching book.
- Listings: author / series / list / shelf / genre pages are walked (paginated where applicable) and every book is scraped in detail.
- Filters: minimum rating, minimum ratings count, publish year range, language, contains-genre, contains-author.
- Metadata records: optionally emit per-author / per-series / per-list summary records (bio, description, book count) alongside book records.
- Empty fields are omitted: records never contain `null`, `""`, `[]`, or `{}`.
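The omit-empty contract can be mirrored on the consumer side, for example when merging these records with data from other sources. The following is a minimal sketch, not the actor's own code: the helper name `omit_empty` is hypothetical, and pruning nested values recursively is an assumption.

```python
from typing import Any


def _is_empty(value: Any) -> bool:
    # Only these four values count as empty; 0 and False are kept.
    return value is None or value == "" or value == [] or value == {}


def omit_empty(value: Any) -> Any:
    """Recursively drop None, "", [], and {} from dicts and lists."""
    if isinstance(value, dict):
        cleaned = {key: omit_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if not _is_empty(item)}
    if isinstance(value, list):
        cleaned_items = [omit_empty(item) for item in value]
        return [item for item in cleaned_items if not _is_empty(item)]
    return value


record = {"title": "The Great Gatsby", "isbn": None, "genres": [], "ratingsCount": 0}
print(omit_empty(record))
```

Because empty fields are absent rather than null, downstream code should read fields with `record.get("isbn")` instead of `record["isbn"]`.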
Quick start
The default `mode: "auto"` runs every populated input array. Provide what you have:
{"mode": "auto","bookUrls": ["https://www.goodreads.com/book/show/4671.The_Great_Gatsby"],"searchQueries": ["sapiens"],"maxItems": 5}
Expected output: 1 book record (Gatsby) + up to 5 books from the Sapiens search.
Modes
auto (default)
Runs whichever input arrays are non-empty. Fully backward-compatible with v1.0 (which only had bookUrls + searchQueries).
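As an illustration of how `auto` could resolve to concrete modes, the sketch below maps each input array named in this README to its mode. The function name `modes_to_run` and the execution order are assumptions; the actor's real dispatch logic is not documented here.

```python
from typing import Any, Dict, List

# Input array -> mode, using the field names from the examples in this README.
INPUT_KEY_TO_MODE = {
    "bookUrls": "books",
    "searchQueries": "search",
    "isbns": "isbns",
    "authorUrls": "authors",
    "seriesUrls": "series",
    "listUrls": "lists",
    "shelfNames": "shelves",
    "genreNames": "genres",
}


def modes_to_run(run_input: Dict[str, Any]) -> List[str]:
    """Return the modes a given input would trigger (hypothetical helper)."""
    mode = run_input.get("mode", "auto")
    if mode != "auto":
        return [mode]
    # In auto mode, every non-empty input array contributes its mode.
    return [m for key, m in INPUT_KEY_TO_MODE.items() if run_input.get(key)]


quick_start = {
    "mode": "auto",
    "bookUrls": ["https://www.goodreads.com/book/show/4671.The_Great_Gatsby"],
    "searchQueries": ["sapiens"],
    "maxItems": 5,
}
print(modes_to_run(quick_start))
```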
books — direct URLs
{"mode": "books","bookUrls": ["https://www.goodreads.com/book/show/4671.The_Great_Gatsby","https://www.goodreads.com/book/show/23692271-sapiens"]}
search — text queries
{"mode": "search","searchQueries": ["atomic habits", "the lord of the rings"],"maxItems": 20}
isbns — ISBN-10 or ISBN-13
{"mode": "isbns","isbns": ["9780743273565", "0747532699", "9780261103252"]}
Hyphens and spaces in ISBN strings are tolerated ("978-0-7432-7356-5" works). Goodreads redirects ISBN searches directly to the matching book — no extra fetch needed.
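If you want to clean ISBNs before submitting them, the normalization described above (stripping non-alphanumeric characters) might look like the following. This is a sketch under assumptions: `normalize_isbn` and `isbn_kind` are hypothetical names, and classifying by length is an extra convenience rather than documented actor behavior.

```python
import re


def normalize_isbn(raw: str) -> str:
    """Remove everything except ASCII letters and digits; upper-case a check-digit 'x'."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()


def isbn_kind(isbn: str) -> str:
    """Classify a normalized ISBN by its shape: 'isbn13', 'isbn10', or 'unknown'."""
    if len(isbn) == 13 and isbn.isdigit():
        return "isbn13"
    # ISBN-10 allows 'X' as the final check character.
    if len(isbn) == 10 and isbn[:9].isdigit() and (isbn[9].isdigit() or isbn[9] == "X"):
        return "isbn10"
    return "unknown"


for raw in ["978-0-7432-7356-5", "0 7475 3269 9"]:
    cleaned = normalize_isbn(raw)
    print(cleaned, isbn_kind(cleaned))
```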
authors — all books by an author
{"mode": "authors","authorUrls": ["https://www.goodreads.com/author/show/1077326.J_K_Rowling"],"maxItems": 20,"includeMetadata": true}
Walks /author/list/<id>?page=N for every book. With includeMetadata: true, also emits one author summary record (bio, photo, genres, born/died) before the books.
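To make the URL pattern concrete, here is a small sketch that derives the paginated `/author/list/<id>?page=N` address from an author profile URL. The helper names are hypothetical, and the regular expression assumes the numeric ID directly follows `/author/show/`, as in the example above.

```python
import re


def author_id_from_url(author_url: str) -> str:
    """Extract the numeric author ID from a Goodreads author profile URL."""
    match = re.search(r"/author/show/(\d+)", author_url)
    if match is None:
        raise ValueError(f"not an author profile URL: {author_url}")
    return match.group(1)


def author_list_url(author_url: str, page: int) -> str:
    """Build the paginated book-list URL for the given author and page number."""
    author_id = author_id_from_url(author_url)
    return f"https://www.goodreads.com/author/list/{author_id}?page={page}"


print(author_list_url("https://www.goodreads.com/author/show/1077326.J_K_Rowling", 2))
```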
series — all books in a series
{"mode": "series","seriesUrls": ["https://www.goodreads.com/series/49075-harry-potter"],"includeMetadata": true}
lists — Listopia curated lists
{"mode": "lists","listUrls": ["https://www.goodreads.com/list/show/1.Best_Books_Ever"],"maxItems": 50}
Paginated automatically. Try "https://www.goodreads.com/list/show/264.Books_That_Should_Be_Made_Into_Movies" or any Listopia URL.
shelves — popular shelf names
{"mode": "shelves","shelfNames": ["mystery", "fantasy", "historical-fiction"],"maxItems": 30}
Shelf names are case-insensitive; spaces are converted to hyphens (e.g. "Best Mystery 2024" → "best-mystery-2024").
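The shelf-name normalization can be reproduced locally, for instance to deduplicate shelf names before a run. A minimal sketch, assuming the rule is exactly "lower-case, then replace spaces with hyphens"; `shelf_slug` is a hypothetical name and trimming surrounding whitespace is an added assumption.

```python
def shelf_slug(name: str) -> str:
    """Lower-case a shelf name and convert spaces to hyphens."""
    return name.strip().lower().replace(" ", "-")


print(shelf_slug("Best Mystery 2024"))
```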
genres — top books in a genre
{"mode": "genres","genreNames": ["fiction", "romance", "mystery"]}
Filters
All filters are optional. They apply across every mode. Records missing the filtered field pass through (filters reject only when the field is present and out of bounds).
| Filter | Type | Effect |
|---|---|---|
| `minRating` | number 0–5 | Drop books with `averageRating` below this |
| `minRatingsCount` | int | Drop books with fewer ratings than this |
| `publishYearMin` | int | Drop books published before this year |
| `publishYearMax` | int | Drop books published after this year |
| `language` | string | Keep only records whose language starts with this string (case-insensitive); e.g. "en" matches "en" / "eng" / "English" |
| `containsGenre` | string | Drop books unless one of their genres contains this substring (case-insensitive) |
| `containsAuthor` | string | Drop books unless one of their authors contains this substring (case-insensitive) |
Example: highly-rated recent fantasy
{"mode": "shelves","shelfNames": ["fantasy"],"minRating": 4.0,"minRatingsCount": 10000,"publishYearMin": 2020,"maxItems": 50}
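The pass-through rule (a filter rejects only when the field is present and out of bounds) is easy to get wrong when re-filtering a dataset yourself. The sketch below is one way to express it; `passes_filters` is a hypothetical helper written against the field names in this README, not the actor's implementation.

```python
from typing import Any, Dict


def passes_filters(book: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Return False only when a filtered field is present and fails its check."""
    checks = [
        # (filter key, record field, predicate(filter value, field value))
        ("minRating", "averageRating", lambda limit, v: v >= limit),
        ("minRatingsCount", "ratingsCount", lambda limit, v: v >= limit),
        ("publishYearMin", "publishedYear", lambda limit, v: v >= limit),
        ("publishYearMax", "publishedYear", lambda limit, v: v <= limit),
        ("language", "language", lambda want, v: v.lower().startswith(want.lower())),
        ("containsGenre", "genres", lambda want, vs: any(want.lower() in g.lower() for g in vs)),
        ("containsAuthor", "authors", lambda want, vs: any(want.lower() in a.lower() for a in vs)),
    ]
    for filter_key, field, is_ok in checks:
        wanted = filters.get(filter_key)
        value = book.get(field)
        if wanted is None or value is None:
            continue  # filter not set, or field missing from the record: pass through
        if not is_ok(wanted, value):
            return False
    return True


filters = {"minRating": 4.0, "minRatingsCount": 10000, "publishYearMin": 2020}
book = {"title": "Example", "averageRating": 4.3, "ratingsCount": 25000, "publishedYear": 2021}
print(passes_filters(book, filters))
```

Note that a record with no `publishedYear` passes `publishYearMin`; if you need strict filtering, drop records with missing fields in a separate step.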
Output fields per record type
Every record has recordType: "book" | "author" | "series" | "list" and scrapedAt (ISO 8601 UTC).
Book record (recordType: "book")
| Field | Description |
|---|---|
| `title` | Book title |
| `url` | Goodreads book URL |
| `bookId` | Goodreads numeric book ID |
| `authors[]` | Author names |
| `primaryAuthor` | First author |
| `authorUrls[]` | Goodreads author profile URLs |
| `description` | Plain-text description (HTML stripped) |
| `isbn`, `isbn10`, `isbn13` | ISBN identifiers (when known) |
| `averageRating` | Average rating, 0–5 |
| `ratingsCount` | Total number of ratings |
| `reviewsCount` | Total number of text reviews |
| `pagesCount` | Page count |
| `publishedYear` | Year of original publication |
| `publisher` | Publisher name |
| `language` | Language (varies: sometimes an ISO code, sometimes a name) |
| `format` | Paperback, Hardcover, Kindle, etc. |
| `genres[]` | List of genre tags |
| `coverImage` | Cover image URL on Goodreads CDN |
Author record (recordType: "author", only when includeMetadata: true)
| Field | Description |
|---|---|
| `name` | Author display name |
| `authorId`, `authorUrl` | Goodreads identifiers |
| `photoUrl` | Author photo on Goodreads CDN |
| `description` | "About the author" text |
| `born`, `died` | Birth/death info (when public) |
| `genres[]` | Top author genres |
| `website` | External author website (when listed) |
Series record (recordType: "series", only when includeMetadata: true)
| Field | Description |
|---|---|
| `name` | Series name |
| `seriesId`, `seriesUrl` | Goodreads identifiers |
| `description` | Series description |
| `primaryAuthor` | First author of the series |
| `bookCount` | Number of books on the series page |
List record (recordType: "list", only when includeMetadata: true)
| Field | Description |
|---|---|
| `name` | List name |
| `listId`, `listUrl` | Goodreads identifiers |
| `description` | List description |
| `bookCount` | Total books in the list |
| `voterCount` | Total voters |
Use cases
- Library systems — bulk-import metadata from Goodreads by ISBN.
- Reading recommendation — feed Goodreads genre + rating data into your recommender.
- Author catalog — get every book by a specific author in one run.
- Series tracking — pull all books in a series with publication years and ratings.
- Curated discovery — scrape Listopia lists like "Best Books of the Decade" or "Best Mystery 2024".
- Reading-level filtering — only books rated ≥4.0 with ≥10k ratings published in the last 5 years.
- Publishing intelligence — track ratings/reviews velocity for a series of releases.
FAQ
Why was ISBN lookup added in v1.1?
A user reported that v1.0 had no documented path to look up a book by ISBN. v1.1 ships a dedicated isbns input that accepts ISBN-10 / ISBN-13 with or without hyphens. Goodreads' search endpoint redirects ISBN queries to the matching book page, so lookup is direct.
Is a proxy required?
No. Goodreads' public pages are accessible from datacenter IPs. The actor includes an optional proxyConfiguration field for cases where you hit sustained 429s, but the default is no proxy.
What's the rate limit?
The actor uses 0.3–0.7s polite delays between fetches. If Goodreads rate-limits, the actor retries with exponential backoff (10s/20s/40s, capped at 90s) up to 3 times per fetch.
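The retry schedule described above (10s, 20s, 40s, capped at 90s, at most 3 retries on 429/5xx) can be sketched as follows. This is an illustration only: `fetch_with_retry` and `backoff_delay` are hypothetical names, and the `fetch` callable is injected so the sketch does not assume any particular HTTP library.

```python
import time
from typing import Callable, Tuple


def backoff_delay(attempt: int, base: float = 10.0, cap: float = 90.0) -> float:
    """Delay before retry number attempt+1: 10s, 20s, 40s, ... capped at 90s."""
    return min(base * (2 ** attempt), cap)


def fetch_with_retry(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(url) -> (status, body); retry on 429/5xx with exponential backoff."""
    attempt = 0
    while True:
        status, body = fetch(url)
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt >= max_retries:
            return status, body  # success, non-retryable error, or retries exhausted
        sleep(backoff_delay(attempt))
        attempt += 1


print([backoff_delay(n) for n in range(5)])
```

With the default of 3 retries the 90s cap is never reached; it only matters if `max_retries` is raised.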
Why is includeMetadata opt-in?
By default the dataset is uniform (all recordType: "book"). Enabling includeMetadata mixes in author / series / list records — useful for analysis but breaks consumers expecting only books. Off by default for backward compatibility.
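When `includeMetadata` is enabled, a consumer that expects only books can split the mixed dataset on `recordType` first. A minimal sketch, assuming the items have already been loaded as a list of dicts; `group_by_record_type` is a hypothetical helper and the sample items are made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_record_type(items: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket dataset items by their recordType field."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for item in items:
        # Every record is documented to carry recordType; "unknown" is a defensive fallback.
        groups[item.get("recordType", "unknown")].append(item)
    return dict(groups)


# Illustrative items only, not real actor output.
items = [
    {"recordType": "author", "name": "Example Author"},
    {"recordType": "book", "title": "Example Book One"},
    {"recordType": "book", "title": "Example Book Two"},
]
groups = group_by_record_type(items)
print({record_type: len(records) for record_type, records in groups.items()})
```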
Why do some coverImage URLs return 404?
Goodreads sometimes references covers that aren't in their CDN (very old or rare books). The URL is what Goodreads publishes; not all of them resolve.
What does mode: "auto" mean?
It runs every populated input array sequentially. This is the default, and it preserves v1.0 behavior — pre-v1.1 callers passing only bookUrls and searchQueries continue to work unchanged.
What's the difference between language: "en" and language: "English"?
Both are valid, but "en" is broader. The filter is a case-insensitive prefix match: "en" matches "en", "eng", and "English" (which all start with "en"), while "English" matches only values that start with "English" and therefore misses records tagged "en" or "eng".
Can I use ISBN-13 or ISBN-10?
Both. The actor normalizes by stripping non-alphanumerics; hyphens and spaces in your input are fine.
Is this affiliated with Goodreads or Amazon?
No. This is a third-party actor that uses Goodreads' public pages.
Limitations (v1.1)
- Reviews-detail pages (`/review/show/<id>`) are not scraped. The book's `reviewsCount` is captured, but individual review text is not. Planned for v2.
- Award pages (`/award/show/<id>`) have inconsistent layouts and are not supported. Planned for v2.
- Quotes (`/quotes`) are a separate record type and not in scope for v1.1.
- User shelves (`/user/show/<id>`) are not supported: most are login-gated, and public shelves duplicate `/list/show`.
- No country localization: Goodreads runs on `goodreads.com` globally with no country subdomains.
- `mode=shelves` is capped at ~50 books by Goodreads: `/shelf/show/<name>?page=N` ignores the `page` parameter, so pages 2+ return identical content. The actor detects this and stops cleanly. For deeper coverage, use `mode=lists` (Listopia paginates correctly to thousands of books) or `mode=genres` (different surface).
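One way to detect the repeated-page behavior on shelves is to stop as soon as a page yields no book IDs that have not been seen before. The README does not say how the actor detects it, so the sketch below is an assumption; `collect_book_ids` and the injected `fetch_page` callable are hypothetical.

```python
from typing import Callable, List, Set


def collect_book_ids(fetch_page: Callable[[int], List[str]], max_pages: int = 20) -> List[str]:
    """Walk pages 1..max_pages, stopping when a page adds no unseen book IDs."""
    seen: Set[str] = set()
    ordered: List[str] = []
    for page in range(1, max_pages + 1):
        added = 0
        for book_id in fetch_page(page):
            if book_id not in seen:
                seen.add(book_id)
                ordered.append(book_id)
                added += 1
        if added == 0:
            break  # page was empty or repeated earlier content
    return ordered


def stuck_shelf(page: int) -> List[str]:
    # Simulates a shelf that ignores the page parameter and always returns page 1.
    return ["4671", "23692271", "5907"]


print(collect_book_ids(stuck_shelf))
```

The same loop works unchanged for surfaces that do paginate, since it only stops once a page stops contributing new IDs.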
Changelog
v1.1 (current)
- NEW: `isbns` mode (the v1.0 ISBN gap is closed).
- NEW: `authors`, `series`, `lists`, `shelves`, `genres` modes.
- NEW: filters: `minRating`, `minRatingsCount`, `publishYearMin`/`Max`, `language`, `containsGenre`, `containsAuthor`.
- NEW: optional metadata records via `includeMetadata`.
- NEW: retry layer on 429/5xx with exponential backoff.
- FIX: omit-empty contract: records no longer contain `null`, `""`, `[]`, or `{}`.
- DEPRECATED: `maxResultsPerQuery` (use `maxItems` instead; both honored).
- BACKWARD-COMPAT: v1.0 callers passing only `bookUrls` + `searchQueries` continue to work via `mode: "auto"`.
v1.0
- Initial release: `bookUrls`, `searchQueries`, `maxResultsPerQuery`.