Goodreads Book Scraper

Pricing

from $1.00 / 1,000 results

Extract book data from Goodreads: titles, authors, ratings, reviews, genres, ISBN, publisher, and more. HTTP-based, no proxy required.

Rating: 5.0 (23)

Developer: Crawler Bros (maintained by Community)

Actor stats

  • Bookmarked: 24
  • Total users: 7
  • Monthly active users: 2
  • Issues response: 13 hours
  • Last modified: 2 days ago

Goodreads Scraper

Scrape Goodreads — books, authors, series, Listopia lists, popular shelves, and genres. Look up books by direct URL, search query, or ISBN. Get titles, authors, ratings, reviews, genres, ISBN-10/13, publisher, page count, language, format, cover image, and more. HTTP-based via the public goodreads.com pages. No proxy required, no authentication.

New in v1.1: ISBN lookup, author/series/list/shelf/genre modes, omit-empty output (no null fields), retry layer, optional metadata records.

What this actor does

  • 9 modes: auto (default), books, search, isbns, authors, series, lists, shelves, genres.
  • ISBN lookup — paste ISBN-10 or ISBN-13; Goodreads' search redirects to the matching book.
  • Listings — author / series / list / shelf / genre pages walked (paginated where applicable) and every book scraped in detail.
  • Filters — minimum rating, minimum ratings count, publish year range, language, contains-genre, contains-author.
  • Metadata records — optionally emit per-author / per-series / per-list summary records (bio, description, book count) alongside book records.
  • Empty fields are omitted — records never contain null, "", [], or {}.
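
The omit-empty rule in the last bullet can be pictured with a short Python sketch. The helper names `omit_empty` and `_is_empty` are hypothetical and this is not the actor's own code; it only illustrates dropping null, "", [], and {} while keeping legitimate falsy values such as 0.

```python
from typing import Any


def _is_empty(value: Any) -> bool:
    """True only for the four 'empty' shapes: None, "", [] and {}."""
    return value is None or value == "" or value == [] or value == {}


def omit_empty(value: Any) -> Any:
    """Recursively drop empty values from dicts and lists (hypothetical helper)."""
    if isinstance(value, dict):
        cleaned = {key: omit_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if not _is_empty(item)}
    if isinstance(value, list):
        cleaned_items = [omit_empty(item) for item in value]
        return [item for item in cleaned_items if not _is_empty(item)]
    return value


record = {
    "title": "The Great Gatsby",
    "isbn": "",
    "genres": [],
    "publisher": None,
    "ratingsCount": 0,          # 0 is a real value and is kept
    "extra": {"note": None},    # becomes {} after cleaning, then is dropped
}
print(omit_empty(record))
```

For consumers this means checking for a key's presence (for example with `dict.get`) rather than comparing against null.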

Quick start

The default mode, "auto", runs every populated input array. Provide whichever inputs you have:

{
  "mode": "auto",
  "bookUrls": ["https://www.goodreads.com/book/show/4671.The_Great_Gatsby"],
  "searchQueries": ["sapiens"],
  "maxItems": 5
}

Expected output: 1 book record (Gatsby) + up to 5 books from the Sapiens search.

Modes

auto (default)

Runs whichever input arrays are non-empty. Fully backward-compatible with v1.0 (which only had bookUrls + searchQueries).
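
As a rough illustration of how "auto" could resolve to concrete modes, the Python sketch below maps each mode to the input array used in this document's examples and keeps the non-empty ones. The function `modes_to_run`, the mapping name, and the execution order are assumptions, not the actor's actual implementation.

```python
from typing import Dict, List

# Mode name -> input array, as used in the examples in this README.
INPUT_ARRAY_BY_MODE: Dict[str, str] = {
    "books": "bookUrls",
    "search": "searchQueries",
    "isbns": "isbns",
    "authors": "authorUrls",
    "series": "seriesUrls",
    "lists": "listUrls",
    "shelves": "shelfNames",
    "genres": "genreNames",
}


def modes_to_run(run_input: dict) -> List[str]:
    """Return the modes a run would execute (hypothetical helper)."""
    mode = run_input.get("mode", "auto")
    if mode != "auto":
        return [mode]
    # In auto mode, every populated array contributes its mode.
    return [
        name
        for name, array_key in INPUT_ARRAY_BY_MODE.items()
        if run_input.get(array_key)
    ]


print(modes_to_run({"bookUrls": ["https://www.goodreads.com/book/show/4671.The_Great_Gatsby"],
                    "searchQueries": ["sapiens"],
                    "isbns": []}))
```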

books — direct URLs

{
  "mode": "books",
  "bookUrls": [
    "https://www.goodreads.com/book/show/4671.The_Great_Gatsby",
    "https://www.goodreads.com/book/show/23692271-sapiens"
  ]
}

search — text queries

{
  "mode": "search",
  "searchQueries": ["atomic habits", "the lord of the rings"],
  "maxItems": 20
}

isbns — ISBN-10 or ISBN-13

{
  "mode": "isbns",
  "isbns": ["9780743273565", "0747532699", "9780261103252"]
}

Hyphens and spaces in ISBN strings are tolerated ("978-0-7432-7356-5" works). Goodreads redirects ISBN searches directly to the matching book — no extra fetch needed.
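
If you want to pre-clean ISBNs before passing them in, a minimal Python sketch follows. It mirrors the described behavior of stripping non-alphanumeric characters; the names `normalize_isbn` and `looks_like_isbn` are hypothetical, and the shape check is an extra convenience that the actor is not documented to perform.

```python
import re


def normalize_isbn(raw: str) -> str:
    """Remove hyphens, spaces and other non-alphanumerics; uppercase a trailing 'x'."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()


def looks_like_isbn(isbn: str) -> bool:
    """Shape check only: 13 digits, or 9 digits plus a digit or 'X' (no checksum)."""
    return bool(re.fullmatch(r"[0-9]{13}|[0-9]{9}[0-9X]", isbn))


for raw in ["978-0-7432-7356-5", "0 7475 3269 9", "not-an-isbn"]:
    cleaned = normalize_isbn(raw)
    print(raw, "->", cleaned, looks_like_isbn(cleaned))
```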

authors — all books by an author

{
  "mode": "authors",
  "authorUrls": ["https://www.goodreads.com/author/show/1077326.J_K_Rowling"],
  "maxItems": 20,
  "includeMetadata": true
}

Walks /author/list/<id>?page=N for every book. With includeMetadata: true, also emits one author summary record (bio, photo, genres, born/died) before the books.
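
To make the pagination path concrete, here is a small Python sketch that derives a `/author/list/<id>?page=N` URL from an author profile URL. The helper `author_list_url` is hypothetical, and it assumes the ID is the leading run of digits after `/author/show/`, as in the example above.

```python
import re


def author_list_url(author_url: str, page: int = 1) -> str:
    """Build the paginated book-list URL for an author profile URL (sketch)."""
    match = re.search(r"/author/show/(\d+)", author_url)
    if match is None:
        raise ValueError(f"Not a Goodreads author URL: {author_url!r}")
    return f"https://www.goodreads.com/author/list/{match.group(1)}?page={page}"


print(author_list_url("https://www.goodreads.com/author/show/1077326.J_K_Rowling", page=2))
```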

series — all books in a series

{
  "mode": "series",
  "seriesUrls": ["https://www.goodreads.com/series/49075-harry-potter"],
  "includeMetadata": true
}

lists — Listopia curated lists

{
  "mode": "lists",
  "listUrls": ["https://www.goodreads.com/list/show/1.Best_Books_Ever"],
  "maxItems": 50
}

Paginated automatically. Try "https://www.goodreads.com/list/show/264.Books_That_Should_Be_Made_Into_Movies" or any Listopia URL.

shelves (popular shelves)

{
  "mode": "shelves",
  "shelfNames": ["mystery", "fantasy", "historical-fiction"],
  "maxItems": 30
}

Shelf names are case-insensitive; spaces are converted to hyphens (e.g. "Best Mystery 2024" → "best-mystery-2024").
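
The shelf-name rule can be sketched in one line of Python. The function name `normalize_shelf_name` is hypothetical, and collapsing runs of whitespace into a single hyphen is an assumption beyond what is stated here.

```python
def normalize_shelf_name(name: str) -> str:
    """Lowercase the name and join whitespace-separated words with hyphens."""
    return "-".join(name.lower().split())


print(normalize_shelf_name("Best Mystery 2024"))
```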

genres — top books in a genre

{
  "mode": "genres",
  "genreNames": ["fiction", "romance", "mystery"]
}

Filters

All filters are optional. They apply across every mode. Records missing the filtered field pass through (filters reject only when the field is present and out of bounds).

| Filter | Type | Effect |
| --- | --- | --- |
| minRating | number 0–5 | Drop books with averageRating below this |
| minRatingsCount | int | Drop books with fewer ratings than this |
| publishYearMin | int | Drop books published before this year |
| publishYearMax | int | Drop books published after this year |
| language | string | Keep only books whose language starts with this string (case-insensitive); e.g. "en" matches "en" / "eng" / "English" |
| containsGenre | string | Drop books unless one of their genres contains this substring (case-insensitive) |
| containsAuthor | string | Drop books unless one of their authors contains this substring (case-insensitive) |

Example: highly-rated recent fantasy

{
  "mode": "shelves",
  "shelfNames": ["fantasy"],
  "minRating": 4.0,
  "minRatingsCount": 10000,
  "publishYearMin": 2020,
  "maxItems": 50
}
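
The pass-through rule (a filter rejects only when the field is present and out of bounds) is easy to get wrong when post-processing results yourself, so here is a minimal Python sketch of it. The function `passes_filters` is hypothetical, and pairing publishYearMin/publishYearMax with the publishedYear output field is an assumption; the actor's own filtering code may differ.

```python
from typing import Any, Callable, List, Tuple

# (record field, filter key, acceptance test) for each documented filter.
CHECKS: List[Tuple[str, str, Callable[[Any, Any], bool]]] = [
    ("averageRating", "minRating", lambda value, bound: value >= bound),
    ("ratingsCount", "minRatingsCount", lambda value, bound: value >= bound),
    ("publishedYear", "publishYearMin", lambda value, bound: value >= bound),
    ("publishedYear", "publishYearMax", lambda value, bound: value <= bound),
    ("language", "language",
     lambda value, prefix: value.lower().startswith(prefix.lower())),
    ("genres", "containsGenre",
     lambda values, needle: any(needle.lower() in v.lower() for v in values)),
    ("authors", "containsAuthor",
     lambda values, needle: any(needle.lower() in v.lower() for v in values)),
]


def passes_filters(book: dict, filters: dict) -> bool:
    """Keep a book unless a set filter finds its field present and out of bounds."""
    for field, key, accept in CHECKS:
        if key not in filters or field not in book:
            continue  # filter not set, or field missing: pass through
        if not accept(book[field], filters[key]):
            return False
    return True


filters = {"minRating": 4.0, "minRatingsCount": 10000, "publishYearMin": 2020}
print(passes_filters({"averageRating": 4.3, "ratingsCount": 25000, "publishedYear": 2021}, filters))
print(passes_filters({"averageRating": 4.5, "ratingsCount": 20000}, filters))  # no year: passes
```

Because of the omit-empty output contract, a missing field shows up as an absent key, which is why the sketch tests for key presence rather than for null.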

Output fields per record type

Every record has recordType: "book" | "author" | "series" | "list" and scrapedAt (ISO 8601 UTC).

Book record (recordType: "book")

| Field | Description |
| --- | --- |
| title | Book title |
| url | Goodreads book URL |
| bookId | Goodreads numeric book ID |
| authors[] | Author names |
| primaryAuthor | First author |
| authorUrls[] | Goodreads author profile URLs |
| description | Plain-text description (HTML stripped) |
| isbn, isbn10, isbn13 | ISBN identifiers (when known) |
| averageRating | Average rating, 0–5 |
| ratingsCount | Total number of ratings |
| reviewsCount | Total number of text reviews |
| pagesCount | Page count |
| publishedYear | Year of original publication |
| publisher | Publisher name |
| language | Language (varies; sometimes an ISO code, sometimes a name) |
| format | Paperback, Hardcover, Kindle, etc. |
| genres[] | List of genre tags |
| coverImage | Cover image URL on Goodreads CDN |

Author record (recordType: "author", only when includeMetadata: true)

| Field | Description |
| --- | --- |
| name | Author display name |
| authorId, authorUrl | Goodreads identifiers |
| photoUrl | Author photo on Goodreads CDN |
| description | "About the author" text |
| born, died | Birth/death info (when public) |
| genres[] | Top author genres |
| website | External author website (when listed) |

Series record (recordType: "series", only when includeMetadata: true)

| Field | Description |
| --- | --- |
| name | Series name |
| seriesId, seriesUrl | Goodreads identifiers |
| description | Series description |
| primaryAuthor | First author of the series |
| bookCount | Number of books on the series page |

List record (recordType: "list", only when includeMetadata: true)

| Field | Description |
| --- | --- |
| name | List name |
| listId, listUrl | Goodreads identifiers |
| description | List description |
| bookCount | Total books in the list |
| voterCount | Total voters |

Use cases

  • Library systems — bulk-import metadata from Goodreads by ISBN.
  • Reading recommendation — feed Goodreads genre + rating data into your recommender.
  • Author catalog — get every book by a specific author in one run.
  • Series tracking — pull all books in a series with publication years and ratings.
  • Curated discovery — scrape Listopia lists like "Best Books of the Decade" or "Best Mystery 2024".
  • Reading-level filtering — only books rated ≥4.0 with ≥10k ratings published in the last 5 years.
  • Publishing intelligence — track ratings/reviews velocity for a series of releases.

FAQ

Why was ISBN lookup added in v1.1? A user reported that v1.0 had no documented path to look up a book by ISBN. v1.1 ships a dedicated isbns input that accepts ISBN-10 / ISBN-13 with or without hyphens. Goodreads' search endpoint redirects ISBN queries to the matching book page, so lookup is direct.

Is a proxy required? No. Goodreads' public pages are accessible from datacenter IPs. The actor includes an optional proxyConfiguration field for cases where you hit sustained 429s, but the default is no proxy.

What's the rate limit? The actor uses 0.3–0.7s polite delays between fetches. If Goodreads rate-limits, the actor retries with exponential backoff (10s/20s/40s, capped at 90s) up to 3 times per fetch.
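
The backoff schedule described in this answer can be sketched in Python as follows. The helpers `backoff_delays` and `fetch_with_retry` and the injected `fetch` callable are hypothetical; the sketch treats 429 and 5xx responses as retryable, and it is not the actor's own retry layer.

```python
import time
from typing import Callable, List, Tuple


def backoff_delays(base: float = 10.0, cap: float = 90.0, retries: int = 3) -> List[float]:
    """Doubling delays starting at `base`, each capped at `cap`."""
    return [min(base * 2 ** attempt, cap) for attempt in range(retries)]


def is_retryable(status: int) -> bool:
    """Assumption: retry on 429 and on any 5xx status."""
    return status == 429 or 500 <= status < 600


def fetch_with_retry(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `fetch`, retrying up to three times with the delays above."""
    status, body = fetch(url)
    for delay in backoff_delays():
        if not is_retryable(status):
            break
        sleep(delay)
        status, body = fetch(url)
    return status, body


print(backoff_delays())           # three retries: 10s, 20s, 40s
print(backoff_delays(retries=5))  # the 90s cap only applies from the fifth delay on
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting.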

Why is includeMetadata opt-in? By default the dataset is uniform (all recordType: "book"). Enabling includeMetadata mixes in author / series / list records — useful for analysis but breaks consumers expecting only books. Off by default for backward compatibility.
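
If you do enable includeMetadata, one way to keep book-only consumers working is to split the dataset by recordType first. The Python sketch below assumes the items have already been loaded as a list of dicts; `split_by_record_type` is a hypothetical helper, and the "unknown" fallback is a defensive assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_record_type(items: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group dataset items by their recordType field."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for item in items:
        groups[item.get("recordType", "unknown")].append(item)
    return dict(groups)


items = [
    {"recordType": "author", "name": "J.K. Rowling"},
    {"recordType": "book", "title": "Harry Potter and the Philosopher's Stone"},
    {"recordType": "book", "title": "Harry Potter and the Chamber of Secrets"},
]
groups = split_by_record_type(items)
print({record_type: len(records) for record_type, records in groups.items()})
```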

Why do some coverImage URLs return 404? Goodreads sometimes references covers that aren't in its CDN (very old or rare books). The URL is what Goodreads publishes; not all of them resolve.

What does mode: "auto" mean? It runs every populated input array sequentially. This is the default, and it preserves v1.0 behavior — pre-v1.1 callers passing only bookUrls and searchQueries continue to work unchanged.

What's the difference between language: "en" and language: "English"? Both are valid, but they match different records. The filter is a case-insensitive prefix match: "en" matches "en", "eng", and "English" (which all start with "en"), while "English" matches only values that start with "English", so it skips records tagged "en" or "eng".

Can I use ISBN-13 or ISBN-10? Both. The actor normalizes by stripping non-alphanumerics; hyphens and spaces in your input are fine.

Is this affiliated with Goodreads or Amazon? No. This is a third-party actor that uses Goodreads' public pages.

Limitations (v1.1)

  • Reviews-detail pages (/review/show/<id>) are not scraped. The book's reviewsCount is captured, but individual review text is not. Planned for v2.
  • Award pages (/award/show/<id>) have inconsistent layouts and are not supported. Planned for v2.
  • Quotes (/quotes) are a separate record type and not in scope for v1.1.
  • User shelves (/user/show/<id>) are not supported — most are login-gated; public shelves duplicate /list/show.
  • No country localization — Goodreads runs on goodreads.com globally with no country subdomains.
  • mode=shelves is capped at ~50 books by Goodreads — /shelf/show/<name>?page=N ignores the page parameter; pages 2+ return identical content. The actor detects this and stops cleanly. For deeper coverage, use mode=lists (Listopia paginates correctly to thousands of books) or mode=genres (different surface).
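
One way to picture the repeated-page detection mentioned in the last bullet is the Python sketch below: keep walking pages until a page contributes no book IDs that have not been seen already. The function `walk_shelf` and the injected `fetch_page` callable are hypothetical, and the actor's real detection logic may work differently.

```python
from typing import Callable, List


def walk_shelf(fetch_page: Callable[[int], List[str]], max_pages: int = 20) -> List[str]:
    """Collect book IDs page by page, stopping when a page adds nothing new."""
    seen = set()
    ordered: List[str] = []
    for page in range(1, max_pages + 1):
        fresh = [book_id for book_id in fetch_page(page) if book_id not in seen]
        if not fresh:
            break  # identical (or empty) page: pagination is not advancing
        seen.update(fresh)
        ordered.extend(fresh)
    return ordered


requested_pages: List[int] = []


def fake_fetch_page(page: int) -> List[str]:
    """Stand-in for a shelf page that ignores ?page=N and always returns the same IDs."""
    requested_pages.append(page)
    return ["4671", "5907", "3"]


print(walk_shelf(fake_fetch_page), requested_pages)
```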

Changelog

v1.1 (current)

  • NEW: isbns mode (the v1.0 ISBN gap is closed).
  • NEW: authors, series, lists, shelves, genres modes.
  • NEW: filters — minRating, minRatingsCount, publishYearMin/Max, language, containsGenre, containsAuthor.
  • NEW: optional metadata records via includeMetadata.
  • NEW: retry layer on 429/5xx with exponential backoff.
  • FIX: omit-empty contract — records no longer contain null, "", [], or {}.
  • DEPRECATED: maxResultsPerQuery (use maxItems instead — both honored).
  • BACKWARD-COMPAT: v1.0 callers passing only bookUrls + searchQueries continue to work via mode: "auto".

v1.0

  • Initial release: bookUrls, searchQueries, maxResultsPerQuery.