Goodreads Scraper - Books, Authors, Ratings, ISBN & Reviews

Pricing

Pay per event

Scrape Goodreads books, authors and lists. Title, ISBN, pages, format, language, rating, ratings count, reviews count, author. HTTP only, $5/1K.

Developer

deusex machine

Maintained by Community

Goodreads Scraper — Books, Authors, Ratings, ISBN & Reviews

Scrape Goodreads — the world's largest book community with 150M members and 4 billion+ ratings — and extract complete book metadata, author profiles and curated book lists. HTTP-only, no browser, $5 per 1,000 items ($0.005 each).

If you build an author platform, run a publishing house, sell book-discovery apps, analyze the literary market, write academic papers about reading trends, or train a recommendation model on book data, this Goodreads scraper turns the canonical book-ratings graph into a clean structured feed in seconds.

Why use this Goodreads scraper

Goodreads is one of the most complete book databases available: most English-language titles have a page here, and major titles carry millions of user ratings and reviews. But Goodreads has had no public API since late 2020 (Amazon shut down the legacy API and never replaced it), and it challenges automated traffic on some endpoints, such as search.

This actor extracts data from the canonical JSON-LD blocks Goodreads ships on every book detail page. That means:

  • Stable selectors: <script type="application/ld+json"> is part of SEO, and Goodreads cannot remove it without losing search engine ranking
  • Complete fields: title, ISBN, ISBN-13, page count, format, language, description, cover, author(s) with URLs, average rating, ratings count, reviews count
  • No anti-bot challenges encountered on book, author and list pages
  • Fast: typically 4–6 books per second per worker
  • Cheap: $0.005 per record ($5 per 1,000), the lowest among book-database scrapers in the Apify Store
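To make the JSON-LD approach concrete, here is a minimal standard-library sketch of pulling `application/ld+json` blocks out of a page. It is not the actor's actual code: the class and function names are hypothetical, the sample HTML is hand-written rather than real Goodreads markup, and the keys shown (`name`, `numberOfPages`, `aggregateRating`) are assumed from the schema.org vocabulary.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip a malformed block instead of failing the whole page


def extract_json_ld(html):
    """Return a list with one parsed object per JSON-LD block found in the HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.blocks


# Hypothetical page fragment for illustration only (not real Goodreads markup).
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Book", "name": "And Then There Were None", "numberOfPages": 264,
 "aggregateRating": {"ratingValue": 4.27, "ratingCount": 1662794}}
</script>
</head><body></body></html>
"""

blocks = extract_json_ld(SAMPLE_HTML)
book = blocks[0]
print(book["name"], book["aggregateRating"]["ratingValue"])
```

Because the data is read from a structured block rather than from CSS selectors, a visual redesign of the page would not necessarily break this kind of extraction.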

What this Goodreads scraper extracts

Per book (/book/show/...)

| Field | Description | Example |
| --- | --- | --- |
| bookId | Goodreads internal book ID | 16299 |
| slug | URL-safe title slug | And_Then_There_Were_None |
| url | Canonical book URL | https://www.goodreads.com/book/show/16299... |
| title | Title as displayed on Goodreads | And Then There Were None |
| authors | Array of {name, url}; supports multi-author books | [{name: "Agatha Christie", url: "..."}] |
| isbn | Goodreads-canonical ISBN | 9780312330873 |
| numberOfPages | Page count (integer) | 264 |
| bookFormat | Hardcover, Paperback, Kindle, Audible, etc. | Paperback |
| language | Edition language | English |
| description | Marketing description (from OG tag) | "First, there were ten—a curious assortment..." |
| coverImage | High-resolution cover image URL | https://m.media-amazon.com/images/... |
| rating | Average rating (1.00–5.00) | 4.27 |
| ratingsCount | Total number of ratings | 1,662,794 |
| reviewsCount | Total written reviews | 86,031 |
| scrapedAt | ISO 8601 UTC timestamp | 2026-05-18T20:34:12+00:00 |

Per author (/author/show/...)

| Field | Example |
| --- | --- |
| authorId | 123715 |
| name | Agatha Christie |
| born / died | September 15, 1890 / January 12, 1976 |
| website / twitter | author's social presence |
| genres | ["Mystery", "Fiction", "Crime"] |
| avgRating / ratingsCount | aggregate across all the author's books |
| image | author photo URL |
| books | up to 30 visible works [{title, url}] from the profile page |
| booksCount | length of books |

Per list (/list/show/...)

| Field | Example |
| --- | --- |
| listId / slug | 1 / Best_Books_Ever |
| title | Best Books Ever |
| description | Marketing copy of the list |
| books | [{bookId, title, url}] array (up to 100 by default) |
| booksCount | length of books |

If you enable enrichBooksFromLists: true, every book referenced in the list is also fetched individually and emitted as a separate type: "book" record with full metadata (ISBN, page count, rating, etc).
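Since an enriched run mixes `type: "list"` and `type: "book"` records in one dataset, a consumer usually has to separate them and join them back together. The sketch below shows one way to do that. The helper names are hypothetical, and it assumes that the `bookId` inside a list's `books` array matches the `bookId` of the corresponding enriched record.

```python
from collections import defaultdict


def group_by_type(items):
    """Bucket dataset items by their "type" field ("list", "book", ...)."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.get("type", "unknown")].append(item)
    return grouped


def attach_enriched_books(list_record, book_records):
    """Replace each list reference with its full book record when one exists.

    References without an enriched counterpart are kept unchanged.
    """
    by_id = {str(b["bookId"]): b for b in book_records}
    return [by_id.get(str(ref["bookId"]), ref) for ref in list_record.get("books", [])]


# Hand-written sample items shaped like the records described above.
items = [
    {"type": "list", "listId": "1", "title": "Best Books Ever",
     "books": [
         {"bookId": "16299", "title": "And Then There Were None"},
         {"bookId": "999999", "title": "Hypothetical Unenriched Title"},
     ]},
    {"type": "book", "bookId": "16299", "title": "And Then There Were None",
     "numberOfPages": 264},
]

grouped = group_by_type(items)
joined = attach_enriched_books(grouped["list"][0], grouped["book"])
for entry in joined:
    print(entry["title"], entry.get("numberOfPages"))
```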

Use cases for this Goodreads data API

📚 Author platforms, book promotion, indie publishing

Tools like Reedsy, BookBub, BookFunnel and indie publishing platforms need fresh rating and review metrics for every book they promote. Schedule this scraper weekly to keep those metrics current.

🛒 Book discovery / recommendation apps

Train collaborative-filtering models on Goodreads ratings or build a "Books like X" feature by pulling all books from canonical lists ("Best Mystery", "Best of 2025") and ranking by rating × ratingsCount.
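As a rough illustration of the `rating × ratingsCount` ranking mentioned above, the following sketch sorts book records by that product. The function names and the second sample record are hypothetical; missing values are treated as zero, which is a design choice rather than anything the actor prescribes.

```python
def popularity_score(book):
    """rating × ratingsCount, treating missing or null values as zero."""
    rating = book.get("rating") or 0.0
    ratings_count = book.get("ratingsCount") or 0
    return rating * ratings_count


def rank_books(items, top_n=10):
    """Return the top_n book records ordered by popularity_score, highest first."""
    books = [item for item in items if item.get("type") == "book"]
    return sorted(books, key=popularity_score, reverse=True)[:top_n]


sample = [
    {"type": "book", "title": "Hypothetical Niche Title", "rating": 4.90, "ratingsCount": 120},
    {"type": "book", "title": "And Then There Were None", "rating": 4.27, "ratingsCount": 1662794},
    {"type": "list", "title": "Best Books Ever"},
]

ranked = rank_books(sample)
for book in ranked:
    print(book["title"], round(popularity_score(book)))
```

A plain product favors heavily rated titles over highly rated ones, so a niche book with a 4.90 average still ranks below a classic with millions of ratings.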

🎓 Academic literary research

Researchers studying genre evolution, demographic reading patterns or literary canon formation use Goodreads as a primary corpus. Bulk-extract a sample of books per year and per genre and feed it into your analysis pipeline.

📊 Publishing house competitive intelligence

Knowing the rating curve of every Stephen King vs every Dean Koontz vs every Lee Child release lets editors and marketing teams price advances, plan releases and pick mid-list bets.

🤖 LLM training data + RAG pipelines

Build a book-aware AI assistant that knows ISBN, page count, average rating and category for every published title — and can recommend books based on user preferences with grounded data.

✍️ Newsletter / content marketing

Subscriptions like "5-Bullet Book Brief" use book data to build reading lists for paid subscribers. This scraper feeds your CMS with consistent metadata.

📈 Financial / market analysis

Hedge funds tracking the "audiobook revolution" or "Kindle Unlimited churn" use Goodreads engagement metrics (ratings velocity, review counts) as leading indicators for traditional publisher earnings.

How to use this Goodreads scraper

Three input modes — combine them freely in a single run.

Mode 1: Book URLs

Pass canonical book URLs to extract one full record per book.

{
    "bookUrls": [
        "https://www.goodreads.com/book/show/16299.And_Then_There_Were_None",
        "https://www.goodreads.com/book/show/40961427-educated"
    ]
}

Mode 2: Author URLs

Pass author profile URLs to extract author identity plus visible book list.

{
    "authorUrls": [
        "https://www.goodreads.com/author/show/123715.Agatha_Christie",
        "https://www.goodreads.com/author/show/16667.Isaac_Asimov"
    ]
}

Mode 3: List URLs (with optional enrichment)

Lists are curated collections — "Best Books Ever", "Pulitzer Prize Winners", "Best Science Fiction of the Decade", etc. Each list yields ~100 books per page.

{
    "listUrls": [
        "https://www.goodreads.com/list/show/1.Best_Books_Ever",
        "https://www.goodreads.com/list/show/2.Best_Books_of_the_Decade__2010s"
    ],
    "enrichBooksFromLists": true,
    "maxBooksPerList": 50
}

When enrichBooksFromLists: true, each list emits one type: "list" record plus one type: "book" record per enriched book. If you only need the URL references, leave it off and you'll get a much cheaper run.
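Because the three modes can be combined in one run, a mixed pile of Goodreads URLs can be sorted into the right input fields before submitting. This is a small sketch of that idea; `classify_url` and `build_input` are hypothetical helpers, and the path pattern is inferred only from the example URLs in this README, so other URL shapes are rejected.

```python
import re
from urllib.parse import urlparse

# Path shapes taken from the example URLs in this README; anything else is skipped.
_PATH_RE = re.compile(r"^/(book|author|list)/show/(\d+)(?:[.\-]([^/]*))?/?$")

# Maps the URL kind to the actor's input field name.
_INPUT_FIELD = {"book": "bookUrls", "author": "authorUrls", "list": "listUrls"}


def classify_url(url):
    """Return 'book', 'author' or 'list' for a supported Goodreads URL, else None."""
    parsed = urlparse(url)
    if parsed.netloc not in ("www.goodreads.com", "goodreads.com"):
        return None
    match = _PATH_RE.match(parsed.path)
    return match.group(1) if match else None


def build_input(urls, **options):
    """Sort URLs into bookUrls / authorUrls / listUrls and merge in extra options."""
    run_input = {}
    skipped = []
    for url in urls:
        kind = classify_url(url)
        if kind is None:
            skipped.append(url)  # e.g. search URLs, which the actor does not support
        else:
            run_input.setdefault(_INPUT_FIELD[kind], []).append(url)
    run_input.update(options)
    return run_input, skipped


run_input, skipped = build_input(
    [
        "https://www.goodreads.com/book/show/40961427-educated",
        "https://www.goodreads.com/author/show/123715.Agatha_Christie",
        "https://www.goodreads.com/list/show/1.Best_Books_Ever",
        "https://www.goodreads.com/search?q=christie",
    ],
    enrichBooksFromLists=True,
    maxBooksPerList=50,
)
print(run_input)
print("skipped:", skipped)
```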

Step-by-step tutorial — your first Goodreads run in 2 minutes

  1. Click "Try for free" on this actor's Apify Store page. New users get $5 in credit.
  2. Paste a starter input for the most popular Goodreads list:
    {
        "listUrls": ["https://www.goodreads.com/list/show/1.Best_Books_Ever"],
        "enrichBooksFromLists": true,
        "maxBooksPerList": 20,
        "maxTotalItems": 25
    }
  3. Click "Start" and watch the live log.
  4. Download your dataset as JSON, CSV, Excel, RSS or HTML.

You'll get one list record + 20 fully enriched book records (ISBN, ratings, page count) in ~30 seconds.

Performance and cost

  • HTTP only — no Playwright, no proxy, runs on minimal Apify compute units.
  • 4–6 items per second sustained, single worker, 256 MB memory.
  • Pricing: $0.005 per item + $0.00005 per actor start.

Pricing scenarios

| Workload | Items | Cost |
| --- | --- | --- |
| Try the actor | 5 books | $0.025 |
| One Apify free $5 credit | ~1,000 items | $5.00 |
| Full enrich of a 100-book list | 101 items | $0.51 |
| Top 10 lists × 100 books × enrich | 1,010 items | $5.05 |
| Author + their 30 visible books × 50 authors | 1,550 items | $7.75 |
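The figures in the table follow from the two prices quoted above. The sketch below reproduces the arithmetic with `Decimal` to avoid floating-point drift. The function names are hypothetical, and the table's costs appear to be rounded and to leave out the negligible per-start fee, which this sketch includes.

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.005")     # $0.005 per item
PRICE_PER_START = Decimal("0.00005")  # $0.00005 per actor start


def estimate_cost(items, runs=1):
    """Estimated cost in USD for a number of items spread over a number of runs."""
    return items * PRICE_PER_ITEM + runs * PRICE_PER_START


def enriched_list_items(lists, books_per_list):
    """Each enriched list emits one list record plus one record per book."""
    return lists * (1 + books_per_list)


single_list = enriched_list_items(1, 100)   # 101 items
ten_lists = enriched_list_items(10, 100)    # 1,010 items

print(single_list, estimate_cost(single_list))
print(ten_lists, estimate_cost(ten_lists))
```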

Output example (single book, JSON)

{
    "type": "book",
    "bookId": "16299",
    "slug": "And_Then_There_Were_None",
    "url": "https://www.goodreads.com/book/show/16299.And_Then_There_Were_None",
    "title": "And Then There Were None",
    "authors": [
        {"name": "Agatha Christie", "url": "https://www.goodreads.com/author/show/123715.Agatha_Christie"}
    ],
    "isbn": "9780312330873",
    "numberOfPages": 264,
    "bookFormat": "Paperback",
    "language": "English",
    "description": "First, there were ten—a curious assortment of strangers...",
    "coverImage": "https://m.media-amazon.com/images/S/compressed.photo.goodreads.com/books/1638425885i/16299.jpg",
    "rating": 4.27,
    "ratingsCount": 1662794,
    "reviewsCount": 86031,
    "bestRating": 5.0,
    "worstRating": 1.0,
    "scrapedAt": "2026-05-18T20:34:12+00:00"
}
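If you prefer typed objects over raw dictionaries downstream, a record like the one above can be loaded into a small dataclass. This is only a sketch: `BookRecord` and its attribute names are hypothetical, it covers a subset of the fields, and it assumes optional fields may be missing or null.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class BookRecord:
    book_id: str
    title: str
    authors: List[str]
    isbn: Optional[str]
    pages: Optional[int]
    rating: Optional[float]
    ratings_count: Optional[int]
    scraped_at: datetime

    @classmethod
    def from_item(cls, item):
        """Build a BookRecord from one dataset item of type "book"."""
        return cls(
            book_id=str(item["bookId"]),
            title=item["title"],
            authors=[a["name"] for a in item.get("authors", [])],
            isbn=item.get("isbn"),
            pages=item.get("numberOfPages"),
            rating=item.get("rating"),
            ratings_count=item.get("ratingsCount"),
            # scrapedAt is ISO 8601 with an explicit UTC offset.
            scraped_at=datetime.fromisoformat(item["scrapedAt"]),
        )


item = {
    "type": "book",
    "bookId": "16299",
    "title": "And Then There Were None",
    "authors": [{"name": "Agatha Christie",
                 "url": "https://www.goodreads.com/author/show/123715.Agatha_Christie"}],
    "isbn": "9780312330873",
    "numberOfPages": 264,
    "rating": 4.27,
    "ratingsCount": 1662794,
    "scrapedAt": "2026-05-18T20:34:12+00:00",
}

record = BookRecord.from_item(item)
print(record.title, record.authors, record.scraped_at.isoformat())
```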

How this Goodreads scraper compares

| Approach | Pros | Cons |
| --- | --- | --- |
| This actor | Stable JSON-LD selectors, $5/1K, no proxy, 3 modes | No full review text extraction in v1 |
| Goodreads legacy API | Was free | Shut down by Amazon in late 2020; no longer accessible |
| Open Library API | Free | Sparse coverage, missing ratings, no Goodreads-specific metrics |
| Manual scraping with BeautifulSoup | Total flexibility | Selectors break with every Goodreads UI update; you maintain forever |
| Hiring a freelancer | Custom output | $300–$1,000 one-off; not maintained |

How to call this Goodreads scraper from your code

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("makework36/goodreads-scraper").call(run_input={
    "listUrls": ["https://www.goodreads.com/list/show/1.Best_Books_Ever"],
    "enrichBooksFromLists": True,
    "maxBooksPerList": 100,
})

# Stream the dataset and print only the enriched book records.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item["type"] == "book":
        print(item["title"], item["rating"], item["isbn"])

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Run the actor on a single book URL and wait for it to finish.
const run = await client.actor('makework36/goodreads-scraper').call({
    bookUrls: ['https://www.goodreads.com/book/show/16299.And_Then_There_Were_None'],
});

// Fetch the results and print a few fields per book.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((b) => console.log(b.title, b.rating, b.numberOfPages));

Frequently Asked Questions about scraping Goodreads

Is it legal to scrape Goodreads?

This actor extracts metadata that Goodreads itself renders publicly to every visitor on book, author and list pages, and that is embedded as JSON-LD specifically so search engines and aggregators can consume the same data. You are still responsible for how you use it: respect copyright (descriptions and cover art remain Amazon/publisher property), Goodreads' Terms of Service for derivative-product creation, and applicable data protection laws.

Why not use the official Goodreads API?

Amazon shut down the legacy Goodreads API in late 2020 and never published a replacement. JSON-LD scraping is currently the only programmatic path to fresh Goodreads data.

Does the scraper extract full review text?

Not in v1. Aggregate counts (ratingsCount, reviewsCount) plus average rating come from JSON-LD. Individual review text is on a separately rendered page; an enhancement is planned in v1.1.

How current is the rating data?

Live — every run hits Goodreads directly and reflects the rating as displayed at request time. There is no internal cache.

What if a book has no ISBN listed?

Goodreads stores the ISBN of the default edition for each work. Some older or self-published books have no canonical ISBN — the field returns null. ISBN-13 can usually be derived from the canonical edition URL slug.
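Since `isbn` can be null, downstream code may want to guard against missing or malformed values before using the field as a join key. The sketch below applies the standard ISBN-13 check-digit rule (weights alternate 1 and 3 over the first twelve digits). The function name is hypothetical and this is not part of the actor.

```python
from typing import Optional


def is_valid_isbn13(isbn: Optional[str]) -> bool:
    """True when the value is a 13-digit ISBN with a correct check digit."""
    if not isbn:
        return False  # covers both None and the empty string
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    # Weighted sum over the first 12 digits: weights 1, 3, 1, 3, ...
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    check_digit = (10 - total % 10) % 10
    return check_digit == int(digits[12])


print(is_valid_isbn13("9780312330873"))  # the ISBN from the output example
print(is_valid_isbn13(None))
```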

Can I scrape every book by an author?

Pass the author URL, and the actor returns the visible books on their profile page (typically 30 entries). Deeper coverage through the author's "Books" tab, which follows the URL pattern https://www.goodreads.com/author/list/{authorId}, is planned for v1.1.

Does Goodreads block bots?

The book, author and list endpoints do not actively challenge bot traffic as of this release. The /search?q=... endpoint does return HTTP 202 to non-cookied requests, which is why this actor does not offer keyword search mode. Use lists or direct URLs instead.

How do I find list URLs?

Visit https://www.goodreads.com/list and browse by genre, decade, or theme. Copy the URL of any list. Popular starters: /list/show/1.Best_Books_Ever, /list/show/2.Best_Books_of_the_Decade__2010s, /list/show/43.Best_Science_Fiction_Fantasy_Books.

Can I schedule this scraper?

Yes. Use Apify's built-in scheduler to refresh your dataset daily, weekly or monthly, and push results directly to Google Sheets, BigQuery, Postgres or your CMS via Apify integrations.

Will my dataset have duplicates?

The actor deduplicates by URL within a single run. Across runs, build a primary key on bookId / authorId / listId to merge cleanly.
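One way to merge results across runs is sketched below: key every record on whichever ID field it carries and keep the most recently scraped copy. The helper names are hypothetical, and comparing `scrapedAt` as a string assumes every run emits the same ISO 8601 UTC format shown in the output example.

```python
# Checked in this order so each record type resolves to its own top-level ID.
ID_FIELDS = ("listId", "authorId", "bookId")


def record_key(item):
    """Return (id_field, id_value) for a list, author or book record."""
    for field in ID_FIELDS:
        if item.get(field) is not None:
            return (field, str(item[field]))
    raise KeyError("item has no listId, authorId or bookId")


def merge_runs(previous, latest):
    """Merge two runs, keeping the copy with the newest scrapedAt per key."""
    merged = {}
    for item in list(previous) + list(latest):
        key = record_key(item)
        current = merged.get(key)
        if current is None or item.get("scrapedAt", "") >= current.get("scrapedAt", ""):
            merged[key] = item
    return list(merged.values())


# Hand-written sample runs; the counts in the second run are illustrative.
last_week = [
    {"type": "book", "bookId": "16299", "ratingsCount": 1662794,
     "scrapedAt": "2026-05-18T20:34:12+00:00"},
]
this_week = [
    {"type": "book", "bookId": "16299", "ratingsCount": 1663900,
     "scrapedAt": "2026-05-25T20:34:12+00:00"},
    {"type": "list", "listId": "1", "scrapedAt": "2026-05-25T20:34:15+00:00"},
]

merged = merge_runs(last_week, this_week)
print(len(merged), [record_key(m) for m in merged])
```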

How accurate is the page count?

Page count reflects the default edition. A Kindle edition may show a different page count than the paperback. Use bookFormat to disambiguate.

Is there a free trial?

Apify gives every new user $5 in platform credit, enough to extract ~1,000 Goodreads items with this actor.

Can I use this data to build a recommendation engine?

Absolutely. Many recommender-system projects start with a Goodreads books CSV. Combining bookId, authors, numberOfPages, language, rating, ratingsCount and description gives you a rich feature matrix for collaborative filtering or content-based recommendations.

🔗 Other actors by makework36

See all actors by makework36 on the Apify Store.

Roadmap

  • v1.1: full review text extraction per book, deeper author bibliography via /author/list/{id} pagination.
  • v1.2: book genre/shelf hierarchy extraction.
  • v1.3: ISBN-to-book reverse lookup via /search/?q={isbn}&search_type=isbn.
  • v2: similar-books graph extraction (for recommendation pipelines).

Disclaimer

This actor extracts public book and author metadata that Goodreads itself renders to every visitor and embeds as JSON-LD for SEO consumption. You are responsible for how you store, transform and redistribute the data. Cover images and book descriptions remain the property of their original publishers. This actor is not affiliated with Goodreads or Amazon.

🙏 Ran this Goodreads scraper successfully? Leaving a review helps the Apify algorithm surface this actor to other book platforms and publishing teams. Much appreciated.