Goodreads Scraper — Books, Reviews, Authors, Lists

Scrape Goodreads books, reviews, authors, lists, series, and search results from any URL or text query. MCP-ready, all-in-one, residential proxy default, $0.005 per result.

Pricing: from $5.00 / 1,000 results
Developer: Khadin Akbar (Maintained by Community)

Goodreads All-in-One Scraper

Scrape Goodreads books, reviews, authors, lists, series, and search results — from any Goodreads URL or text query — in one actor.

Drop in a book URL, an author URL, a list URL, a series URL, a search URL, or just a plain text query like sapiens yuval noah harari. The actor auto-detects the input type and returns clean, flat JSON: book metadata with ISBN-13 and ratings distribution, full review text with reviewer profiles, author bios with bibliography, series in reading order, and ranked list / search results.

Built MCP-first for AI agents (Claude, ChatGPT, Gemini). Returns one record per entity with a stable itemType discriminator so an agent can route results without parsing surprises.

When to use it

  • Authors / publishers — pull every review on a comp title, slice by 1–2 stars to surface complaints, slice by 4–5 stars for ad copy quotes.
  • AI book recommenders — ingest ratings, genres, descriptions, and similar-book signals to build personalized lists.
  • Market researchers — track ranking and rating movement on genre lists ("Best Thrillers of 2025") across time.
  • Librarians and researchers — bulk-pull series, awards, and edition metadata.
  • Book marketers — discover top-ranked titles in a niche, find audience overlap via shelf tags.

When NOT to use it

  • Private user shelves that require login → not supported.
  • Goodreads librarian features (edit metadata, merge editions) → not a scraper job.
  • Real-time review streaming → run on a schedule instead.

What you get per record

| itemType | Key fields |
| --- | --- |
| book | title, author, authorUrl, averageRating, ratingsCount, reviewsCount, isbn, isbn13, asin, pages, publisher, publishedAt, language, format, genres[], description, imageUrl, series, awards[], ratingsDistribution |
| review | reviewerName, reviewerUrl, rating (1–5), reviewText, reviewDate, likesCount, commentsCount, shelves[], plus title + author of the book |
| author | title (author name), description (bio), averageRating, ratingsCount, imageUrl |
| series | per-book entries with position, title, author, averageRating, ratingsCount |
| list_entry | per-book entries from a Goodreads list (e.g. "Best Books of 2024"), with position |
| search_result | per-book search match with position, title, author, averageRating |

Pricing

  • $0.00005 per actor start (charged once per run, per GB of RAM).
  • $0.005 per result pushed to the dataset.

Typical agent run (one book + 10 reviews): $0.055. A list of 50 entries: $0.25. No setup fee, no monthly minimum.
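The per-run cost above can be sketched as a one-line calculation. This is an illustrative helper, not part of the actor; it assumes a single 1 GB actor start at the listed $0.00005 fee, which is why it lands a hair above the rounded figures quoted above:

```python
def estimate_cost(results: int, starts: int = 1) -> float:
    """Estimate a run's cost under this listing's PPE pricing.

    $0.00005 per actor start (assumed here: one start, 1 GB of RAM)
    plus $0.005 per result pushed to the dataset.
    """
    return starts * 0.00005 + results * 0.005

estimate_cost(11)  # one book + 10 reviews: about $0.055, start fee negligible
estimate_cost(50)  # a 50-entry list: about $0.25
```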


Input

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| targets | string[] | yes | Goodreads URLs OR free-text queries. Mix freely. |
| resultsPerTarget | integer | no | Max records per target (default 50, max 500). For books this caps reviews. |
| scrapeReviews | boolean | no | When true (default), book targets also pull reviews up to resultsPerTarget. |
| reviewsLanguage | enum | no | all, en, es, fr, de, it, pt, nl, ru, ja, zh, ko. Default all. |
| minRating | integer | no | Skip reviews below this 1–5 rating. Default 1. |
| maxRating | integer | no | Skip reviews above this 1–5 rating. Default 5. |
| responseFormat | enum | no | detailed (default, every field) or concise (token-efficient for AI agents). |
| proxyConfiguration | object | no | Defaults to Apify residential — recommended. |

Example: book + reviews

{
  "targets": ["https://www.goodreads.com/book/show/40097951-the-silent-patient"],
  "resultsPerTarget": 30,
  "scrapeReviews": true,
  "minRating": 1,
  "maxRating": 2
}

Returns one book record + up to 30 review records filtered to 1–2 stars (negative reviews only).

Example: text query → top matches

{
  "targets": ["sapiens yuval noah harari"],
  "resultsPerTarget": 10
}

Returns 10 search_result records ranked by Goodreads.

Example: author + their best-rated books

{
  "targets": [
    "https://www.goodreads.com/author/show/3389.Brandon_Sanderson",
    "https://www.goodreads.com/list/show/1043.Best_Epic_Fantasy"
  ],
  "resultsPerTarget": 25
}

Returns one author record + up to 25 list_entry records.


Calling from code

Node.js (Apify SDK)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('khadinakbar/goodreads-all-in-one-scraper').call({
  targets: ['https://www.goodreads.com/book/show/40097951'],
  resultsPerTarget: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

client = ApifyClient(token="<APIFY_TOKEN>")

run = client.actor("khadinakbar/goodreads-all-in-one-scraper").call(run_input={
    "targets": ["sapiens yuval noah harari"],
    "resultsPerTarget": 10,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

Claude / MCP client

The actor is exposed as the MCP tool apify--goodreads-all-in-one-scraper. Sample call:

Use the Goodreads All-in-One Scraper to fetch the top 5 negative reviews
(rating <= 2) of "The Silent Patient" so I can identify common complaints.

Claude will compose the input automatically.


Output shape (concise mode)

{
  "itemType": "book",
  "goodreadsId": "40097951",
  "url": "https://www.goodreads.com/book/show/40097951",
  "title": "The Silent Patient",
  "author": "Alex Michaelides",
  "averageRating": 4.16,
  "ratingsCount": 3430997,
  "reviewsCount": 295287,
  "isbn13": "9781250301697",
  "pages": 336,
  "publishedAt": "2019-02-05",
  "genres": ["Thriller", "Mystery", "Fiction"],
  "scrapedAt": "2026-05-13T21:09:00.000Z"
}

Use responseFormat: "detailed" for every parsed field (publisher, language, format, description, image URL, awards, ratings distribution, series link, etc.).


How it works

  1. Each target is classified — book / author / list / series / search URL / text query — by URL pattern.
  2. A CheerioCrawler fetches each page through Apify residential proxies with a session pool that retires sessions on 403/429.
  3. Book pages embed a Next.js Apollo JSON blob; the actor parses that for structured fields and falls back to CSS selectors on legacy templates.
  4. Reviews are extracted from the same Apollo blob (first ~30) and paginated via ?page=N for more.
  5. Records are flat-shaped, charged per result, and pushed to the dataset.
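Step 1's auto-detection can be sketched with simple URL-pattern matching. The patterns and the "query" fallback below are assumptions for illustration, not the actor's actual source:

```python
import re

# Hypothetical URL patterns mirroring the target classification in step 1.
PATTERNS = [
    ("book", re.compile(r"goodreads\.com/book/show/")),
    ("author", re.compile(r"goodreads\.com/author/show/")),
    ("list", re.compile(r"goodreads\.com/list/show/")),
    ("series", re.compile(r"goodreads\.com/series/")),
    ("search", re.compile(r"goodreads\.com/search")),
]

def classify_target(target: str) -> str:
    """Classify a target as a Goodreads entity URL or a free-text query."""
    if not target.startswith(("http://", "https://")):
        return "query"  # plain text becomes a Goodreads search query
    for kind, pattern in PATTERNS:
        if pattern.search(target):
            return kind
    return "search"  # unrecognized Goodreads URL: treat as a search page
```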

Reliability and anti-bot

  • Apify residential proxies are enabled by default. Goodreads occasionally tightens its anti-bot measures; residential proxies keep the success rate above 95%.
  • Session pool: 25 sessions, each retired after 30 uses or 3 errors.
  • Five retries per request with exponential backoff.
  • Failed requests are logged but never silently dropped — partial results are pushed and a SUMMARY is written to the run KV store.
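The retry behavior above can be sketched as follows. The base delay and jitter are assumptions for illustration; the actor's actual timing parameters are not documented here:

```python
import random
import time

def fetch_with_backoff(fetch, url, max_retries=5, base_delay=1.0):
    """Retry fetch(url) up to max_retries times with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            # waits ~1s, 2s, 4s, 8s (plus jitter) with the default base delay
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```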

FAQ

Does the actor need a Goodreads login? No. Everything it reads is publicly accessible. Private shelves and friend feeds require login and are explicitly out of scope.

Why didn't I get all the reviews I asked for? Goodreads only renders ~30 reviews per page; many books have hundreds of thousands. Set resultsPerTarget to the max you actually need — runs scale linearly in pages and proxy cost.
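Since Goodreads renders roughly 30 reviews per page, the page count (and hence run time and proxy cost) for a given resultsPerTarget is a simple ceiling division. A quick back-of-envelope helper, assuming the ~30-per-page figure above:

```python
import math

def pages_needed(results_wanted: int, per_page: int = 30) -> int:
    """Pages the crawler must fetch to collect results_wanted reviews."""
    return math.ceil(results_wanted / per_page)

pages_needed(100)  # 4 pages for 100 reviews
```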

Can I filter reviews by date? Not directly; filter the dataset downstream by reviewDate. Goodreads' sort order is "default" (relevance), which is non-chronological by design.

What about Goodreads' official API? Deprecated in 2020. Scraping is the only practical path for current Goodreads data.

How fresh is the data? Live — each run fetches Goodreads in real time. No cached layer.

Can I export to CSV / Excel? Yes. Open the run on Apify Console → Storage → Dataset → Export → CSV / XLSX / JSON.

Does it work for non-English Goodreads regions? Yes. Goodreads serves the same canonical URL globally. Use the reviewsLanguage filter to slice review content.

What's the difference between this and the other Goodreads actors on the Store? This is the only one that accepts every entity type in a single actor with auto-detection, ships MCP-ready descriptions for AI agents, and runs on residential proxies by default for 95%+ reliability.

Changelog

  • 1.0 (2026-05-13) — Initial release. Books, reviews, authors, lists, series, search; auto-detect targets; concise / detailed response modes; MCP-first descriptions; PPE pricing.

This actor scrapes only publicly accessible Goodreads pages — content that any unauthenticated browser can view. You are responsible for ensuring your use of the data complies with Goodreads' Terms of Service, applicable copyright law (review text is copyrighted by the original reviewer; only fair-use excerpts and quantitative analysis are recommended), GDPR / CCPA when handling reviewer identifiers, and your contractual obligations to downstream consumers. Goodreads is a trademark of Goodreads LLC (Amazon). This actor is not affiliated with, endorsed by, or sponsored by Goodreads LLC. Use responsibly.