Goodreads Scraper — Books, Reviews, Authors, Lists
Pricing
from $5.00 / 1,000 results
Scrape Goodreads books, reviews, authors, lists, series, and search results from any URL or text query. MCP-ready, all-in-one, residential proxy default, $0.005 per result.
Developer: Khadin Akbar (maintained by Community)
Goodreads All-in-One Scraper
Scrape Goodreads books, reviews, authors, lists, series, and search results — from any Goodreads URL or text query — in one actor.
Drop in a book URL, an author URL, a list URL, a series URL, a search URL, or just a plain text query like `sapiens yuval noah harari`. The actor auto-detects the input type and returns clean, flat JSON: book metadata with ISBN-13 and ratings distribution, full review text with reviewer profiles, author bios with bibliography, series in reading order, and ranked list / search results.
Built MCP-first for AI agents (Claude, ChatGPT, Gemini). Returns one record per entity with a stable itemType discriminator so an agent can route results without parsing surprises.
When to use it
- Authors / publishers — pull every review on a comp title, slice by 1–2 stars to surface complaints, slice by 4–5 stars for ad copy quotes.
- AI book recommenders — ingest ratings, genres, descriptions, and similar-book signals to build personalized lists.
- Market researchers — track ranking and rating movement on genre lists ("Best Thrillers of 2025") across time.
- Librarians and researchers — bulk-pull series, awards, and edition metadata.
- Book marketers — discover top-ranked titles in a niche, find audience overlap via shelf tags.
When NOT to use it
- Private user shelves that require login → not supported.
- Goodreads librarian features (edit metadata, merge editions) → not a scraper job.
- Real-time review streaming → run on a schedule instead.
What you get per record
| itemType | Key fields |
|---|---|
| `book` | title, author, authorUrl, averageRating, ratingsCount, reviewsCount, isbn, isbn13, asin, pages, publisher, publishedAt, language, format, genres[], description, imageUrl, series, awards[], ratingsDistribution |
| `review` | reviewerName, reviewerUrl, rating (1–5), reviewText, reviewDate, likesCount, commentsCount, shelves[], plus title + author of the book |
| `author` | title (author name), description (bio), averageRating, ratingsCount, imageUrl |
| `series` | per-book entries with position, title, author, averageRating, ratingsCount |
| `list_entry` | per-book entries from a Goodreads list (e.g. "Best Books of 2024"), with position |
| `search_result` | per-book search match with position, title, author, averageRating |
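Because every record carries the `itemType` discriminator, downstream code can route a mixed dataset with a plain dispatch. A minimal sketch — the sample records below are illustrative, not real actor output:

```python
# Route a mixed Goodreads dataset by its itemType discriminator.
from collections import defaultdict

def route_records(records):
    """Group records by itemType so each entity type gets its own handler."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record.get("itemType", "unknown")].append(record)
    return buckets

# Illustrative records shaped like the table above.
records = [
    {"itemType": "book", "title": "The Silent Patient", "averageRating": 4.16},
    {"itemType": "review", "rating": 2, "reviewText": "Predictable twist."},
    {"itemType": "review", "rating": 5, "reviewText": "Loved it."},
]

buckets = route_records(records)
print(sorted(buckets))         # ['book', 'review']
print(len(buckets["review"]))  # 2
```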
Pricing
- $0.00005 per actor start (charged once per run, per GB of RAM).
- $0.005 per result pushed to the dataset.
Typical agent run (one book + 10 reviews): $0.055. A list of 50 entries: $0.25. No setup fee, no monthly minimum.
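The per-result pricing makes run cost easy to estimate up front. A quick sketch using the listed rates (assuming one actor start on a 1 GB run; the start fee scales with RAM):

```python
# Estimate run cost under the listed PPE rates.
START_FEE = 0.00005   # per actor start, per GB of RAM (1 GB assumed here)
PER_RESULT = 0.005    # per result pushed to the dataset

def estimate_cost(results: int, starts: int = 1) -> float:
    return starts * START_FEE + results * PER_RESULT

# One book + 10 reviews = 11 results.
print(round(estimate_cost(11), 5))  # 0.05505
# A list of 50 entries.
print(round(estimate_cost(50), 5))  # 0.25005
```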
Input
| Field | Type | Required | Description |
|---|---|---|---|
| `targets` | string[] | ✅ | Goodreads URLs OR free-text queries. Mix freely. |
| `resultsPerTarget` | integer |  | Max records per target (default 50, max 500). For books this caps reviews. |
| `scrapeReviews` | boolean |  | When `true` (default), book targets also pull reviews up to `resultsPerTarget`. |
| `reviewsLanguage` | enum |  | `all`, `en`, `es`, `fr`, `de`, `it`, `pt`, `nl`, `ru`, `ja`, `zh`, `ko`. Default `all`. |
| `minRating` | integer |  | Skip reviews below this 1–5 rating. Default 1. |
| `maxRating` | integer |  | Skip reviews above this 1–5 rating. Default 5. |
| `responseFormat` | enum |  | `detailed` (default, every field) or `concise` (token-efficient for AI agents). |
| `proxyConfiguration` | object |  | Defaults to Apify residential — recommended. |
Example: book + reviews
```json
{
  "targets": ["https://www.goodreads.com/book/show/40097951-the-silent-patient"],
  "resultsPerTarget": 30,
  "scrapeReviews": true,
  "minRating": 1,
  "maxRating": 2
}
```
Returns one book record + up to 30 review records filtered to 1–2 stars (negative reviews only).
Example: text query → top matches
```json
{
  "targets": ["sapiens yuval noah harari"],
  "resultsPerTarget": 10
}
```
Returns up to 10 search_result records ranked by Goodreads.
Example: author profile + a ranked genre list
```json
{
  "targets": [
    "https://www.goodreads.com/author/show/3389.Brandon_Sanderson",
    "https://www.goodreads.com/list/show/1043.Best_Epic_Fantasy"
  ],
  "resultsPerTarget": 25
}
```
Returns one author record + up to 25 list_entry records.
Calling from code
Node.js (apify-client)
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client
    .actor('khadinakbar/goodreads-all-in-one-scraper')
    .call({
        targets: ['https://www.goodreads.com/book/show/40097951'],
        resultsPerTarget: 50,
    });

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient(token="<APIFY_TOKEN>")

run = client.actor("khadinakbar/goodreads-all-in-one-scraper").call(
    run_input={
        "targets": ["sapiens yuval noah harari"],
        "resultsPerTarget": 10,
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
Claude / MCP client
The actor is exposed as the MCP tool `apify--goodreads-all-in-one-scraper`. Sample prompt:
Use the Goodreads All-in-One Scraper to fetch the top 5 negative reviews (rating <= 2) of "The Silent Patient" so I can identify common complaints.
Claude will compose the input automatically.
Output shape (concise mode)
```json
{
  "itemType": "book",
  "goodreadsId": "40097951",
  "url": "https://www.goodreads.com/book/show/40097951",
  "title": "The Silent Patient",
  "author": "Alex Michaelides",
  "averageRating": 4.16,
  "ratingsCount": 3430997,
  "reviewsCount": 295287,
  "isbn13": "9781250301697",
  "pages": 336,
  "publishedAt": "2019-02-05",
  "genres": ["Thriller", "Mystery", "Fiction"],
  "scrapedAt": "2026-05-13T21:09:00.000Z"
}
```
Use responseFormat: "detailed" for every parsed field (publisher, language, format, description, image URL, awards, ratings distribution, series link, etc.).
How it works
- Each target is classified — book / author / list / series / search URL / text query — by URL pattern.
- A `CheerioCrawler` fetches each page through Apify residential proxies with a session pool that retires sessions on 403/429.
- Book pages embed a Next.js Apollo JSON blob; the actor parses that for structured fields and falls back to CSS selectors on legacy templates.
- Reviews are extracted from the same Apollo blob (first ~30) and paginated via `?page=N` for more.
- Records are flat-shaped, charged per result, and pushed to the dataset.
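The classification step above can be sketched as a URL-pattern dispatch. A hypothetical implementation — the regexes are illustrative, not the actor's published internals; anything that isn't a recognized Goodreads URL falls through to a text search:

```python
import re

# Hypothetical URL-pattern classifier for Goodreads targets.
PATTERNS = [
    ("book",   re.compile(r"goodreads\.com/book/show/")),
    ("author", re.compile(r"goodreads\.com/author/show/")),
    ("list",   re.compile(r"goodreads\.com/list/show/")),
    ("series", re.compile(r"goodreads\.com/series/")),
    ("search", re.compile(r"goodreads\.com/search\?")),
]

def classify_target(target: str) -> str:
    for kind, pattern in PATTERNS:
        if pattern.search(target):
            return kind
    return "query"  # plain text falls through to Goodreads search

print(classify_target("https://www.goodreads.com/book/show/40097951"))  # book
print(classify_target("sapiens yuval noah harari"))                     # query
```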
Reliability and anti-bot
- Apify residential proxies are enabled by default. Goodreads occasionally tightens its anti-bot defenses — residential proxies keep the success rate above 95%.
- Session pool: 25 sessions, each retired after 30 uses or 3 errors.
- Five retries per request with exponential backoff.
- Failed requests are logged but never silently dropped — partial results are pushed and a `SUMMARY` is written to the run's key-value store.
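The retry policy described above — five attempts with exponential backoff — can be sketched generically. This is a standalone illustration, not the actor's internals; the `flaky` fetch function simulates a request that hits rate limits twice before succeeding:

```python
import time

def fetch_with_retries(fetch, url, max_attempts=5, base_delay=1.0):
    """Retry a flaky fetch with exponential backoff: 1s, 2s, 4s, 8s."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * 2 ** attempt)

# Simulate a request that fails twice (e.g. 429s), then succeeds.
attempts = {"n": 0}

def flaky(url):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("429 Too Many Requests")
    return "<html>ok</html>"

print(fetch_with_retries(flaky, "https://example.com", base_delay=0))  # <html>ok</html>
print(attempts["n"])  # 3
```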
FAQ
Does the actor need a Goodreads login? No. Everything it reads is publicly accessible. Private shelves and friend feeds require login and are explicitly out of scope.
Why didn't I get all the reviews I asked for? Goodreads only renders ~30 reviews per page; many books have hundreds of thousands. Set resultsPerTarget to the max you actually need — runs scale linearly in pages and proxy cost.
Can I filter reviews by date? Not directly; filter the dataset downstream by reviewDate. Goodreads' sort order is "default" (relevance), which is non-chronological by design.
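Filtering by reviewDate downstream is a one-liner once the dataset is in memory. A minimal sketch, assuming ISO-8601 date strings (which sort lexicographically) and illustrative records:

```python
# Keep only review records written on or after a cutoff date.
def reviews_since(items, cutoff: str):
    return [
        item for item in items
        if item.get("itemType") == "review"
        and item.get("reviewDate", "") >= cutoff  # ISO-8601 strings compare correctly
    ]

# Illustrative dataset items, not real output.
items = [
    {"itemType": "review", "reviewDate": "2024-11-02", "rating": 2},
    {"itemType": "review", "reviewDate": "2023-01-15", "rating": 5},
    {"itemType": "book", "title": "The Silent Patient"},
]

print(len(reviews_since(items, "2024-01-01")))  # 1
```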
What about Goodreads' official API? Deprecated in 2020. Scraping is the only practical path for current Goodreads data.
How fresh is the data? Live — each run fetches Goodreads in real time. No cached layer.
Can I export to CSV / Excel? Yes. Open the run on Apify Console → Storage → Dataset → Export → CSV / XLSX / JSON.
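If you'd rather export programmatically, the flat record shape maps straight onto Python's csv module. A minimal sketch with illustrative records — the union of keys becomes the header, and missing fields are left blank:

```python
import csv
import io

def to_csv(items) -> str:
    """Serialize flat dicts to CSV; the sorted union of keys is the header."""
    fieldnames = sorted({key for item in items for key in item})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)  # missing keys -> ""
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()

# Illustrative records, not real actor output.
items = [
    {"itemType": "book", "title": "The Silent Patient", "averageRating": 4.16},
    {"itemType": "review", "title": "The Silent Patient", "rating": 2},
]

print(to_csv(items))
```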
Does it work for non-English Goodreads regions? Yes. Goodreads serves the same canonical URL globally. Use the reviewsLanguage filter to slice review content.
What's the difference between this and the other Goodreads actors on the Store? This is the only one that accepts every entity type in a single actor with auto-detection, ships MCP-ready descriptions for AI agents, and runs on residential proxies by default for 95%+ reliability.
Changelog
- 1.0 (2026-05-13) — Initial release. Books, reviews, authors, lists, series, search; auto-detect targets; concise / detailed response modes; MCP-first descriptions; PPE pricing.
Legal
This actor scrapes only publicly accessible Goodreads pages — content that any unauthenticated browser can view. You are responsible for ensuring your use of the data complies with Goodreads' Terms of Service, applicable copyright law (review text is copyrighted by the original reviewer; only fair-use excerpts and quantitative analysis are recommended), GDPR / CCPA when handling reviewer identifiers, and your contractual obligations to downstream consumers. Goodreads is a trademark of Goodreads LLC (Amazon). This actor is not affiliated with, endorsed by, or sponsored by Goodreads LLC. Use responsibly.