Goodreads Review Scraper

Pricing

from $4.99 / 1,000 results

📚 Goodreads Review Scraper pulls reviews from book & author pages — ratings, review text, dates, shelves, likes & reviewer info. ⚡ Export CSV/JSON/API for sentiment, market research & book marketing. 🚀 Perfect for publishers, authors & data teams.

Developer: Scraper Engine (Maintained by Community)


Goodreads Review Scraper

Goodreads Review Scraper is a production-ready tool that collects public book reviews from Goodreads and saves them to your Apify dataset in real time. It solves the tedious problem of manually copy-pasting ratings and review text by automating discovery of Goodreads’ GraphQL endpoint and paging through results. Built for marketers, developers, data analysts, and researchers, this Goodreads reviews scraper lets you scrape Goodreads reviews at scale and turn them into a clean Goodreads review dataset you can analyze or export via CSV/JSON/API.

What data / output can you get?

Below are the structured fields this Goodreads review extractor saves for each review. Values are flattened from Goodreads’ GraphQL “getReviews” response and streamed to the Dataset as individual rows.

Data field | Description | Example value
bookUrl | The Goodreads book URL this review belongs to | https://www.goodreads.com/book/show/26032825
id | Review identifier | "1234567890"
rating | Star rating given by the reviewer | 4
text | Full review text | "A timeless story with beautiful prose and complex characters..."
createdAt | Review creation timestamp (ms) | 1704067200000
updatedAt | Last update timestamp (ms) | 1704153600000
lastRevisionAt | Last revision timestamp (ms) | 1704157200000
spoilerStatus | Spoiler status flag | null
likeCount | Number of likes | 12
commentCount | Number of comments | 3
viewerHasLiked | Whether the current viewer liked it | false
shelving | Reviewer’s shelf and tags metadata | {"shelf": {"name": "read", "webUrl": "https://www.goodreads.com/review/list/…", "__typename": "Shelf"}, "taggings": [{"tag": {"name": "classic", "webUrl": "https://www.goodreads.com/tag/classic", "__typename": "Tag"}, "__typename": "Tagging"}], "webUrl": "https://www.goodreads.com/review/show/…", "__typename": "Shelving"}
creator | Reviewer profile metadata | {"id": "987654321", "name": "Jane Doe", "webUrl": "https://www.goodreads.com/user/show/…", "followersCount": 250, "imageUrlSquare": "https://i.gr-assets.com/…jpg", "isAuthor": false, "textReviewsCount": 134, "viewerRelationshipStatus": {"isBlockedByViewer": false, "__typename": "ViewerRelationshipStatus"}, "contributor": {"id": "…", "works": {"totalCount": 0, "__typename": "WorksConnection"}, "__typename": "Contributor"}, "__typename": "User"}
__typename | GraphQL typename for the review node | "Review"

Notes:

  • Timestamps are integers in milliseconds.
  • Some fields can be null depending on the review (e.g., spoilerStatus, shelving, certain creator subfields).
  • You can download Goodreads reviews via the Apify Dataset in CSV or JSON, or access them through the Apify API for pipelines and automation.
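Since the timestamp fields are millisecond integers, a small conversion step is usually the first thing a downstream analysis needs. A minimal sketch in Python (the `ms_to_datetime` helper name is illustrative, not part of the actor):

```python
from datetime import datetime, timezone

def ms_to_datetime(ms: int) -> datetime:
    """Convert a Goodreads millisecond timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# Example: the createdAt value from the table above
created = ms_to_datetime(1704067200000)
print(created.isoformat())  # 2024-01-01T00:00:00+00:00
```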

Key features

  • ⚡ Robust connection handling & retries — Starts direct and automatically escalates to Apify datacenter, then RESIDENTIAL proxy (locked for the run) on failures, with up to 3× retries per step and backoff. Ideal for a reliable Goodreads review crawler at scale.
  • 🧾 Batch input (URLs or IDs) — Paste full Goodreads book links or just the numeric ID (e.g., 26032825). The actor normalizes to book pages and processes multiple items in one run.
  • 🎛️ Sorting, language, edition filters — Control results with sortBy (popular/newest/oldest), optional languageCode, and reviewEdition (ALL vs only_this_book), so you can extract Goodreads review text tailored to your needs.
  • 📦 Streaming dataset output — Results are pushed to the Dataset as they’re found, creating a ready-to-use Goodreads review data export for analytics or dashboards.
  • 🔑 GraphQL-powered extraction — Discovers the Goodreads GraphQL endpoint and apiKey from bundled JS and fetches reviews via getReviews with pagination for accuracy and consistency.
  • 🐍 Developer-friendly (Python) — Built in Python and deployable on Apify for easy integration. Use the Apify API to consume your Goodreads review dataset programmatically.
  • 🔁 API alternative to extensions — A dependable Goodreads reviews API alternative without fragile browser automation or a Goodreads review scraper Chrome extension.
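The escalation behavior described above (direct → datacenter → RESIDENTIAL, with up to 3 retries and backoff per step) can be sketched roughly as follows. The `fetch` callable and mode labels are illustrative placeholders, not the actor's actual internals:

```python
import time

# Connection modes tried in order; once escalated, a run stays on the new mode.
MODES = ["direct", "datacenter", "residential"]

def fetch_with_escalation(fetch, url, max_retries=3, base_delay=1.0):
    """Try each connection mode with up to `max_retries` attempts and
    exponential backoff.

    `fetch(url, mode)` stands in for the real request logic: it should return
    a result on success and raise on failure.
    """
    for mode in MODES:
        for attempt in range(max_retries):
            try:
                return mode, fetch(url, mode)
            except Exception:
                time.sleep(base_delay * 2 ** attempt)  # back off before retrying
    raise RuntimeError(f"All connection modes failed for {url}")
```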

How to use Goodreads Review Scraper - step by step

  1. Sign in to Apify and open the “Goodreads Review Scraper” actor.
  2. Add input data under “📚 Book URLs”:
    • Paste one or more Goodreads book URLs, or just the numeric IDs from those URLs (e.g., 26032825).
  3. Set “🔢 Max reviews per book” to control how many reviews to collect for each URL.
  4. Expand “⚙️ Filters & sorting” (optional) to fine-tune:
    • sortBy: popular (default), newest, or oldest
    • languageCode: pick one language or use “all”
    • reviewEdition: ALL (work-wide) or only_this_book (edition-specific)
  5. (Optional) Configure “🔒 Proxy” if you want to force Apify Proxy usage. The run will automatically try smarter routing if connectivity issues appear.
  6. Click Start. The actor loads each book’s reviews page, discovers the GraphQL endpoint, and streams reviews into your Dataset as it paginates.
  7. Watch “📋 Live results” in the Dataset. Rows appear in real time with rating, text, timestamps, likes/comments, shelving, and reviewer info.
  8. Export your Goodreads review data to CSV or JSON from the Dataset, or fetch it via the Apify API for further processing.

Pro tip: Pipe the Dataset to your analytics stack via the Apify API to build automated sentiment analysis, trend tracking, or book marketing dashboards.
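As a sketch of that pipeline, dataset rows can be fetched with a single GET against Apify's Dataset items endpoint. The dataset ID and token below are placeholders; the `format` and `clean` query parameters follow Apify's public Dataset API:

```python
from urllib.parse import urlencode

def dataset_items_url(dataset_id: str, token: str, fmt: str = "json") -> str:
    """Build the Apify Dataset items URL (fmt can be "json" or "csv")."""
    query = urlencode({"format": fmt, "clean": "true", "token": token})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"

# Fetching is then one request, e.g.:
# urllib.request.urlopen(dataset_items_url("MY_DATASET_ID", "MY_TOKEN"))
```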

Use cases

Use case | Description
Publisher market research | Analyze ratings, review volume, and reader sentiment across titles to guide positioning, blurbs, or A/B testing of covers.
Author & book marketing | Track newest vs most popular reviews to inform messaging, outreach, and launch timing. Export data for campaigns.
Data analysis & sentiment | Build a Goodreads review dataset to run NLP, sentiment scoring, and topic modeling across thousands of reviews.
Competitive benchmarking | Compare likeCount, commentCount, and ratings across comparable works to identify market gaps and strengths.
Academic research | Collect structured, language-filtered review text for studies on readership, linguistics, or cultural trends.
Developer pipelines (API) | Use the Apify API to scrape Goodreads reviews and feed JSON directly into ETL, data lakes, or ML workflows.
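As an illustrative starting point for the sentiment use case, the `text` field of each row can be scored with a trivial keyword heuristic before graduating to a real NLP model. The word lists here are made up for the example:

```python
POSITIVE = {"beautiful", "timeless", "great", "loved", "masterpiece"}
NEGATIVE = {"boring", "slow", "disappointing", "weak", "predictable"}

def keyword_sentiment(text: str) -> int:
    """Naive score: +1 per positive keyword present, -1 per negative keyword."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return len(words & POSITIVE) - len(words & NEGATIVE)

review = "A timeless story with beautiful prose and complex characters..."
print(keyword_sentiment(review))  # 2
```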

Why choose Goodreads Review Scraper?

The tool is engineered for precision, automation, and reliability as a Goodreads review scraping tool.

  • ✅ Accurate, structured output — Mirrors Goodreads’ GraphQL getReviews shape with flattened, clean fields.
  • 🌍 Language-aware filtering — Optional languageCode filter to focus your analysis on the languages that matter.
  • 📈 Scalable batch runs — Paste multiple URLs or IDs and download Goodreads reviews at volume without manual effort.
  • 🧰 Developer access — Build integrations around the Apify API and consume the Goodreads review extractor output in your pipelines.
  • 🛡️ Resilient by design — Automatic connection escalation (direct → datacenter → RESIDENTIAL) with retries reduces job failures.
  • 💸 Cost-effective automation — Skip brittle browser add-ons and unstable scripts; run a consistent, repeatable Goodreads review crawler.

In short: a stable Goodreads reviews API alternative that turns public reviews into analysis-ready data.

Is it legal to scrape Goodreads reviews?

Yes, when used responsibly. This actor collects only publicly visible reviews and does not use authentication. You are responsible for:

  • Scraping only public content and respecting Goodreads’ terms.
  • Complying with data protection laws such as GDPR/CCPA where applicable.
  • Using the data ethically (e.g., analytics, research) and avoiding misuse.
  • Consulting your legal team for edge cases or jurisdiction-specific requirements.

Input parameters & output format

Example JSON input

{
  "urls": [
    "26032825",
    "https://www.goodreads.com/book/show/4671.The_Great_Gatsby"
  ],
  "maxItems": 50,
  "filtersAndOptions": {
    "sortBy": "newest",
    "languageCode": "en",
    "reviewEdition": "only_this_book"
  },
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Parameter reference

  • urls (array, required): Paste one or more Goodreads book links. Tip: you can paste just the number from the URL (e.g., 26032825) — the actor will fix the link for you. Default: none.
  • maxItems (integer, optional): How many reviews to collect for each book. Minimum 1, maximum 10000. Default: 20.
  • filtersAndOptions (object, optional): Optional filters and sorting.
    • sortBy (string, optional): popular | newest | oldest. Default: popular.
    • languageCode (string, optional): Choose a language or “all” for no filter. Default: all.
    • reviewEdition (string, optional): ALL (work-wide) or only_this_book (edition-specific). Default: ALL.
  • proxyConfiguration (object, optional): Optional Apify Proxy configuration. If something blocks a run, the actor will try smarter routing automatically.
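The ID-to-URL normalization described for `urls` can be approximated like this (the exact rules the actor applies are internal; this sketch only illustrates the idea):

```python
def normalize_book_url(entry: str) -> str:
    """Turn a bare numeric Goodreads ID into a full book URL; pass URLs through."""
    entry = entry.strip()
    if entry.isdigit():
        return f"https://www.goodreads.com/book/show/{entry}"
    return entry

print(normalize_book_url("26032825"))
# https://www.goodreads.com/book/show/26032825
```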

Example JSON output

{
  "bookUrl": "https://www.goodreads.com/book/show/26032825",
  "__typename": "Review",
  "id": "1234567890",
  "creator": {
    "id": "987654321",
    "imageUrlSquare": "https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/users/…/12345.jpg",
    "isAuthor": false,
    "viewerRelationshipStatus": {
      "isBlockedByViewer": false,
      "__typename": "ViewerRelationshipStatus"
    },
    "followersCount": 250,
    "__typename": "User",
    "textReviewsCount": 134,
    "name": "Jane Doe",
    "webUrl": "https://www.goodreads.com/user/show/987654321-jane-doe",
    "contributor": {
      "id": "contrib_001",
      "works": {
        "totalCount": 0,
        "__typename": "WorksConnection"
      },
      "__typename": "Contributor"
    }
  },
  "recommendFor": null,
  "updatedAt": 1704153600000,
  "createdAt": 1704067200000,
  "spoilerStatus": null,
  "lastRevisionAt": 1704157200000,
  "text": "A timeless story with beautiful prose and complex characters...",
  "rating": 4,
  "shelving": {
    "shelf": {
      "name": "read",
      "webUrl": "https://www.goodreads.com/review/list/…",
      "__typename": "Shelf"
    },
    "taggings": [
      {
        "tag": {
          "name": "classic",
          "webUrl": "https://www.goodreads.com/tag/classic",
          "__typename": "Tag"
        },
        "__typename": "Tagging"
      }
    ],
    "webUrl": "https://www.goodreads.com/review/show/…",
    "__typename": "Shelving"
  },
  "likeCount": 12,
  "viewerHasLiked": false,
  "commentCount": 3
}

Notes:

  • Fields like text, shelving, or certain creator subfields can be null depending on the review.
  • Timestamps are numeric (milliseconds).
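Because `creator` and `shelving` are nested objects, tabular analysis usually starts by pulling a few fields up to the top level. A minimal sketch of that flattening (field names follow the example output above; the selection of fields is just illustrative):

```python
def flatten_review(row: dict) -> dict:
    """Pull commonly used nested fields up to top level for tabular analysis."""
    creator = row.get("creator") or {}
    shelving = row.get("shelving") or {}
    shelf = shelving.get("shelf") or {}
    return {
        "id": row.get("id"),
        "rating": row.get("rating"),
        "text": row.get("text"),
        "likeCount": row.get("likeCount"),
        "reviewerName": creator.get("name"),
        "reviewerFollowers": creator.get("followersCount"),
        "shelf": shelf.get("name"),
    }
```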

FAQ

Do I need to log in or provide cookies to scrape Goodreads reviews?

No. The actor works with public Goodreads pages and does not require login or cookies. It discovers the GraphQL endpoint from the book’s reviews page and fetches public data.

How many reviews can I scrape per book?

You control this via maxItems. The input constraint allows up to 10,000 reviews per book. The actor paginates through results until the limit is reached or the list ends.

Can I sort and filter reviews?

Yes. You can sort by popular, newest, or oldest and optionally filter by languageCode. You can also choose whether to collect reviews for ALL editions (work-wide) or only_this_book (edition-specific).

Can I input just a Goodreads ID instead of a full URL?

Yes. Paste the numeric ID from a Goodreads book URL (e.g., 26032825). The actor will normalize it to a proper Goodreads book page automatically.

What formats can I export?

You can export your Goodreads review data from the Apify Dataset to CSV or JSON, or access it programmatically via the Apify API.

Is this an alternative to a Goodreads reviews API?

Yes. It’s a Goodreads reviews API alternative that discovers Goodreads’ GraphQL endpoint on the fly and fetches public review data without browser extensions.

How does the actor handle blocks or rate limits?

It starts with a direct connection and automatically escalates to Apify datacenter proxy, then RESIDENTIAL proxy (locked for the remainder of the run) if needed. Each request phase has up to 3 retries with backoff.

What review fields are included?

Each row includes bookUrl, id, text, rating, createdAt/updatedAt/lastRevisionAt, spoilerStatus, likeCount, commentCount, viewerHasLiked, shelving, creator, and __typename—flattened for easy analysis.

Final thoughts

Goodreads Review Scraper is built to reliably extract public Goodreads ratings and reviews at scale. With batch inputs, sorting/language filters, and resilient proxy handling, it helps publishers, authors, researchers, and data teams turn unstructured reviews into actionable datasets. Developers can consume results via the Apify API and integrate them into Python-based or ETL workflows. Start collecting a clean Goodreads review dataset and power your analytics, sentiment, and marketing insights today.