Goodreads Explorer: Books, Authors & Reviews Scraper

Pricing

from $1.00 / 1,000 results

Scrape public Goodreads data from URLs or simple text targets. Collect books, authors, series, search results, and book reviews with clean structured output. Built for Apify with HTTP-first speed, browser fallback for reliability, proxy support, depth controls, and run-ready dataset/KV summaries.

Rating

0.0

(0)

Developer

Inus Grobler

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 1
  • Last modified: 5 days ago

Goodreads Books, Authors & Reviews Scraper

Goodreads Books, Authors & Reviews Scraper is an Apify Actor for scraping public Goodreads data without using the Goodreads API. Use it to scrape Goodreads book pages, author profiles, series pages, search results, ratings, genres, ISBN details, shelves, and public book reviews from either Goodreads URLs or simple text targets such as book titles and author names.

This Actor is designed to be easy to use for non-technical users and practical for production users. In the Apify input UI, you only need to provide:

  • targets
  • depth
  • searchMode
  • maxReviewsPerBook when you want to cap reviews per opened book
  • proxyConfiguration

If you want a Goodreads scraper for book metadata, author data, Goodreads search results, or Goodreads review scraping from public book pages, this Actor is built for that use case.

Why Use This Goodreads Scraper

  • Scrape public Goodreads books, authors, series, search results, and reviews
  • Start from a Goodreads URL or a plain-text search like Dune or Stephen King
  • Use simple depth-based controls instead of a long list of confusing toggles
  • Get clean structured JSON in the default dataset
  • Get run summaries and failure reports in the default key-value store
  • Use HTTP-first crawling for speed, with browser fallback for reliability
  • Run it on Apify with proxy support and production-ready deployment files

What This Actor Can Scrape

Goodreads books

  • Title, subtitle, full title
  • Book cover image
  • Description
  • Average rating
  • Ratings count
  • Reviews count
  • Genres
  • Shelves URL
  • First published date
  • Edition info
  • ISBN, ISBN13, ASIN
  • Page count
  • Language
  • Format
  • Series info
  • Linked authors
  • Public review records from the book page

Goodreads authors

  • Author name
  • Goodreads author ID
  • Photo
  • Biography
  • Average rating
  • Ratings count
  • Reviews count
  • Website links
  • Genres
  • Bibliography summary

Goodreads series

  • Series title
  • Series description
  • Books in the series
  • Linked book URLs
  • Authors when visible

Goodreads search results

  • Book search results
  • Author search results
  • Source query and result position

Goodreads reviews

  • Reviewer name
  • Reviewer profile URL
  • Review ID when available
  • Review date when available
  • Star rating when available
  • Review text
  • Likes count when available
  • Comments count when available
  • Spoiler marker when available

Important note:

  • This Actor scrapes public reviews visible on public Goodreads book pages.
  • Direct review/show/... pages are often login-gated on Goodreads and are not the primary review collection surface here.

Easiest Way To Use It

In the Apify input UI, add one or more items to targets.

Each target can be:

  • A Goodreads book URL
  • A Goodreads author URL
  • A Goodreads series URL
  • A Goodreads search URL
  • A book title
  • An author name
  • A mixed query like The Hobbit J.R.R. Tolkien

Then choose depth:

  • shallow - fastest, root entities only, light search and light detail
  • standard - best default, follows the most useful related data including public reviews from book pages
  • deep - richest output, broader follow-up, slower runs

Then choose searchMode:

  • books - return matched books and follow linked entities by depth
  • reviews - find top matched books for text queries, then crawl reviews across review pages for each matched book

Then optionally set maxReviewsPerBook:

  • leave it empty to scrape all visible review pages for each opened book
  • set it to a number only if you want to cap reviews per book

Then keep Apify Proxy enabled for production and run the Actor.
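
The steps above can be sketched as a small typed helper that assembles the input object. The field names (targets, depth, searchMode, maxReviewsPerBook, proxyConfiguration) come from this README; the type names, the function name, and the validation rules are illustrative assumptions, not part of the Actor.

```typescript
// Hypothetical helper types: only the field names are taken from this README.
type Depth = "shallow" | "standard" | "deep";
type SearchMode = "books" | "reviews";

interface GoodreadsActorInput {
  targets: string[];
  depth: Depth;
  searchMode?: SearchMode;
  maxReviewsPerBook?: number;
  proxyConfiguration: { useApifyProxy: boolean };
}

interface BuildOptions {
  depth?: Depth;
  searchMode?: SearchMode;
  maxReviewsPerBook?: number;
}

function buildGoodreadsInput(targets: string[], options: BuildOptions = {}): GoodreadsActorInput {
  // Trim targets and drop blank entries before building the input.
  const cleaned = targets.map((t) => t.trim()).filter((t) => t.length > 0);
  if (cleaned.length === 0) {
    throw new Error("Provide at least one target (a Goodreads URL or a text query)");
  }
  const input: GoodreadsActorInput = {
    targets: cleaned,
    // "standard" is the default depth this README recommends.
    depth: options.depth !== undefined ? options.depth : "standard",
    proxyConfiguration: { useApifyProxy: true },
  };
  if (options.searchMode !== undefined) {
    input.searchMode = options.searchMode;
  }
  // Leaving maxReviewsPerBook out means "no cap". Requiring a positive
  // integer here is an assumption of this sketch, not a documented rule.
  const cap = options.maxReviewsPerBook;
  if (cap !== undefined) {
    if (cap <= 0 || Math.floor(cap) !== cap) {
      throw new Error("maxReviewsPerBook must be a positive integer");
    }
    input.maxReviewsPerBook = cap;
  }
  return input;
}

// Mirrors the "Cap reviews per book when needed" example in this README.
const cappedReviewRun = buildGoodreadsInput(["Harry Potter"], {
  searchMode: "reviews",
  maxReviewsPerBook: 200,
});
```

The returned object can be serialized with JSON.stringify and pasted into the raw JSON input editor.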

Best Input Examples

Scrape a Goodreads book URL with reviews

Use standard or deep if you want review items.

{
  "targets": [
    "https://www.goodreads.com/book/show/11588.The_Shining"
  ],
  "depth": "standard",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Scrape a Goodreads author URL

{
  "targets": [
    "https://www.goodreads.com/author/show/3389.Stephen_King"
  ],
  "depth": "standard",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Search Goodreads by book title

{
  "targets": [
    "Dune"
  ],
  "depth": "standard",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Search by title and collect reviews from top matched books

{
  "targets": [
    "Harry Potter"
  ],
  "searchMode": "reviews",
  "depth": "standard",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Cap reviews per book when needed

{
  "targets": [
    "Harry Potter"
  ],
  "searchMode": "reviews",
  "depth": "standard",
  "maxReviewsPerBook": 200,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Search Goodreads by author name

{
  "targets": [
    "Stephen King"
  ],
  "depth": "standard",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Run multiple Goodreads targets at once

{
  "targets": [
    "https://www.goodreads.com/book/show/44767458-dune",
    "Frank Herbert",
    "The Hobbit J.R.R. Tolkien"
  ],
  "depth": "deep",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

How Search Works

If a target is not a Goodreads URL, the Actor treats it as a Goodreads search query.

For plain-text targets, the Actor:

  1. Builds Goodreads search requests internally
  2. Uses searchMode: "books" to scrape book search results (authors can still be enabled via advanced searchEntityTypes), or searchMode: "reviews" to scrape book results only
  3. Saves search result records to the dataset
  4. Follows matched pages:
    • in books mode, follows books/authors according to depth
    • in reviews mode, follows top matched books and paginates through review pages

Depth controls how many book results the Actor follows and how much related data it expands.

Once a book page is opened and reviews are enabled, the Actor scrapes all visible review pages by default. Set maxReviewsPerBook only when you want a cap; the cap applies to both direct book scraping and review-search runs.

This means you can start from simple inputs like:

  • Atomic Habits
  • J.R.R. Tolkien
  • The Hobbit J.R.R. Tolkien

and still get structured Goodreads data back.
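
The URL-versus-query decision described above can be sketched as a small classifier. This is only an illustration of the idea, not the Actor's routing code: the function and type names are hypothetical, the book and author URL shapes are taken from this README's examples, and the /series/ and /search path prefixes are assumptions about Goodreads URLs.

```typescript
// Hypothetical classification of a single "targets" entry.
type TargetKind = "book" | "author" | "series" | "search" | "unknown_url" | "query";

function classifyTarget(target: string): TargetKind {
  const value = target.trim();
  // Anything that is not a goodreads.com URL is treated as a plain-text search query.
  const match = /^https?:\/\/(?:www\.)?goodreads\.com\/([^?#]*)/i.exec(value);
  if (match === null) {
    return "query";
  }
  const path = match[1];
  if (path.indexOf("book/show/") === 0) return "book";
  if (path.indexOf("author/show/") === 0) return "author";
  // Assumed path prefixes; this README does not show series or search URLs.
  if (path.indexOf("series/") === 0) return "series";
  if (path.indexOf("search") === 0) return "search";
  return "unknown_url";
}

// Classify a mixed list like the "Run multiple Goodreads targets at once" example.
const mixedTargetKinds = [
  "https://www.goodreads.com/book/show/44767458-dune",
  "Frank Herbert",
  "The Hobbit J.R.R. Tolkien",
].map(classifyTarget);
```

A pre-check like this can be useful before a run, for example to confirm that a pasted URL will be opened directly rather than sent through search.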

Output Format

The default output mode is optimized for easy downstream use in Apify, Make, n8n, Python, JavaScript, spreadsheets, databases, and LLM pipelines.

Every dataset item includes core fields where available:

  • recordType
  • sourceUrl
  • canonicalUrl
  • scrapedAt
  • detailLevel
  • goodreadsId

Depending on the entity, items may also include:

  • sourceContext
  • availabilityFlags
  • paginationInfo
  • linkedEntities
  • breadcrumbs

Main record types

  • book
  • author
  • series
  • review
  • search_result

Example review item

{
  "recordType": "review",
  "canonicalUrl": "https://www.goodreads.com/review/show/78615227",
  "bookUrl": "https://www.goodreads.com/book/show/830502.It",
  "reviewerName": "Maciek",
  "starRating": 5,
  "reviewText": "My short review text...",
  "sourceContext": {
    "parentUrl": "https://www.goodreads.com/book/show/830502.It",
    "discoveredFrom": "search_result",
    "query": "Stephen King"
  }
}

Example book item

{
  "recordType": "book",
  "canonicalUrl": "https://www.goodreads.com/book/show/11588.The_Shining",
  "goodreadsId": 11588,
  "title": "The Shining",
  "averageRating": 4.28,
  "ratingsCount": 1727467,
  "reviewsCount": 51808,
  "genres": [
    { "name": "Horror", "url": "https://www.goodreads.com/genres/horror" }
  ],
  "authors": [
    {
      "id": 3389,
      "name": "Stephen King",
      "profileUrl": "https://www.goodreads.com/author/show/3389.Stephen_King"
    }
  ]
}
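
Because every item carries a recordType, downstream code can split a mixed dataset by type. The sketch below groups review items under their bookUrl and averages their star ratings. It types only fields that appear in the example items in this section; the interface and function names are hypothetical, and real items may carry more fields or omit optional ones.

```typescript
// Partial, illustrative typings based on the example items in this README.
interface BookRecord {
  recordType: "book";
  canonicalUrl?: string;
  title?: string;
  averageRating?: number;
}
interface ReviewRecord {
  recordType: "review";
  canonicalUrl?: string;
  bookUrl?: string;
  reviewerName?: string;
  starRating?: number;
  reviewText?: string;
}
interface OtherRecord {
  recordType: "author" | "series" | "search_result";
  canonicalUrl?: string;
}
type GoodreadsRecord = BookRecord | ReviewRecord | OtherRecord;

// Collect review items per book URL, skipping reviews without a bookUrl.
function groupReviewsByBook(items: GoodreadsRecord[]): Record<string, ReviewRecord[]> {
  const byBook: Record<string, ReviewRecord[]> = {};
  for (const item of items) {
    if (item.recordType !== "review") continue;
    const url = item.bookUrl;
    if (url === undefined) continue;
    const bucket = byBook[url] || [];
    bucket.push(item);
    byBook[url] = bucket;
  }
  return byBook;
}

// Average only the reviews that actually include a star rating.
function meanStarRating(reviews: ReviewRecord[]): number | null {
  let sum = 0;
  let count = 0;
  for (const review of reviews) {
    if (typeof review.starRating === "number") {
      sum += review.starRating;
      count += 1;
    }
  }
  return count === 0 ? null : sum / count;
}
```

Treating starRating and bookUrl as optional matches the "when available" wording used for review fields in this README.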

What Gets Saved In Apify

Output tab

The Actor now defines an Apify output schema, dataset schema, and key-value store schema so the run output is easier to browse in the Apify Console.

In a finished run, you will see quick links for:

  • Results Overview - a clean table view for books, authors, series, reviews, and search results
  • Detailed Results - a wider table with longer text fields such as descriptions and review text
  • Run Summary - the OUTPUT key-value store record
  • Failed Requests - the FAILED_REQUESTS key-value store record
  • Debug Log - the DEBUG_LOG key-value store record when debug mode is enabled
  • Storage Files - the default key-value store browser for saved HTML, screenshots, and JSON records

Dataset

Structured result items are written to the default dataset.

Key-value store

The Actor also writes:

  • OUTPUT - crawl summary
  • FAILED_REQUESTS - failed URL records
  • DEBUG_LOG - optional structured debug output when debug mode is enabled

The OUTPUT record includes useful run totals such as:

  • requests handled
  • items written
  • books scraped
  • authors scraped
  • series scraped
  • reviews scraped
  • search results scraped
  • failed URLs
  • blocked requests

Why The Output Is Cleaner

The Actor intentionally removes noisy fields where possible before writing items.

That includes:

  • dropping null and empty values from output
  • compressing redundant sourceContext fields
  • normalizing messy search-result author names

The result is easier-to-read JSON and a cleaner dataset for downstream automation.
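
To illustrate the first cleanup step, here is a minimal sketch of dropping null and empty values recursively. It is not the Actor's implementation: the function name is hypothetical, and the exact rules (treating whitespace-only strings, empty arrays, and empty objects as empty while keeping 0 and false) are assumptions.

```typescript
// A JSON-like value, as it would appear in a dataset item.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns the value with null/empty parts removed, or undefined if nothing is left.
function pruneEmpty(value: Json): Json | undefined {
  if (value === null) return undefined;
  if (typeof value === "string") {
    return value.trim() === "" ? undefined : value;
  }
  if (Array.isArray(value)) {
    const kept: Json[] = [];
    for (const entry of value) {
      const pruned = pruneEmpty(entry);
      if (pruned !== undefined) kept.push(pruned);
    }
    return kept.length === 0 ? undefined : kept;
  }
  if (typeof value === "object") {
    const result: { [key: string]: Json } = {};
    let size = 0;
    for (const key of Object.keys(value)) {
      const pruned = pruneEmpty(value[key]);
      if (pruned !== undefined) {
        result[key] = pruned;
        size += 1;
      }
    }
    return size === 0 ? undefined : result;
  }
  // Numbers and booleans are kept as-is, including 0 and false.
  return value;
}
```

The practical consequence for consumers is that a missing key means "not available", so downstream code should check for the presence of a field rather than compare it against null.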

Goodreads Reviews FAQ

Can I scrape reviews from a Goodreads book URL?

Yes. Pass the Goodreads book URL in targets and use standard or deep.

Will I get review dataset items?

Yes. Review records are written as separate dataset items with recordType: "review" when public review content is available on the book page.

Does the Actor open direct Goodreads review pages?

Not as the primary review strategy. Goodreads often login-gates direct review/show/... pages for signed-out access. The Actor focuses on public review data visible from public book pages.

How do I get more reviews?

Use deep, leave maxReviewsPerBook empty so reviews are not capped, and raise the advanced limits only if you need to. For most users, standard is the best balance between output richness and speed.

Advanced Input Support

The published Apify input UI is intentionally simple, but the Actor still supports advanced raw JSON input for power users.

Advanced fields include:

  • searchMode
  • maxReviewsPerBook
  • startUrls
  • searchQueries
  • searchEntityTypes
  • entityTypes
  • expand
  • detailLevel
  • outputMode
  • limits
  • crawlMode
  • requestDelayMinMs
  • requestDelayMaxMs
  • saveHtml
  • saveScreenshots
  • includeRawBlocks
  • debug

If you do not need fine-grained control, use the simple targets + depth + searchMode + proxyConfiguration mode.

Limitations

  • This Actor scrapes public Goodreads pages only
  • It does not use the Goodreads API
  • It does not log in
  • It does not access private Goodreads user data
  • Goodreads layout changes can affect selectors over time
  • Search quality depends on Goodreads' own ranking and matching
  • Direct review/show/... pages are often restricted for signed-out scraping

Troubleshooting

If a run returns fewer results than expected:

  1. Try a direct Goodreads URL in targets to confirm the Actor can reach the exact page you want
  2. Keep Apify Proxy enabled in production
  3. Switch from shallow to standard if you need reviews
  4. Switch from standard to deep if you need broader follow-up
  5. Check the OUTPUT and FAILED_REQUESTS records in the default key-value store

If Goodreads changes its layout:

  1. Enable debug
  2. Optionally enable saveHtml
  3. Optionally enable saveScreenshots
  4. Review DEBUG_LOG and the saved artifacts

SEO And Discovery Notes

This README intentionally includes the terms users actually search for on Apify and search engines:

  • Goodreads scraper
  • Goodreads reviews scraper
  • Goodreads book scraper
  • Goodreads author scraper
  • Goodreads data scraper
  • Goodreads API alternative
  • scrape Goodreads reviews
  • scrape Goodreads books and authors

These keywords are used naturally in the title, introduction, headings, examples, and FAQ so the Actor is easier to discover without turning the page into keyword spam.

Local Development

Install dependencies:

$ npm install

Build:

$ npm run build

Run locally:

$ npm run dev

Type-check:

$ npm run check

Main project files:

.actor/
  actor.json
  input_schema.json
src/
  extractors/
  output/
  search/
  utils/
  config.ts
  main.ts
  router.ts
  state.ts
Dockerfile
apify.json
package.json
README.md

Apify Deployment

This repository is ready for Apify deployment.

Included files:

  • package.json
  • Dockerfile
  • .actor/actor.json
  • .actor/input_schema.json
  • apify.json

Deploy with the Apify CLI:

$ apify push

Or import the Git repository into Apify Console and build there.

Good Defaults

For most users:

  • Use Goodreads URLs whenever you already know the exact page you want
  • Use standard as the default depth
  • Use deep only when you want the richest output and broader follow-up
  • Keep Apify Proxy enabled in production

If you want a no-code Goodreads scraper for books, authors, search results, and public book reviews, this Actor is the simplest way to get structured Goodreads data on Apify.