Goodreads Explorer: Books, Authors & Reviews Scraper
Pricing: from $1.00 / 1,000 results
Scrape public Goodreads data from URLs or simple text targets. Collect books, authors, series, search results, and book reviews with clean structured output. Built for Apify with HTTP-first crawling for speed, a browser fallback for reliability, proxy support, depth controls, structured results in the dataset, and run summaries in the key-value store.
Developer: Inus Grobler
Goodreads Books, Authors & Reviews Scraper
Goodreads Books, Authors & Reviews Scraper is an Apify Actor for scraping public Goodreads data without using the Goodreads API. Use it to scrape Goodreads book pages, author profiles, series pages, search results, ratings, genres, ISBN details, shelves, and public book reviews from either Goodreads URLs or simple text targets such as book titles and author names.
This Actor is designed to be easy to use for non-technical users and practical for production users. In the Apify input UI, you only need to provide:
- targets
- depth
- searchMode
- maxReviewsPerBook, only when you want to cap reviews per opened book
- proxyConfiguration
If you want a Goodreads scraper for book metadata, author data, Goodreads search results, or Goodreads review scraping from public book pages, this Actor is built for that use case.
Why Use This Goodreads Scraper
- Scrape public Goodreads books, authors, series, search results, and reviews
- Start from a Goodreads URL or a plain-text search like "Dune" or "Stephen King"
- Use simple depth-based controls instead of a long list of confusing toggles
- Get clean structured JSON in the default dataset
- Get run summaries and failure reports in the default key-value store
- Use HTTP-first crawling for speed, with browser fallback for reliability
- Run it on Apify with proxy support and production-ready deployment files
What This Actor Can Scrape
Goodreads books
- Title, subtitle, full title
- Book cover image
- Description
- Average rating
- Ratings count
- Reviews count
- Genres
- Shelves URL
- First published date
- Edition info
- ISBN, ISBN13, ASIN
- Page count
- Language
- Format
- Series info
- Linked authors
- Public review records from the book page
Goodreads authors
- Author name
- Goodreads author ID
- Photo
- Biography
- Average rating
- Ratings count
- Reviews count
- Website links
- Genres
- Bibliography summary
Goodreads series
- Series title
- Series description
- Books in the series
- Linked book URLs
- Authors when visible
Goodreads search results
- Book search results
- Author search results
- Source query and result position
Goodreads reviews
- Reviewer name
- Reviewer profile URL
- Review ID when available
- Review date when available
- Star rating when available
- Review text
- Likes count when available
- Comments count when available
- Spoiler marker when available
Important note:
- This Actor scrapes public reviews visible on public Goodreads book pages.
- Direct review/show/... pages are often login-gated on Goodreads and are not the primary review collection surface here.
Easiest Way To Use It
In the Apify input UI, add one or more items to targets.
Each target can be:
- A Goodreads book URL
- A Goodreads author URL
- A Goodreads series URL
- A Goodreads search URL
- A book title
- An author name
- A mixed query like "The Hobbit J.R.R. Tolkien"
Then choose depth:
- shallow: fastest; root entities only, with light search and light detail
- standard: best default; follows the most useful related data, including public reviews from book pages
- deep: richest output and broader follow-up, but slower runs
Then choose searchMode:
- books: return matched books and follow linked entities according to depth
- reviews: find the top matched books for text queries, then crawl reviews across review pages for each matched book
Then optionally set maxReviewsPerBook:
- leave it empty to scrape all visible review pages for each opened book
- set it to a number only if you want to cap reviews per book
Then keep Apify Proxy enabled for production and run the Actor.
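The maxReviewsPerBook rule in the steps above (empty means every visible review page, a number means a cap) can be pictured as a small pagination loop. This is a hypothetical sketch, not the Actor's source: ReviewPage, fetchReviewPage, and the fake pages are assumptions, and the loop is synchronous only to keep it short.

```typescript
// Hypothetical shape of one page of reviews; the Actor's real types may differ.
interface ReviewPage {
  reviews: string[];
  hasNextPage: boolean;
}

// Collect reviews page by page. With no cap, continue until the last page;
// with a cap, stop fetching as soon as the cap is reached.
function collectReviews(
  fetchReviewPage: (page: number) => ReviewPage,
  maxReviewsPerBook?: number,
): string[] {
  const cap = maxReviewsPerBook ?? Infinity;
  const collected: string[] = [];
  for (let page = 1; collected.length < cap; page++) {
    const { reviews, hasNextPage } = fetchReviewPage(page);
    // Take only as many reviews as the remaining budget allows.
    collected.push(...reviews.slice(0, cap - collected.length));
    if (!hasNextPage) break;
  }
  return collected;
}

// Usage with a made-up three-page book.
const fakePages: ReviewPage[] = [
  { reviews: ["a", "b"], hasNextPage: true },
  { reviews: ["c", "d"], hasNextPage: true },
  { reviews: ["e"], hasNextPage: false },
];
const fakeFetch = (page: number): ReviewPage =>
  fakePages[page - 1] ?? { reviews: [], hasNextPage: false };

const everything = collectReviews(fakeFetch); // ["a", "b", "c", "d", "e"]
const capped = collectReviews(fakeFetch, 3); // ["a", "b", "c"]
```

Checking the cap before each fetch means a capped run stops requesting pages once it has enough reviews, which is the practical reason to set a cap on very popular books.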
Best Input Examples
Scrape a Goodreads book URL with reviews
Use standard or deep if you want review items.
{"targets": ["https://www.goodreads.com/book/show/11588.The_Shining"],"depth": "standard","proxyConfiguration": {"useApifyProxy": true}}
Scrape a Goodreads author URL
{"targets": ["https://www.goodreads.com/author/show/3389.Stephen_King"],"depth": "standard","proxyConfiguration": {"useApifyProxy": true}}
Search Goodreads by book title
{"targets": ["Dune"],"depth": "standard","proxyConfiguration": {"useApifyProxy": true}}
Search by title and collect reviews from top matched books
{"targets": ["Harry Potter"],"searchMode": "reviews","depth": "standard","proxyConfiguration": {"useApifyProxy": true}}
Cap reviews per book when needed
{"targets": ["Harry Potter"],"searchMode": "reviews","depth": "standard","maxReviewsPerBook": 200,"proxyConfiguration": {"useApifyProxy": true}}
Search Goodreads by author name
{"targets": ["Stephen King"],"depth": "standard","proxyConfiguration": {"useApifyProxy": true}}
Run multiple Goodreads targets at once
{"targets": ["https://www.goodreads.com/book/show/44767458-dune","Frank Herbert","The Hobbit J.R.R. Tolkien"],"depth": "deep","proxyConfiguration": {"useApifyProxy": true}}
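If you start runs from your own TypeScript code, a small typed helper can catch typos in depth or searchMode before a run is queued. The GoodreadsInput interface is inferred from the fields shown in this README (the Actor's .actor/input_schema.json is the authoritative source), and buildInput is a hypothetical helper. Its defaults (standard depth, Apify Proxy on) follow this README's recommendations and are not necessarily the Actor's schema defaults; requiring maxReviewsPerBook to be a positive integer is also an assumption.

```typescript
type Depth = "shallow" | "standard" | "deep";
type SearchMode = "books" | "reviews";

// Input fields as they appear in the examples above (inferred, not the official schema).
interface GoodreadsInput {
  targets: string[];
  depth?: Depth;
  searchMode?: SearchMode;
  maxReviewsPerBook?: number;
  proxyConfiguration?: { useApifyProxy: boolean };
}

function buildInput(
  targets: string[],
  options: Omit<GoodreadsInput, "targets"> = {},
): GoodreadsInput {
  // Trim targets and drop blank entries.
  const cleaned = targets.map((t) => t.trim()).filter((t) => t.length > 0);
  if (cleaned.length === 0) throw new Error("At least one non-empty target is required");

  const { maxReviewsPerBook } = options;
  if (
    maxReviewsPerBook !== undefined &&
    (!Number.isInteger(maxReviewsPerBook) || maxReviewsPerBook < 1)
  ) {
    throw new Error("maxReviewsPerBook must be a positive integer, or omitted for no cap");
  }

  return {
    targets: cleaned,
    depth: options.depth ?? "standard",
    proxyConfiguration: options.proxyConfiguration ?? { useApifyProxy: true },
    // Only include optional fields that were actually set.
    ...(options.searchMode ? { searchMode: options.searchMode } : {}),
    ...(maxReviewsPerBook !== undefined ? { maxReviewsPerBook } : {}),
  };
}

const reviewRun = buildInput(["Harry Potter"], { searchMode: "reviews", maxReviewsPerBook: 200 });
console.log(JSON.stringify(reviewRun));
```

The reviewRun object carries the same fields and values as the "Cap reviews per book when needed" example, only in a different key order.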
How Search Works
If a target is not a Goodreads URL, the Actor treats it as a Goodreads search query.
For plain-text targets, the Actor:
- Builds Goodreads search requests internally
- Uses searchMode: "books" to scrape book search results (author results can still be enabled via the advanced searchEntityTypes field), or searchMode: "reviews" to scrape book results only
- Saves search result records to the dataset
- Follows matched pages:
  - in books mode, follows books and authors according to depth
  - in reviews mode, follows the top matched books and paginates through their review pages

Depth controls how many book results the Actor follows and how much related data it expands.
Once a book page is opened and reviews are enabled, the Actor scrapes all visible review pages by default.
Set maxReviewsPerBook only when you want a cap; it applies to both direct book scraping and review-search runs.
This means you can start from simple inputs like:
- Atomic Habits
- J.R.R. Tolkien
- The Hobbit J.R.R. Tolkien
and still get structured Goodreads data back.
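The rule that any non-Goodreads-URL target becomes a search can be illustrated with a hypothetical classifier. It is not the Actor's router: the path prefixes cover the Goodreads book and author URLs used in this README plus the /series/ and /search URL forms Goodreads uses, and the search URL built for plain text (https://www.goodreads.com/search?q=...) is an assumption about how such a request could look.

```typescript
// Kinds of targets this sketch distinguishes (hypothetical labels).
type TargetKind = "book" | "author" | "series" | "search" | "other_goodreads_page";

interface ClassifiedTarget {
  kind: TargetKind;
  url: string;
  query?: string; // only set when the target was treated as a search query
}

function classifyTarget(raw: string): ClassifiedTarget {
  const target = raw.trim();
  let parsed: URL | undefined;
  try {
    parsed = new URL(target);
  } catch {
    parsed = undefined; // not an absolute URL, e.g. "Dune" or "Stephen King"
  }

  if (parsed && /(^|\.)goodreads\.com$/.test(parsed.hostname)) {
    const path = parsed.pathname;
    let kind: TargetKind = "other_goodreads_page";
    if (path.startsWith("/book/show/")) kind = "book";
    else if (path.startsWith("/author/show/")) kind = "author";
    else if (path.startsWith("/series/")) kind = "series";
    else if (path.startsWith("/search")) kind = "search";
    return { kind, url: parsed.href };
  }

  // Anything else, including non-Goodreads URLs, becomes a Goodreads search.
  const searchUrl = new URL("https://www.goodreads.com/search");
  searchUrl.searchParams.set("q", target);
  return { kind: "search", url: searchUrl.href, query: target };
}

for (const t of ["https://www.goodreads.com/book/show/11588.The_Shining", "Stephen King"]) {
  console.log(classifyTarget(t));
}
```

URLSearchParams encodes spaces as "+", so "Stephen King" becomes the query string q=Stephen+King.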
Output Format
The default output mode is optimized for easy downstream use in Apify, Make, n8n, Python, JavaScript, spreadsheets, databases, and LLM pipelines.
Every dataset item includes core fields where available:
- recordType
- sourceUrl
- canonicalUrl
- scrapedAt
- detailLevel
- goodreadsId
Depending on the entity, items may also include:
- sourceContext
- availabilityFlags
- paginationInfo
- linkedEntities
- breadcrumbs
Main record types
- book
- author
- series
- review
- search_result
Example review item
{"recordType": "review","canonicalUrl": "https://www.goodreads.com/review/show/78615227","bookUrl": "https://www.goodreads.com/book/show/830502.It","reviewerName": "Maciek","starRating": 5,"reviewText": "My short review text...","sourceContext": {"parentUrl": "https://www.goodreads.com/book/show/830502.It","discoveredFrom": "search_result","query": "Stephen King"}}
Example book item
{"recordType": "book","canonicalUrl": "https://www.goodreads.com/book/show/11588.The_Shining","goodreadsId": 11588,"title": "The Shining","averageRating": 4.28,"ratingsCount": 1727467,"reviewsCount": 51808,"genres": [{ "name": "Horror", "url": "https://www.goodreads.com/genres/horror" }],"authors": [{"id": 3389,"name": "Stephen King","profileUrl": "https://www.goodreads.com/author/show/3389.Stephen_King"}]}
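When consuming the dataset in TypeScript, recordType works well as a discriminant. The types below are an assumption, not a published schema: they cover only some of the fields shown in the example items above and mark them optional, since fields are written only where available. reviewsByBook is a hypothetical helper that groups review items by their bookUrl.

```typescript
interface BaseItem {
  recordType: "book" | "author" | "series" | "review" | "search_result";
  canonicalUrl?: string;
}

interface ReviewItem extends BaseItem {
  recordType: "review";
  bookUrl?: string;
  reviewerName?: string;
  starRating?: number;
  reviewText?: string;
}

interface BookItem extends BaseItem {
  recordType: "book";
  goodreadsId?: number;
  title?: string;
  averageRating?: number;
  ratingsCount?: number;
}

interface OtherItem extends BaseItem {
  recordType: "author" | "series" | "search_result";
}

type DatasetItem = ReviewItem | BookItem | OtherItem;

// Group review items by the book they belong to; reviews without a bookUrl are skipped.
function reviewsByBook(items: DatasetItem[]): Map<string, ReviewItem[]> {
  const groups = new Map<string, ReviewItem[]>();
  for (const item of items) {
    if (item.recordType !== "review") continue; // narrows item to ReviewItem
    const { bookUrl } = item;
    if (!bookUrl) continue;
    const list = groups.get(bookUrl) ?? [];
    list.push(item);
    groups.set(bookUrl, list);
  }
  return groups;
}
```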
What Gets Saved In Apify
Output tab
The Actor defines an Apify output schema, a dataset schema, and a key-value store schema, so the run output is easier to browse in the Apify Console.
In a finished run, you will see quick links for:
- Results Overview: a clean table view for books, authors, series, reviews, and search results
- Detailed Results: a wider table with longer text fields such as descriptions and review text
- Run Summary: the OUTPUT key-value store record
- Failed Requests: the FAILED_REQUESTS key-value store record
- Debug Log: the DEBUG_LOG key-value store record, when debug mode is enabled
- Storage Files: the default key-value store browser for saved HTML, screenshots, and JSON records
Dataset
Structured result items are written to the default dataset.
Key-value store
The Actor also writes:
- OUTPUT: crawl summary
- FAILED_REQUESTS: failed URL records
- DEBUG_LOG: optional structured debug output when debug mode is enabled
The OUTPUT record includes useful run totals such as:
- requests handled
- items written
- books scraped
- authors scraped
- series scraped
- reviews scraped
- search results scraped
- failed URLs
- blocked requests
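To sanity-check the OUTPUT totals against what actually landed in the dataset, you can count downloaded items per recordType yourself. This is a hypothetical client-side check: the OUTPUT record's exact field names are not documented here, so the sketch only produces its own counts for manual comparison. Failed URLs and blocked requests are not dataset items, so compare those against FAILED_REQUESTS instead, and expect small differences if the Actor counts entities differently from items.

```typescript
// Record types listed under "Main record types" in this README.
type RecordType = "book" | "author" | "series" | "review" | "search_result";
const RECORD_TYPES: RecordType[] = ["book", "author", "series", "review", "search_result"];

// Counts per record type, plus anything unexpected under "other".
type RecordCounts = Record<RecordType | "other", number>;

function countByRecordType(items: Array<{ recordType?: string }>): RecordCounts {
  const counts: RecordCounts = {
    book: 0, author: 0, series: 0, review: 0, search_result: 0, other: 0,
  };
  for (const item of items) {
    // Match against the known list instead of using `in`, which would also
    // match inherited keys such as "toString".
    const type = RECORD_TYPES.find((t) => t === item.recordType);
    counts[type ?? "other"] += 1;
  }
  return counts;
}
```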
Why The Output Is Cleaner
The Actor intentionally removes noisy fields where possible before writing items.
That includes:
- dropping null and empty values from output
- compressing redundant sourceContext fields
- normalizing messy search-result author names
The result is easier-to-read JSON and a cleaner dataset for downstream automation.
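Dropping null and empty values can be sketched as a recursive cleanup. This is an illustrative version, not the Actor's code: treating null, "", empty arrays, and empty objects as "empty" is an assumption about what the README means, while 0 and false are deliberately kept because they carry information (for example, a ratingsCount of 0).

```typescript
// Any JSON value (recursive type alias).
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively remove null, undefined, "", [] and {}; returns undefined if nothing is left.
function stripEmpty(value: Json | undefined): Json | undefined {
  if (Array.isArray(value)) {
    const kept: Json[] = [];
    for (const child of value) {
      const cleaned = stripEmpty(child);
      if (cleaned !== undefined) kept.push(cleaned);
    }
    return kept.length > 0 ? kept : undefined; // drop arrays that end up empty
  }
  if (value !== null && typeof value === "object") {
    const kept: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      const cleaned = stripEmpty(value[key]);
      if (cleaned !== undefined) kept[key] = cleaned;
    }
    return Object.keys(kept).length > 0 ? kept : undefined; // drop objects that end up empty
  }
  // Primitives: null, undefined and "" are dropped; 0 and false are kept.
  return value === null || value === undefined || value === "" ? undefined : value;
}

// Made-up item for illustration.
const cleanedBook = stripEmpty({ title: "Dune", subtitle: null, genres: [], ratingsCount: 0 });
// cleanedBook is { title: "Dune", ratingsCount: 0 }
```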
Goodreads Reviews FAQ
Can I scrape reviews from a Goodreads book URL?
Yes. Pass the Goodreads book URL in targets and use standard or deep.
Will I get review dataset items?
Yes. Review records are written as separate dataset items with recordType: "review" when public review content is available on the book page.
Does the Actor open direct Goodreads review pages?
Not as the primary review strategy. Goodreads often login-gates direct review/show/... pages for signed-out access. The Actor focuses on public review data visible from public book pages.
How do I get more reviews?
Use deep, leave maxReviewsPerBook empty (or raise it if you set a cap), and increase advanced limits only if you need them. For most users, standard offers the best balance between output richness and speed.
Advanced Input Support
The published Apify input UI is intentionally simple, but the Actor still supports advanced raw JSON input for power users.
Advanced fields include:
- searchMode
- maxReviewsPerBook
- startUrls
- searchQueries
- searchEntityTypes
- entityTypes
- expand
- detailLevel
- outputMode
- limits
- crawlMode
- requestDelayMinMs
- requestDelayMaxMs
- saveHtml
- saveScreenshots
- includeRawBlocks
- debug
If you do not need fine-grained control, use the simple targets + depth + searchMode + proxyConfiguration mode.
Limitations
- This Actor scrapes public Goodreads pages only
- It does not use the Goodreads API
- It does not log in
- It does not access private Goodreads user data
- Goodreads layout changes can affect selectors over time
- Search quality depends on Goodreads' own ranking and matching
- Direct review/show/... pages are often restricted for signed-out scraping
Troubleshooting
If a run returns fewer results than expected:
- Try a direct Goodreads URL in targets to confirm the Actor can reach the exact page you want
- Keep Apify Proxy enabled in production
- Switch from shallow to standard if you need reviews
- Switch from standard to deep if you need broader follow-up
- Check the OUTPUT and FAILED_REQUESTS records in the default key-value store
If Goodreads changes its layout:
- Enable debug
- Optionally enable saveHtml
- Optionally enable saveScreenshots
- Review DEBUG_LOG and the saved artifacts
SEO And Discovery Notes
This README intentionally includes the terms users actually search for on Apify and search engines:
- Goodreads scraper
- Goodreads reviews scraper
- Goodreads book scraper
- Goodreads author scraper
- Goodreads data scraper
- Goodreads API alternative
- scrape Goodreads reviews
- scrape Goodreads books and authors
These keywords are used naturally in the title, introduction, headings, examples, and FAQ so the Actor is easier to discover without turning the page into keyword spam.
Local Development
Install dependencies:
$ npm install
Build:
$ npm run build
Run locally:
$ npm run dev
Type-check:
$ npm run check
Main project files:
- .actor/actor.json
- input_schema.json
- src/extractors/
- output/
- search/
- utils/
- config.ts
- main.ts
- router.ts
- state.ts
- Dockerfile
- apify.json
- package.json
- README.md
Apify Deployment
This repository is ready for Apify deployment.
Included files:
- package.json
- Dockerfile
- .actor/actor.json
- .actor/input_schema.json
- apify.json
Deploy with the Apify CLI:
$ apify push
Or import the Git repository into Apify Console and build there.
Good Defaults
For most users:
- Use Goodreads URLs whenever you already know the exact page you want
- Use standard as the default depth
- Use deep only when you want the richest output and broader follow-up
- Keep Apify Proxy enabled in production
If you want a no-code Goodreads scraper for books, authors, search results, and public book reviews, this Actor is the simplest way to get structured Goodreads data on Apify.