Goodreads Books Scraper
Pricing: Pay per usage
Efficiently extract detailed book data with the Goodreads Books Scraper. Ideal for building reading lists or analyzing metadata. Note: for bulk scraping of more than 50 books, providing JSON cookies is essential for seamless access and reliable results.
Rating: 5.0 (1)
Developer: Shahid Irfan
Actor stats: 0 bookmarks · 7 total users · 3 monthly active users
Last modified: 11 days ago
Goodreads Books Scraper
Extract rich Goodreads book data at scale from public shelves and related results. Collect complete records including title, author, ratings, review volume, description, ISBN, publisher, publication date, genres, image, and URL. Built for research, analysis, monitoring, and dataset creation.
Features
- Complete book records — Always returns detailed fields for every collected book
- Automatic pagination — Continues collecting across multiple pages to reach your target
- Fast collection flow — Parallel detail fetching for better throughput
- Stable public extraction — Designed to work on public Goodreads data
- Structured output — Clean dataset items ready for BI, ETL, and automation
Use Cases
Market Research
Track popular books, ratings, and genre movement over time. Build snapshots for trend analysis and category planning.
Recommendation Datasets
Create feature-rich book datasets for recommendation systems, ranking models, or catalog enrichment workflows.
Content Strategy
Discover high-interest titles and genres to inform newsletters, blog content, and reading list curation.
Competitive Intelligence
Monitor visibility and engagement signals by shelf category to compare market segments.
Academic Analysis
Use structured book metadata for reading behavior studies, literature analysis, and longitudinal research.
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | No | `[{"url": "https://www.goodreads.com/shelf/show/fantasy"}]` | Optional shelf URL list. The first valid shelf path is used as the seed. |
| shelf | String | No | `"fantasy"` | Shelf name to scrape when no URL list is provided. |
| results_wanted | Integer | No | `50` | Maximum number of books to save. |
| max_pages | Integer | No | `10` | Maximum number of discovery pages to scan. |
| proxyConfiguration | Object | No | `{"useApifyProxy": true}` | Optional proxy settings for better reliability. |
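As a quick sanity check before a run, the defaults above can be merged with a partial input. A minimal Python sketch (the `build_run_input` helper is illustrative, not part of the actor):

```python
# Documented defaults from the parameter table above.
DEFAULTS = {
    "startUrls": [{"url": "https://www.goodreads.com/shelf/show/fantasy"}],
    "shelf": "fantasy",
    "results_wanted": 50,
    "max_pages": 10,
    "proxyConfiguration": {"useApifyProxy": True},
}

def build_run_input(overrides=None):
    """Return a complete run input, filling any omitted parameter with its default."""
    run_input = dict(DEFAULTS)
    run_input.update(overrides or {})
    return run_input

run_input = build_run_input({"shelf": "mystery", "results_wanted": 200})
```

Every parameter is optional, so any subset can be overridden and the rest fall back to the defaults shown in the table.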
Output Data
Each dataset item contains:
| Field | Type | Description |
|---|---|---|
| title | String | Book title |
| author | String | Primary author |
| rating | Number | Average rating |
| ratingCount | Number | Total ratings count |
| reviewCount | Number | Total text reviews count |
| description | String | Book description |
| image | String | Cover image URL |
| isbn | String | ISBN (or ISBN-13 when available) |
| publisher | String | Publisher name |
| publishDate | String | Publication date in YYYY-MM-DD format |
| genres | Array | Genre list |
| url | String | Goodreads book URL |
| _source | String | Source marker |
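For downstream pipelines, the field list above can be checked programmatically. A small illustrative Python helper (`EXPECTED_FIELDS` and `missing_fields` are assumptions for this sketch, not part of the actor's output):

```python
# Expected keys for one dataset item, per the output table above.
EXPECTED_FIELDS = [
    "title", "author", "rating", "ratingCount", "reviewCount",
    "description", "image", "isbn", "publisher", "publishDate",
    "genres", "url", "_source",
]

def missing_fields(item: dict) -> list:
    """Return the expected field names absent from a scraped dataset item."""
    return [f for f in EXPECTED_FIELDS if f not in item]
```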
Usage Examples
Basic Shelf Run
    {
      "shelf": "fantasy",
      "results_wanted": 100,
      "max_pages": 10
    }
URL-Driven Run
    {
      "startUrls": [{ "url": "https://www.goodreads.com/shelf/show/science-fiction" }],
      "results_wanted": 150,
      "max_pages": 20
    }
High-Volume Run with Proxy
    {
      "shelf": "mystery",
      "results_wanted": 500,
      "max_pages": 40,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      }
    }
Sample Output
    {
      "title": "The Name of the Wind",
      "author": "Patrick Rothfuss",
      "rating": 4.52,
      "ratingCount": 985432,
      "reviewCount": 45678,
      "description": "Told in Kvothe's own voice, this is the tale of the magically gifted young man...",
      "image": "https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/....jpg",
      "isbn": "9780756404741",
      "publisher": "DAW Books",
      "publishDate": "2007-03-27",
      "genres": ["Fantasy", "Fiction", "Adventure"],
      "url": "https://www.goodreads.com/book/show/186074.The_Name_of_the_Wind",
      "_source": "goodreads-api"
    }
Tips for Best Results
Start Small First
- Run with `results_wanted` between 50 and 100 to validate quickly
- Increase limits after confirming your target shelf's behavior
Tune Page Limits
- Raise `max_pages` when targeting larger collections
- Keep `max_pages` proportional to `results_wanted`
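One rough way to keep `max_pages` proportional: assuming a shelf page yields on the order of 50 books (an assumption, not a documented guarantee), size the page budget with some headroom for duplicates and misses. An illustrative Python sketch:

```python
import math

BOOKS_PER_PAGE = 50  # rough assumption for Goodreads shelf pages, not a guarantee

def suggest_max_pages(results_wanted: int, headroom: float = 1.5) -> int:
    """Pages needed to reach results_wanted, padded ~50% for duplicates and misses."""
    return max(1, math.ceil(results_wanted * headroom / BOOKS_PER_PAGE))
```

For example, a `results_wanted` of 500 suggests 15 pages; round up further if you expect heavy duplication.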
Use Proxies for Long Runs
- Use residential proxy settings for improved stability
- Keep retries low and focus on steady throughput
Validate Output Early
- Check first 20 items for field completeness
- Confirm titles, authors, ratings, and genres match your expectations
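The first-items check can be scripted once the dataset is exported as JSON. An illustrative Python sketch (field names follow the output table; the `completeness` helper itself is an assumption, not part of the actor):

```python
def completeness(items, fields=("title", "author", "rating", "genres"), sample_size=20):
    """Share of the first sample_size items with a non-empty value for each field."""
    sample = items[:sample_size]
    if not sample:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for it in sample if it.get(f) not in (None, "", [])) / len(sample)
        for f in fields
    }
```

A field rate well below 1.0 usually means the shelf contains books with sparse public metadata rather than a scraper fault.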
Integrations
- Google Sheets — Export datasets for quick analysis
- Airtable — Build searchable book intelligence tables
- Looker Studio / Power BI — Visualize rating and genre trends
- Zapier / Make — Trigger downstream automations
- Webhooks — Feed your own APIs and pipelines
Export Formats
- JSON — Best for APIs and programmatic processing
- CSV — Spreadsheet-friendly analysis
- Excel — Business reporting workflows
- XML — Legacy pipeline compatibility
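Dataset items exported as JSON can also be flattened to CSV locally with the standard library. An illustrative sketch (the field selection is an example; list-valued fields like `genres` are joined so they fit one cell):

```python
import csv
import io

def items_to_csv(items, fields=("title", "author", "rating", "url")):
    """Serialize dataset items to CSV text, joining list values with '; '."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = {
            k: "; ".join(v) if isinstance(v, list) else v
            for k, v in item.items() if k in fields
        }
        writer.writerow(row)
    return buf.getvalue()
```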
Frequently Asked Questions
Why do I only see about 50 books from a shelf page?
Public Goodreads shelf pages can repeat the same first-page set. The actor continues discovery across additional public result pages to reach your requested volume.
Do I need cookies?
Not for small runs: the actor collects public data without authentication. For bulk scraping of more than 50 books, providing JSON cookies is recommended for seamless access and reliable results.
Is detailed output optional?
No. This actor is configured to always return detailed book records.
How many books can I collect?
Use `results_wanted` and `max_pages` to control volume. Increase both for larger runs.
What if some fields are missing?
Some books may have incomplete public metadata. The actor still saves available fields and continues.