Goodreads Scraper
Pricing: Pay per usage
Developer: Ricardo Akiyoshi
Goodreads Book Scraper — Ratings, Reviews & Metadata
Scrape Goodreads for comprehensive book data. Extract titles, authors, ratings, review counts, genres, ISBNs, page counts, series information, cover images, publication dates, and descriptions. Supports search by keyword, direct book URLs, Goodreads lists, shelves, and author pages.
Features
- Comprehensive book data — titles, authors, ratings, review counts, genres, ISBNs, page counts, descriptions
- Series tracking — series name and book position within the series
- Cover images — high-resolution cover image URLs
- Publication info — publish dates (original and edition), publisher, format, language
- Search mode — search by title, author, or keyword with sorting options
- Direct URL mode — scrape specific book pages, lists, shelves, or author bibliographies
- Four extraction strategies — JSON-LD, Apollo/GraphQL state, DOM parsing, meta tag fallback
- Smart filtering — filter by minimum rating, page count, language, or publication date
- Deduplication — prevents duplicate books based on Goodreads ID, ISBN, or title+author
- Top reviews — optionally extract community reviews with ratings and text
- Edition details — optionally extract format, publisher, and identifier data
- Proxy support — works with Apify proxies for reliable large-scale scraping
- Pay-per-event — charged only per book successfully scraped
- Data quality scoring — each result includes a quality score (0-100) based on field completeness
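The actor's exact scoring weights are internal, but as a rough illustration of what a field-completeness score means, a score like this could be derived by counting populated fields (the field list here mirrors the output schema, not the scraper's real formula):

```python
# Illustrative only: the actor's real scoring logic is internal to the scraper.
# A simple completeness score counts how many expected fields are populated.
EXPECTED_FIELDS = [
    "title", "author", "rating", "ratingsCount", "pages",
    "publishDate", "isbn13", "genres", "description", "coverImage",
]

def completeness_score(book: dict) -> int:
    """Return 0-100 based on how many expected fields are non-empty."""
    filled = sum(1 for field in EXPECTED_FIELDS if book.get(field))
    return round(100 * filled / len(EXPECTED_FIELDS))

book = {"title": "Dune", "author": "Frank Herbert", "rating": 4.27,
        "ratingsCount": 1234567, "isbn13": "9780441013593"}
print(completeness_score(book))  # 5 of 10 fields filled -> 50
```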
Use Cases
Book Recommendation Engines
Build recommendation systems by scraping genres, ratings, and descriptions across thousands of books. Use rating distributions and review text for collaborative and content-based filtering.
Publishing Market Research
Analyze trends in book publishing — which genres are growing, what rating distributions look like across categories, and how page counts correlate with popularity. Track new releases by date.
Author Bibliography Analysis
Scrape complete author bibliographies to analyze output frequency, rating trends over time, genre diversity, and series completion status.
Academic Literature Surveys
Build reading lists for research topics. Scrape books by keyword, filter by publication date, and export structured metadata for citation management tools.
Library Cataloging
Extract ISBNs, titles, authors, page counts, and cover images to build or enrich digital library catalogs. Supports ISBN-10 and ISBN-13 formats.
Bookstore Inventory Enrichment
Enrich product listings with Goodreads ratings, review counts, genres, and descriptions. Match by ISBN for accurate data linkage.
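For ISBN-based matching, it helps to normalize both sides to ISBN-13, since the output can carry ISBN-10 and ISBN-13 in separate fields. A minimal normalization sketch (standard ISBN-10 to ISBN-13 conversion: prefix 978 and recompute the check digit):

```python
def to_isbn13(isbn: str) -> str:
    """Normalize an ISBN-10 or ISBN-13 string to a bare ISBN-13 for matching."""
    digits = "".join(c for c in isbn if c.isdigit() or c.upper() == "X")
    if len(digits) == 13:
        return digits
    if len(digits) != 10:
        raise ValueError(f"not an ISBN: {isbn!r}")
    # Drop the ISBN-10 check digit, prefix 978, recompute the EAN-13 check digit.
    core = "978" + digits[:9]
    checksum = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - checksum % 10) % 10)

print(to_isbn13("0-441-01359-7"))  # → 9780441013593
```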
Reading Challenge Tracking
Scrape Goodreads lists and shelves to track popular books, award winners, and trending titles for reading challenges or book clubs.
Content Marketing for Book Blogs
Generate structured data for book review blogs — cover images, descriptions, genre tags, and series info ready for CMS import.
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| searchQuery | string | "Dune" | Search by title, author, or keyword |
| startUrls | array | [] | Direct Goodreads URLs (books, lists, shelves, authors) |
| maxResults | integer | 50 | Maximum books to scrape (0 = unlimited) |
| includeDescription | boolean | true | Extract full book description |
| includeEditions | boolean | false | Extract detailed edition info |
| includeTopReviews | boolean | false | Extract top community reviews (up to 5) |
| sortBy | enum | "relevance" | relevance, title, date_published, num_ratings |
| languageFilter | string | — | Filter by language code (e.g., en, es, fr) |
| minRating | number | 0 | Minimum average rating (0-5) |
| maxPages | integer | 0 | Maximum page count (0 = no limit) |
| publishedAfter | string | — | Only books published after this date (YYYY-MM-DD) |
| publishedBefore | string | — | Only books published before this date (YYYY-MM-DD) |
| proxyConfiguration | object | — | Apify proxy settings |
| maxConcurrency | integer | 5 | Parallel page requests (1-50) |
| requestTimeout | integer | 60 | Page load timeout in seconds |
Example: Search for Science Fiction
{"searchQuery": "best science fiction","maxResults": 100,"sortBy": "num_ratings","minRating": 4.0}
Example: Scrape a Goodreads List
{"startUrls": [{ "url": "https://www.goodreads.com/list/show/1.Best_Books_Ever" }],"maxResults": 200,"includeTopReviews": true}
Example: Scrape Specific Books
{"startUrls": [{ "url": "https://www.goodreads.com/book/show/234225.Dune" },{ "url": "https://www.goodreads.com/book/show/5107.The_Catcher_in_the_Rye" },{ "url": "https://www.goodreads.com/book/show/4671.The_Great_Gatsby" }],"includeDescription": true,"includeEditions": true,"includeTopReviews": true}
Example: Author Bibliography
{"startUrls": [{ "url": "https://www.goodreads.com/author/show/3389.Stephen_King" }],"maxResults": 50,"sortBy": "num_ratings"}
Example: Recent High-Rated Fantasy
{"searchQuery": "fantasy","maxResults": 50,"minRating": 4.2,"publishedAfter": "2023-01-01","sortBy": "num_ratings"}
Output
Each book in the dataset contains the following fields:
{"title": "Dune","author": "Frank Herbert","authorUrl": "https://www.goodreads.com/author/show/58.Frank_Herbert","rating": 4.27,"ratingsCount": 1234567,"reviewsCount": 45678,"pages": 688,"publishDate": "2005-08-02","originalPublishDate": "1965","isbn": "0441013597","isbn13": "9780441013593","genres": ["Science Fiction", "Fiction", "Fantasy", "Classics", "Space Opera"],"description": "Set on the desert planet Arrakis, Dune is the story of the boy Paul Atreides...","coverImage": "https://images-na.ssl-images-amazon.com/images/S/compressed.photo.goodreads.com/books/1555447414i/234225.jpg","series": "Dune","seriesPosition": "1","bookUrl": "https://www.goodreads.com/book/show/234225.Dune","goodreadsId": "234225","publisher": "Ace Books","format": "Paperback","language": "English","asin": "0441013597","awards": ["Nebula Award for Best Novel (1965)", "Hugo Award for Best Novel (1966)"],"scrapedAt": "2026-03-02T12:00:00.000Z","extractionStrategies": ["json-ld", "dom"],"dataQualityScore": 91}
Output Fields Reference
| Field | Type | Description |
|---|---|---|
| title | string | Book title |
| author | string | Author name(s), comma-separated if multiple |
| authorUrl | string | Link to the author's Goodreads page |
| rating | number | Average rating (0-5, two decimal places) |
| ratingsCount | number | Total number of ratings |
| reviewsCount | number | Total number of text reviews |
| pages | number | Page count |
| publishDate | string | Edition publication date (YYYY-MM-DD) |
| originalPublishDate | string | Original publication date |
| isbn | string | ISBN-10 |
| isbn13 | string | ISBN-13 |
| genres | array | Genre/shelf tags |
| description | string | Full book description |
| coverImage | string | High-resolution cover image URL |
| series | string | Series name (if part of a series) |
| seriesPosition | string | Position in the series |
| bookUrl | string | Goodreads book page URL |
| goodreadsId | string | Goodreads book ID |
| publisher | string | Publisher name |
| format | string | Book format (Paperback, Hardcover, Kindle, etc.) |
| language | string | Book language |
| asin | string | Amazon ASIN |
| awards | array | Literary awards (if any) |
| extractionStrategies | array | Extraction strategies that produced data for this record |
| dataQualityScore | number | Data completeness score (0-100) |
| scrapedAt | string | ISO timestamp of when the data was scraped |
Integration — Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Search for books
run = client.actor("sovereigntaylor/goodreads-scraper").call(run_input={
    "searchQuery": "machine learning",
    "maxResults": 50,
    "minRating": 4.0,
    "sortBy": "num_ratings",
})

# Process results
for book in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{book['title']} by {book['author']}")
    print(f"  Rating: {book['rating']}/5 ({book['ratingsCount']} ratings)")
    print(f"  Genres: {', '.join(book.get('genres') or ['N/A'])}")
    print(f"  ISBN: {book.get('isbn13', 'N/A')}")
    print(f"  Pages: {book.get('pages', 'N/A')}")
    print()
```
Export to CSV
```python
import csv
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("sovereigntaylor/goodreads-scraper").call(run_input={
    "searchQuery": "best novels 2025",
    "maxResults": 200,
})

with open("books.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=[
        "title", "author", "rating", "ratingsCount", "pages",
        "publishDate", "isbn13", "genres", "series", "bookUrl",
    ])
    writer.writeheader()
    for book in client.dataset(run["defaultDatasetId"]).iterate_items():
        book["genres"] = ", ".join(book.get("genres") or [])
        writer.writerow({k: book.get(k) for k in writer.fieldnames})
```
Integration — JavaScript
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Search for books
const run = await client.actor('sovereigntaylor/goodreads-scraper').call({
    searchQuery: 'machine learning',
    maxResults: 50,
    minRating: 4.0,
    sortBy: 'num_ratings',
});

// Process results
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(book => {
    console.log(`${book.title} by ${book.author}`);
    console.log(`  Rating: ${book.rating}/5 (${book.ratingsCount} ratings)`);
    console.log(`  Genres: ${(book.genres || []).join(', ')}`);
    console.log(`  ISBN: ${book.isbn13 || 'N/A'}`);
});
```
Webhook Integration
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the run with a webhook notification
const run = await client.actor('sovereigntaylor/goodreads-scraper').start({
    searchQuery: 'fantasy 2025',
    maxResults: 100,
}, {
    webhooks: [{
        eventTypes: ['ACTOR.RUN.SUCCEEDED'],
        requestUrl: 'https://your-server.com/webhook/goodreads',
    }],
});

console.log(`Run started: ${run.id}`);
```
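On the receiving side, the webhook payload carries the run object, from which the dataset ID can be read and used to fetch results. A minimal parsing sketch in Python (Apify's default payload template puts the run under `resource`; adjust the key paths if you use a custom payload template):

```python
import json

def dataset_id_from_webhook(body: bytes) -> str:
    """Extract the default dataset ID from an Apify webhook payload.

    Assumes the default payload template, which nests the run under
    "resource"; custom templates may use different keys.
    """
    payload = json.loads(body)
    return payload["resource"]["defaultDatasetId"]

sample = b'{"eventType": "ACTOR.RUN.SUCCEEDED", "resource": {"id": "abc", "defaultDatasetId": "ds123"}}'
print(dataset_id_from_webhook(sample))  # → ds123
```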
Pricing
Pay-per-event pricing — you only pay for data you receive:
- $0.004 per book scraped (full metadata from book page)
- $0.002 per search result scraped (partial data from search listing)
No subscription. No minimum spend. Free tier available for small runs.
Cost Examples
| Use Case | Books | Estimated Cost |
|---|---|---|
| Quick search (50 books) | 50 | $0.20 |
| Author bibliography | 100 | $0.40 |
| Large list scrape | 500 | $2.00 |
| Genre research | 1,000 | $4.00 |
| Full catalog export | 5,000 | $20.00 |
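The estimates above can be reproduced with simple arithmetic. A small helper using the two per-event prices from this section (search-result counts are an input you supply, not something the table above breaks out):

```python
# Per-event prices taken from the Pricing section above.
PRICE_PER_BOOK = 0.004    # full metadata from a book page
PRICE_PER_SEARCH = 0.002  # partial data from a search listing

def estimate_cost(books: int, search_results: int = 0) -> float:
    """Estimate a run's cost in USD under pay-per-event pricing."""
    return round(books * PRICE_PER_BOOK + search_results * PRICE_PER_SEARCH, 2)

print(estimate_cost(500))        # large list scrape -> 2.0
print(estimate_cost(1000, 200))  # genre research plus 200 search listings -> 4.4
```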
Tips for Best Results
- Use proxy — Goodreads rate-limits aggressively. Enable Apify proxy for runs over 20 books.
- Start small — Test with 10-20 books before running large scrapes.
- Use direct URLs — If you know the exact books, provide `startUrls` for faster and more reliable scraping.
- Filter early — Use `minRating`, `maxPages`, and date filters to reduce unnecessary scraping.
- Low concurrency — Keep `maxConcurrency` at 3-5 to avoid rate limits.
FAQ
Q: Can I scrape user shelves (e.g., "to-read" lists)?
A: Yes. Provide the shelf URL in startUrls, e.g., https://www.goodreads.com/shelf/show/fantasy. Note that private user shelves require authentication and are not supported.
Q: Does it handle series detection?
A: Yes. The scraper extracts series name and book position (e.g., "Dune #1") when available on the book page.
Q: What if a book page is missing data?
A: The scraper uses four extraction strategies (JSON-LD, Apollo state, DOM, meta tags) and merges results. The dataQualityScore field (0-100) indicates how complete the data is.
Q: Can I filter by genre?
A: Not directly in the input (Goodreads search does not support genre filters). Instead, search for genre keywords and use post-processing to filter by the genres array in the output.
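Such a post-processing pass is a few lines of Python. A sketch using the `genres` array from the output schema (the sample book dicts here are illustrative):

```python
# Hypothetical post-processing step: keep only books tagged with a target genre.
# The "genres" field name matches the output schema; the sample data is made up.
def filter_by_genre(books: list[dict], genre: str) -> list[dict]:
    genre = genre.lower()
    return [b for b in books
            if any(g.lower() == genre for g in (b.get("genres") or []))]

books = [
    {"title": "Dune", "genres": ["Science Fiction", "Fiction"]},
    {"title": "The Great Gatsby", "genres": ["Classics", "Fiction"]},
    {"title": "No Tags"},
]
print([b["title"] for b in filter_by_genre(books, "science fiction")])  # → ['Dune']
```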
Q: How often can I run this scraper?
A: As often as needed. Each run is independent. For monitoring, use Apify Schedules to run daily or weekly.
Q: What happens if Goodreads blocks the request?
A: The scraper detects CAPTCHAs and block pages, then retries with a different proxy. Configure proxy settings for best results.
Related Actors
- Amazon Product Scraper — Scrape Amazon product listings, prices, and reviews
- Amazon Reviews Scraper — Extract customer reviews from Amazon products
- Google Search Scraper — Scrape Google search results for any query
- IMDb Scraper — Extract movie and TV show data from IMDb
- Reddit Scraper — Scrape Reddit posts and comments
- Product Hunt Scraper — Extract trending products from Product Hunt