Goodreads Scraper

Under maintenance

Pricing

Pay per usage

Rating

0.0

(0)

Developer

Ricardo Akiyoshi

Maintained by Community

Actor stats

Bookmarked: 0

Total users: 1

Monthly active users: 0

Last modified: 3 days ago

Goodreads Book Scraper — Ratings, Reviews & Metadata

Scrape Goodreads for comprehensive book data. Extract titles, authors, ratings, review counts, genres, ISBNs, page counts, series information, cover images, publication dates, and descriptions. Supports search by keyword, direct book URLs, Goodreads lists, shelves, and author pages.

Features

  • Comprehensive book data — titles, authors, ratings, review counts, genres, ISBNs, page counts, descriptions
  • Series tracking — series name and book position within the series
  • Cover images — high-resolution cover image URLs
  • Publication info — publish dates (original and edition), publisher, format, language
  • Search mode — search by title, author, or keyword with sorting options
  • Direct URL mode — scrape specific book pages, lists, shelves, or author bibliographies
  • Four extraction strategies — JSON-LD, Apollo/GraphQL state, DOM parsing, meta tag fallback
  • Smart filtering — filter by minimum rating, page count, language, or publication date
  • Deduplication — prevents duplicate books based on Goodreads ID, ISBN, or title+author
  • Top reviews — optionally extract community reviews with ratings and text
  • Edition details — optionally extract format, publisher, and identifier data
  • Proxy support — works with Apify proxies for reliable large-scale scraping
  • Pay-per-event — charged only per book successfully scraped
  • Data quality scoring — each result includes a quality score (0-100) based on field completeness
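The deduplication rule listed above (Goodreads ID, ISBN, or title+author) can also be applied client-side, for example when combining datasets from several runs. The following is a minimal sketch under that assumption, not the actor's actual implementation; `dedup_key` and `deduplicate` are hypothetical helper names, and the precedence order (ID first, then ISBN, then title+author) is a guess.

```python
def dedup_key(book: dict) -> tuple:
    """Return the most specific identity key available for a book record."""
    if book.get("goodreadsId"):
        return ("id", str(book["goodreadsId"]))
    isbn = book.get("isbn13") or book.get("isbn")
    if isbn:
        # Strip hyphens so "978-0441013593" and "9780441013593" match.
        return ("isbn", isbn.replace("-", "").strip())
    # Fall back to a normalized title+author pair.
    title = (book.get("title") or "").strip().lower()
    author = (book.get("author") or "").strip().lower()
    return ("title_author", title, author)


def deduplicate(books: list[dict]) -> list[dict]:
    """Keep the first record seen for each identity key, preserving order."""
    seen = set()
    unique = []
    for book in books:
        key = dedup_key(book)
        if key not in seen:
            seen.add(key)
            unique.append(book)
    return unique
```

Note that this sketch compares only one key per record, so a record carrying only an ISBN will not be matched against a record of the same book that carries a Goodreads ID.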

Use Cases

Book Recommendation Engines

Build recommendation systems by scraping genres, ratings, and descriptions across thousands of books. Use rating distributions and review text for collaborative and content-based filtering.
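As a rough illustration of content-based filtering on the scraped `genres` array, the sketch below ranks books by Jaccard similarity of their genre tags. The function names are hypothetical, and Jaccard similarity is only one of many possible measures; a real recommender would likely also use ratings and description text.

```python
def genre_similarity(a: dict, b: dict) -> float:
    """Jaccard similarity of two books' genre sets (0.0 when either is empty)."""
    genres_a = set(a.get("genres") or [])
    genres_b = set(b.get("genres") or [])
    if not genres_a or not genres_b:
        return 0.0
    return len(genres_a & genres_b) / len(genres_a | genres_b)


def most_similar(target: dict, candidates: list[dict], top_n: int = 3) -> list[dict]:
    """Return up to top_n candidates sharing at least one genre with target."""
    scored = [
        (genre_similarity(target, book), book)
        for book in candidates
        if book is not target  # never recommend the book itself
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [book for score, book in scored[:top_n] if score > 0]
```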

Publishing Market Research

Analyze trends in book publishing — which genres are growing, what rating distributions look like across categories, and how page counts correlate with popularity. Track new releases by date.
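A simple starting point for this kind of analysis is an aggregate of book count and average rating per genre. The sketch below assumes a list of output records as described in the Output section; `genre_stats` is a hypothetical helper name.

```python
from collections import defaultdict


def genre_stats(books: list[dict]) -> dict:
    """Return {genre: {"count": n, "avgRating": mean}} over rated books."""
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [rating sum, book count]
    for book in books:
        rating = book.get("rating")
        if rating is None:
            continue  # skip records without a rating
        for genre in book.get("genres") or []:
            totals[genre][0] += rating
            totals[genre][1] += 1
    return {
        genre: {"count": count, "avgRating": round(total / count, 2)}
        for genre, (total, count) in totals.items()
    }
```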

Author Bibliography Analysis

Scrape complete author bibliographies to analyze output frequency, rating trends over time, genre diversity, and series completion status.

Academic Literature Surveys

Build reading lists for research topics. Scrape books by keyword, filter by publication date, and export structured metadata for citation management tools.
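One way to hand the metadata to a citation manager is to convert each record to a BibTeX `@book` entry. The sketch below is an assumption about how that could be done, not a feature of the actor: `to_bibtex` is a hypothetical helper, the citation key format (surname plus year) is arbitrary, and special characters in values are not escaped.

```python
import re


def to_bibtex(book: dict) -> str:
    """Render one scraped book record as a BibTeX @book entry."""
    # Use the edition date so the year matches the listed publisher.
    year = (book.get("publishDate") or book.get("originalPublishDate") or "")[:4]
    authors = [n.strip() for n in (book.get("author") or "").split(",") if n.strip()]
    parts = authors[0].split() if authors else []
    surname = parts[-1] if parts else "unknown"
    key = re.sub(r"[^A-Za-z0-9]", "", surname).lower() + year
    fields = {
        "title": book.get("title"),
        "author": " and ".join(authors),  # BibTeX separates authors with "and"
        "year": year,
        "publisher": book.get("publisher"),
        "isbn": book.get("isbn13") or book.get("isbn"),
        "url": book.get("bookUrl"),
    }
    lines = [f"  {name} = {{{value}}}," for name, value in fields.items() if value]
    return "@book{" + key + ",\n" + "\n".join(lines) + "\n}"
```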

Library Cataloging

Extract ISBNs, titles, authors, page counts, and cover images to build or enrich digital library catalogs. Supports ISBN-10 and ISBN-13 formats.
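When a catalog stores only one of the two formats, an ISBN-10 can be converted to ISBN-13 by prefixing `978` and recomputing the check digit. The sketch below shows the standard conversion and an ISBN-13 checksum test; the function names are hypothetical and are not part of the actor.

```python
def _isbn13_checksum_total(digits: str) -> int:
    """Weighted digit sum used by ISBN-13: weights alternate 1, 3, 1, 3, ..."""
    return sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(digits))


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13 (the ISBN-10 check digit is discarded)."""
    digits = isbn10.replace("-", "").replace(" ", "")
    core = "978" + digits[:9]
    if len(digits) != 10 or not core.isdigit():
        raise ValueError(f"Not a valid ISBN-10: {isbn10!r}")
    check = (10 - _isbn13_checksum_total(core) % 10) % 10
    return core + str(check)


def is_valid_isbn13(isbn13: str) -> bool:
    """Return True when the 13-digit checksum is a multiple of 10."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return _isbn13_checksum_total(digits) % 10 == 0
```

For the Dune record in the Output section, `isbn10_to_isbn13("0441013597")` yields `9780441013593`, which matches the `isbn13` field shown there.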

Bookstore Inventory Enrichment

Enrich product listings with Goodreads ratings, review counts, genres, and descriptions. Match by ISBN for accurate data linkage.

Reading Challenge Tracking

Scrape Goodreads lists and shelves to track popular books, award winners, and trending titles for reading challenges or book clubs.

Content Marketing for Book Blogs

Generate structured data for book review blogs — cover images, descriptions, genre tags, and series info ready for CMS import.

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| searchQuery | string | "Dune" | Search by title, author, or keyword |
| startUrls | array | [] | Direct Goodreads URLs (books, lists, shelves, authors) |
| maxResults | integer | 50 | Maximum books to scrape (0 = unlimited) |
| includeDescription | boolean | true | Extract full book description |
| includeEditions | boolean | false | Extract detailed edition info |
| includeTopReviews | boolean | false | Extract top community reviews (up to 5) |
| sortBy | enum | "relevance" | One of: relevance, title, date_published, num_ratings |
| languageFilter | string | | Filter by language code (e.g., en, es, fr) |
| minRating | number | 0 | Minimum average rating (0-5) |
| maxPages | integer | 0 | Maximum page count (0 = no limit) |
| publishedAfter | string | | Only books published after this date (YYYY-MM-DD) |
| publishedBefore | string | | Only books published before this date (YYYY-MM-DD) |
| proxyConfiguration | object | | Apify proxy settings |
| maxConcurrency | integer | 5 | Parallel page requests (1-50) |
| requestTimeout | integer | 60 | Page load timeout in seconds |
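Checking an input object locally before starting a run can catch typos early. The sketch below validates only the constraints stated in the table above (the `sortBy` options, the 0-5 rating range, the 1-50 concurrency range, and the date format); `validate_input` is a hypothetical helper, and the actor itself may enforce different or additional rules.

```python
from datetime import date

SORT_OPTIONS = {"relevance", "title", "date_published", "num_ratings"}


def validate_input(run_input: dict) -> dict:
    """Raise ValueError if run_input violates the documented constraints."""
    if run_input.get("sortBy", "relevance") not in SORT_OPTIONS:
        raise ValueError(f"sortBy must be one of {sorted(SORT_OPTIONS)}")
    if not 0 <= run_input.get("minRating", 0) <= 5:
        raise ValueError("minRating must be between 0 and 5")
    if not 1 <= run_input.get("maxConcurrency", 5) <= 50:
        raise ValueError("maxConcurrency must be between 1 and 50")
    if run_input.get("maxResults", 50) < 0:
        raise ValueError("maxResults must be 0 (unlimited) or positive")
    for key in ("publishedAfter", "publishedBefore"):
        if key in run_input:
            # date.fromisoformat raises ValueError for malformed dates.
            date.fromisoformat(run_input[key])
    return run_input
```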

Example: Search for Science Fiction

{
  "searchQuery": "best science fiction",
  "maxResults": 100,
  "sortBy": "num_ratings",
  "minRating": 4.0
}

Example: Scrape a Goodreads List

{
  "startUrls": [
    { "url": "https://www.goodreads.com/list/show/1.Best_Books_Ever" }
  ],
  "maxResults": 200,
  "includeTopReviews": true
}

Example: Scrape Specific Books

{
  "startUrls": [
    { "url": "https://www.goodreads.com/book/show/234225.Dune" },
    { "url": "https://www.goodreads.com/book/show/5107.The_Catcher_in_the_Rye" },
    { "url": "https://www.goodreads.com/book/show/4671.The_Great_Gatsby" }
  ],
  "includeDescription": true,
  "includeEditions": true,
  "includeTopReviews": true
}

Example: Author Bibliography

{
  "startUrls": [
    { "url": "https://www.goodreads.com/author/show/3389.Stephen_King" }
  ],
  "maxResults": 50,
  "sortBy": "num_ratings"
}

Example: Recent High-Rated Fantasy

{
  "searchQuery": "fantasy",
  "maxResults": 50,
  "minRating": 4.2,
  "publishedAfter": "2023-01-01",
  "sortBy": "num_ratings"
}

Output

Each book in the dataset contains the following fields:

{
  "title": "Dune",
  "author": "Frank Herbert",
  "authorUrl": "https://www.goodreads.com/author/show/58.Frank_Herbert",
  "rating": 4.27,
  "ratingsCount": 1234567,
  "reviewsCount": 45678,
  "pages": 688,
  "publishDate": "2005-08-02",
  "originalPublishDate": "1965",
  "isbn": "0441013597",
  "isbn13": "9780441013593",
  "genres": ["Science Fiction", "Fiction", "Fantasy", "Classics", "Space Opera"],
  "description": "Set on the desert planet Arrakis, Dune is the story of the boy Paul Atreides...",
  "coverImage": "https://images-na.ssl-images-amazon.com/images/S/compressed.photo.goodreads.com/books/1555447414i/234225.jpg",
  "series": "Dune",
  "seriesPosition": "1",
  "bookUrl": "https://www.goodreads.com/book/show/234225.Dune",
  "goodreadsId": "234225",
  "publisher": "Ace Books",
  "format": "Paperback",
  "language": "English",
  "asin": "0441013597",
  "awards": ["Nebula Award for Best Novel (1965)", "Hugo Award for Best Novel (1966)"],
  "scrapedAt": "2026-03-02T12:00:00.000Z",
  "extractionStrategies": ["json-ld", "dom"],
  "dataQualityScore": 91
}

Output Fields Reference

| Field | Type | Description |
|---|---|---|
| title | string | Book title |
| author | string | Author name(s), comma-separated if multiple |
| authorUrl | string | Link to the author's Goodreads page |
| rating | number | Average rating (0-5, two decimal places) |
| ratingsCount | number | Total number of ratings |
| reviewsCount | number | Total number of text reviews |
| pages | number | Page count |
| publishDate | string | Edition publication date (YYYY-MM-DD) |
| originalPublishDate | string | Original publication date |
| isbn | string | ISBN-10 |
| isbn13 | string | ISBN-13 |
| genres | array | Genre/shelf tags |
| description | string | Full book description |
| coverImage | string | High-resolution cover image URL |
| series | string | Series name (if part of a series) |
| seriesPosition | string | Position in the series |
| bookUrl | string | Goodreads book page URL |
| goodreadsId | string | Goodreads book ID |
| publisher | string | Publisher name |
| format | string | Book format (Paperback, Hardcover, Kindle, etc.) |
| language | string | Book language |
| asin | string | Amazon ASIN |
| awards | array | Literary awards (if any) |
| dataQualityScore | number | Data completeness score (0-100) |
| scrapedAt | string | ISO timestamp of when the data was scraped |
| extractionStrategies | array | Extraction strategies that contributed to the record (e.g., json-ld, dom) |
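The exact formula behind `dataQualityScore` is not documented here. To get a comparable completeness measure over only the fields you care about, a simple filled-field ratio can be computed client-side. The sketch below is an assumption-laden approximation: the field list and equal weighting are hypothetical choices, and its result will generally differ from the actor's own score.

```python
# Hypothetical selection of fields to score; adjust to your use case.
SCORED_FIELDS = [
    "title", "author", "rating", "ratingsCount", "pages",
    "publishDate", "isbn13", "genres", "description", "coverImage",
]


def completeness_score(book: dict, fields: list[str] = SCORED_FIELDS) -> int:
    """Return the percentage (0-100) of the given fields that are non-empty."""
    # Empty strings and empty collections count as missing; a numeric 0 does not.
    filled = sum(1 for name in fields if book.get(name) not in (None, "", [], {}))
    return round(100 * filled / len(fields))
```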

Integration — Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Search for books
run = client.actor("sovereigntaylor/goodreads-scraper").call(run_input={
    "searchQuery": "machine learning",
    "maxResults": 50,
    "minRating": 4.0,
    "sortBy": "num_ratings"
})

# Process results
for book in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{book['title']} by {book['author']}")
    print(f" Rating: {book['rating']}/5 ({book['ratingsCount']} ratings)")
    print(f" Genres: {', '.join(book.get('genres') or ['N/A'])}")
    print(f" ISBN: {book.get('isbn13', 'N/A')}")
    print(f" Pages: {book.get('pages', 'N/A')}")
    print()

Export to CSV

import csv
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("sovereigntaylor/goodreads-scraper").call(run_input={
    "searchQuery": "best novels 2025",
    "maxResults": 200
})

with open("books.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=[
        "title", "author", "rating", "ratingsCount", "pages",
        "publishDate", "isbn13", "genres", "series", "bookUrl"
    ])
    writer.writeheader()
    for book in client.dataset(run["defaultDatasetId"]).iterate_items():
        # Flatten the genres array into a single comma-separated cell
        book["genres"] = ", ".join(book.get("genres") or [])
        writer.writerow({k: book.get(k) for k in writer.fieldnames})

Integration — JavaScript

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Search for books
const run = await client.actor('sovereigntaylor/goodreads-scraper').call({
  searchQuery: 'machine learning',
  maxResults: 50,
  minRating: 4.0,
  sortBy: 'num_ratings',
});

// Process results
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(book => {
  console.log(`${book.title} by ${book.author}`);
  console.log(` Rating: ${book.rating}/5 (${book.ratingsCount} ratings)`);
  console.log(` Genres: ${(book.genres || []).join(', ')}`);
  console.log(` ISBN: ${book.isbn13 || 'N/A'}`);
});

Webhook Integration

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start with webhook notification
const run = await client.actor('sovereigntaylor/goodreads-scraper').start({
  searchQuery: 'fantasy 2025',
  maxResults: 100,
}, {
  webhooks: [{
    eventTypes: ['ACTOR.RUN.SUCCEEDED'],
    requestUrl: 'https://your-server.com/webhook/goodreads',
  }],
});

console.log(`Run started: ${run.id}`);
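On the receiving side, the server behind `requestUrl` needs to pull the dataset ID out of the webhook body before fetching results. The Python sketch below assumes the webhook uses Apify's default payload template, in which `eventType` names the event and `resource` holds the run object (including `defaultDatasetId`). If you configure a custom payload template, the keys will differ. `dataset_id_from_webhook` is a hypothetical helper name.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the run's default dataset ID for a succeeded run, else None."""
    payload = json.loads(body)
    # Ignore any event other than a successful run.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    # Assumption: "resource" is the run object from the default payload template.
    return (payload.get("resource") or {}).get("defaultDatasetId")
```

The returned ID can then be passed to `client.dataset(...)` as in the Python integration example to download the scraped books.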

Pricing

Pay-per-event pricing — you only pay for data you receive:

  • $0.004 per book scraped (full metadata from book page)
  • $0.002 per search result scraped (partial data from search listing)

No subscription. No minimum spend. Free tier available for small runs.

Cost Examples

| Use Case | Books | Estimated Cost |
|---|---|---|
| Quick search (50 books) | 50 | $0.20 |
| Author bibliography | 100 | $0.40 |
| Large list scrape | 500 | $2.00 |
| Genre research | 1,000 | $4.00 |
| Full catalog export | 5,000 | $20.00 |
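The figures in the table follow from the per-book price ($0.004 × number of books). A small helper makes it easy to estimate other run sizes. This is a sketch using only the two prices listed above; `estimate_cost` is a hypothetical name, and how the two event types combine on a given run is an assumption.

```python
# Prices from the Pricing section, in tenths of a cent to keep arithmetic exact.
MILLS_PER_BOOK = 4            # $0.004 per book scraped
MILLS_PER_SEARCH_RESULT = 2   # $0.002 per search result scraped


def estimate_cost(books: int, search_results: int = 0) -> float:
    """Estimated cost in USD for the given number of charged events."""
    mills = books * MILLS_PER_BOOK + search_results * MILLS_PER_SEARCH_RESULT
    return mills / 1000


print(estimate_cost(50))     # 0.2, matching the "Quick search" row
print(estimate_cost(5000))   # 20.0, matching the "Full catalog export" row
```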

Tips for Best Results

  1. Use proxy — Goodreads rate-limits aggressively. Enable Apify proxy for runs over 20 books.
  2. Start small — Test with 10-20 books before running large scrapes.
  3. Use direct URLs — If you know the exact books, provide startUrls for faster and more reliable scraping.
  4. Filter early — Use minRating, maxPages, and date filters to reduce unnecessary scraping.
  5. Low concurrency — Keep maxConcurrency at 3-5 to avoid rate limits.

FAQ

Q: Can I scrape user shelves (e.g., "to-read" lists)? A: Yes, as long as the shelf is public. Provide the shelf URL in startUrls; genre shelves such as https://www.goodreads.com/shelf/show/fantasy work the same way. Private user shelves require authentication and are not supported.

Q: Does it handle series detection? A: Yes. The scraper extracts series name and book position (e.g., "Dune #1") when available on the book page.

Q: What if a book page is missing data? A: The scraper uses four extraction strategies (JSON-LD, Apollo state, DOM, meta tags) and merges results. The dataQualityScore field (0-100) indicates how complete the data is.
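To make the idea of merging results from several strategies concrete, the sketch below fills each field from the first strategy that provides a non-empty value and records which strategies contributed. It illustrates the general technique only: the priority order, the `apollo` and `meta` labels, and the function name are assumptions, and the actor's real merge logic is not documented here.

```python
# Assumed priority order; only "json-ld" and "dom" appear in the sample output.
STRATEGY_PRIORITY = ["json-ld", "apollo", "dom", "meta"]


def merge_extractions(results: dict[str, dict]) -> dict:
    """Merge per-strategy field dicts, preferring higher-priority strategies."""
    merged = {}
    used = []
    for name in STRATEGY_PRIORITY:
        contributed = False
        for key, value in (results.get(name) or {}).items():
            if value in (None, "", []):
                continue  # treat empty values as missing
            if key not in merged:
                merged[key] = value
                contributed = True
        if contributed:
            used.append(name)
    merged["extractionStrategies"] = used
    return merged
```

With this approach a strategy is listed in `extractionStrategies` only if it supplied at least one field that no higher-priority strategy had already filled.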

Q: Can I filter by genre? A: Not directly in the input (Goodreads search does not support genre filters). Instead, search for genre keywords and use post-processing to filter by the genres array in the output.
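The post-processing step can be as small as the following sketch, which filters downloaded records by their `genres` array. `filter_by_genre` is a hypothetical helper; matching is case-insensitive and exact per tag, which is an assumption about how you would want to compare genre names.

```python
def filter_by_genre(books: list[dict], wanted: list[str], match_all: bool = False) -> list[dict]:
    """Keep books whose genres include any (or all) of the wanted genres."""
    wanted_set = {genre.lower() for genre in wanted}
    kept = []
    for book in books:
        genres = {genre.lower() for genre in (book.get("genres") or [])}
        if match_all:
            hit = wanted_set <= genres        # every wanted genre present
        else:
            hit = bool(wanted_set & genres)   # at least one wanted genre present
        if hit:
            kept.append(book)
    return kept
```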

Q: How often can I run this scraper? A: As often as needed. Each run is independent. For monitoring, use Apify Schedules to run daily or weekly.

Q: What happens if Goodreads blocks the request? A: The scraper detects CAPTCHAs and block pages, then retries with a different proxy. Configure proxy settings for best results.