Goodreads Books Scraper

Pricing: Pay per usage

Efficiently extract detailed book data with the Goodreads Books Scraper. Ideal for building reading lists or analyzing metadata. Note: For bulk scraping of more than 50 books, providing JSON cookies is essential to ensure seamless access and reliable results.

Rating: 5.0 (1)
Developer: Shahid Irfan (maintained by Community)
Actor stats: 0 bookmarked, 7 total users, 3 monthly active users
Last modified: 11 days ago

Goodreads Book Scraper

Extract rich Goodreads book data at scale from public shelves and related results. Collect complete records including title, author, ratings, review volume, description, ISBN, publisher, publication date, genres, image, and URL. Built for research, analysis, monitoring, and dataset creation.

Features

  • Complete book records — Always returns detailed fields for every collected book
  • Automatic pagination — Continues collecting across multiple pages to reach your target
  • Fast collection flow — Parallel detail fetching for better throughput
  • Stable public extraction — Designed to work on public Goodreads data
  • Structured output — Clean dataset items ready for BI, ETL, and automation

Use Cases

Market Research

Track popular books, ratings, and genre movement over time. Build snapshots for trend analysis and category planning.

Recommendation Datasets

Create feature-rich book datasets for recommendation systems, ranking models, or catalog enrichment workflows.

Content Strategy

Discover high-interest titles and genres to inform newsletters, blog content, and reading list curation.

Competitive Intelligence

Monitor visibility and engagement signals by shelf category to compare market segments.

Academic Analysis

Use structured book metadata for reading behavior studies, literature analysis, and longitudinal research.

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | No | [{"url":"https://www.goodreads.com/shelf/show/fantasy"}] | Optional shelf URL list. First valid shelf path is used as seed. |
| shelf | String | No | "fantasy" | Shelf name to scrape when URL list is not provided. |
| results_wanted | Integer | No | 50 | Maximum number of books to save. |
| max_pages | Integer | No | 10 | Maximum discovery pages to scan. |
| proxyConfiguration | Object | No | { "useApifyProxy": true } | Optional proxy settings for better reliability. |
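
The parameter rules above can be sketched in Python for anyone assembling the input programmatically. This is a minimal illustration, not the actor's own code: the helper names (`shelf_from_url`, `resolve_seed_shelf`, `build_run_input`) are hypothetical, the `/shelf/show/<name>` path check is inferred from the example URLs, and the positive-integer check is an assumption about what the actor accepts.

```python
from urllib.parse import urlparse

# Defaults copied from the parameter table (startUrls left out on purpose).
DEFAULT_INPUT = {
    "shelf": "fantasy",
    "results_wanted": 50,
    "max_pages": 10,
    "proxyConfiguration": {"useApifyProxy": True},
}


def shelf_from_url(url):
    """Return the shelf name from a /shelf/show/<name> URL, or None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 3 and parts[0] == "shelf" and parts[1] == "show":
        return parts[2]
    return None


def resolve_seed_shelf(run_input):
    """Pick the seed shelf: first valid startUrls entry, else the shelf field."""
    for entry in run_input.get("startUrls", []):
        name = shelf_from_url(entry.get("url", ""))
        if name:
            return name
    return run_input.get("shelf", DEFAULT_INPUT["shelf"])


def build_run_input(**overrides):
    """Merge overrides onto the documented defaults and check the limits."""
    run_input = {**DEFAULT_INPUT, **overrides}
    for key in ("results_wanted", "max_pages"):
        value = run_input[key]
        # Assumption: both limits must be positive integers.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return run_input
```

Checking the input locally like this can catch a mistyped limit or a non-shelf URL before a paid run starts.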

Output Data

Each dataset item contains:

| Field | Type | Description |
|---|---|---|
| title | String | Book title |
| author | String | Primary author |
| rating | Number | Average rating |
| ratingCount | Number | Total ratings count |
| reviewCount | Number | Total text reviews count |
| description | String | Book description |
| image | String | Cover image URL |
| isbn | String | ISBN (or ISBN-13 when available) |
| publisher | String | Publisher name |
| publishDate | String | Publication date in YYYY-MM-DD format |
| genres | Array | Genre list |
| url | String | Goodreads book URL |
| _source | String | Source marker |
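
For downstream code, the fields above could be loaded into a typed record. The sketch below assumes only the field names and types listed in the table. The `Book` class and `parse_book` helper are hypothetical names, and every field is treated as optional because some books may lack public metadata.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Book:
    title: Optional[str] = None
    author: Optional[str] = None
    rating: Optional[float] = None
    rating_count: Optional[int] = None
    review_count: Optional[int] = None
    description: Optional[str] = None
    image: Optional[str] = None
    isbn: Optional[str] = None
    publisher: Optional[str] = None
    publish_date: Optional[date] = None
    genres: List[str] = field(default_factory=list)
    url: Optional[str] = None
    source: Optional[str] = None


def parse_book(item):
    """Convert one dataset item (a dict) into a Book, tolerating gaps."""
    raw_date = item.get("publishDate")
    try:
        # publishDate is documented as YYYY-MM-DD.
        publish_date = date.fromisoformat(raw_date) if raw_date else None
    except (TypeError, ValueError):
        publish_date = None  # keep the record even if the date is malformed
    return Book(
        title=item.get("title"),
        author=item.get("author"),
        rating=item.get("rating"),
        rating_count=item.get("ratingCount"),
        review_count=item.get("reviewCount"),
        description=item.get("description"),
        image=item.get("image"),
        isbn=item.get("isbn"),
        publisher=item.get("publisher"),
        publish_date=publish_date,
        genres=list(item.get("genres") or []),
        url=item.get("url"),
        source=item.get("_source"),
    )
```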

Usage Examples

Basic Shelf Run

{
  "shelf": "fantasy",
  "results_wanted": 100,
  "max_pages": 10
}

URL-Driven Run

{
  "startUrls": [
    { "url": "https://www.goodreads.com/shelf/show/science-fiction" }
  ],
  "results_wanted": 150,
  "max_pages": 20
}

High-Volume Run with Proxy

{
  "shelf": "mystery",
  "results_wanted": 500,
  "max_pages": 40,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Sample Output

{
  "title": "The Name of the Wind",
  "author": "Patrick Rothfuss",
  "rating": 4.52,
  "ratingCount": 985432,
  "reviewCount": 45678,
  "description": "Told in Kvothe's own voice, this is the tale of the magically gifted young man...",
  "image": "https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/....jpg",
  "isbn": "9780756404741",
  "publisher": "DAW Books",
  "publishDate": "2007-03-27",
  "genres": ["Fantasy", "Fiction", "Adventure"],
  "url": "https://www.goodreads.com/book/show/186074.The_Name_of_the_Wind",
  "_source": "goodreads-api"
}

Tips for Best Results

Start Small First

  • Run with results_wanted between 50 and 100 to validate quickly
  • Increase limits after confirming your target shelf behavior

Tune Page Limits

  • Raise max_pages when targeting larger collections
  • Keep max_pages proportional to results_wanted
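
One way to keep the two limits proportional is to derive `max_pages` from `results_wanted`. The sketch below is only an illustration: `suggest_max_pages` is a hypothetical helper, and its default budget of 10 saved books per discovery page is an assumption taken from the ratio in the Basic Shelf Run example (100 results, 10 pages), not a documented property of the actor.

```python
import math


def suggest_max_pages(results_wanted, books_per_page_budget=10):
    """Scale max_pages with results_wanted using a fixed per-page budget.

    books_per_page_budget is an assumed planning ratio, not a value
    published by the actor; adjust it after observing a small test run.
    """
    if results_wanted < 1:
        raise ValueError("results_wanted must be at least 1")
    if books_per_page_budget < 1:
        raise ValueError("books_per_page_budget must be at least 1")
    # Round up so a partial page still gets scanned.
    return math.ceil(results_wanted / books_per_page_budget)


run_input = {"shelf": "fantasy", "results_wanted": 100}
run_input["max_pages"] = suggest_max_pages(run_input["results_wanted"])
```

If a small run shows that each page contributes more or fewer new books than expected, changing the budget rescales every later run consistently.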

Use Proxies for Long Runs

  • Use residential proxy settings for improved stability
  • Keep retries low and focus on steady throughput

Validate Output Early

  • Check the first 20 items for field completeness
  • Confirm titles, authors, ratings, and genres match your expectations
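
The completeness check above could be automated along these lines. This is a sketch that assumes the dataset items have already been downloaded as a list of dicts. `completeness_report` and `EXPECTED_FIELDS` are hypothetical names, and a field counts as filled only if it is neither missing nor empty.

```python
from itertools import islice

# Field names taken from the Output Data table.
EXPECTED_FIELDS = (
    "title", "author", "rating", "ratingCount", "reviewCount",
    "description", "image", "isbn", "publisher", "publishDate",
    "genres", "url",
)


def completeness_report(items, sample_size=20):
    """Return the share of sampled items with a non-empty value per field."""
    sample = list(islice(items, sample_size))
    report = {}
    for name in EXPECTED_FIELDS:
        filled = 0
        for item in sample:
            value = item.get(name)
            # None, "" and [] are treated as missing; 0 is a real value.
            if value is not None and value != "" and value != []:
                filled += 1
        report[name] = filled / len(sample) if sample else 0.0
    return report
```

Fields that score well below 1.0 on the first 20 items are worth inspecting before scaling up `results_wanted`.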

Integrations

  • Google Sheets — Export datasets for quick analysis
  • Airtable — Build searchable book intelligence tables
  • Looker Studio / Power BI — Visualize rating and genre trends
  • Zapier / Make — Trigger downstream automations
  • Webhooks — Feed your own APIs and pipelines

Export Formats

  • JSON — Best for APIs and programmatic processing
  • CSV — Spreadsheet-friendly analysis
  • Excel — Business reporting workflows
  • XML — Legacy pipeline compatibility
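
If you download JSON and need a flat file locally, the nested `genres` array has to be flattened first. The sketch below uses only the Python standard library. `items_to_csv` is a hypothetical helper, and joining genres with "; " is an arbitrary choice made for this example, not the format of the platform's own CSV export.

```python
import csv
import io

# Column order follows the Output Data table; _source is left out.
COLUMNS = [
    "title", "author", "rating", "ratingCount", "reviewCount",
    "description", "image", "isbn", "publisher", "publishDate",
    "genres", "url",
]


def items_to_csv(items):
    """Serialize dataset items to CSV text, one row per book."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in COLUMNS.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = dict(item)
        # Flatten the genre list into a single cell.
        row["genres"] = "; ".join(item.get("genres") or [])
        writer.writerow(row)  # missing fields become empty cells
    return buffer.getvalue()
```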

Frequently Asked Questions

Why do shelf pages seem to return only 50 books?

Public Goodreads shelf pages can keep returning the same first-page set of books. To reach your requested volume, the actor continues discovery through additional public result pages.

Do I need cookies?

No. This actor is configured for public data collection without authentication cookies.

Is detailed output optional?

No. This actor is configured to always return detailed book records.

How many books can I collect?

Use results_wanted and max_pages to control volume. Increase both for larger runs.

What if some fields are missing?

Some books may have incomplete public metadata. The actor still saves available fields and continues.