Goodreads Scraper
Pricing
from $5.00 / 1,000 results
Scrape Goodreads book data by shelf or genre name. Returns ratings, review counts, genres, page counts, and publication info.
An Apify Actor that scrapes book data from Goodreads shelf/genre pages. Built with Crawlee CheerioCrawler for fast, efficient HTML parsing.
What it does
This scraper extracts book data from Goodreads shelf pages (e.g., goodreads.com/shelf/show/science-fiction). Each shelf page lists 50 popular books in that genre, ranked by how many users shelved them.
Two modes of operation
- Shelf mode (default): Scrapes book listings from shelf pages. Fast -- extracts title, author, rating, rating count, published year, shelf count, and cover image directly from the listing.
- Detail mode (scrapeDetails: true): Also visits each book's individual page to extract rich structured data from Goodreads' JSON-LD schema, including full description, genres, page count, ISBN, book format, language, awards, and review count.
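To illustrate what detail mode's JSON-LD extraction involves, here is a minimal Python sketch using only the standard library. It is not the Actor's code (the Actor uses Crawlee's CheerioCrawler). The sample HTML is invented, and the mapping from schema.org property names (isbn, bookFormat, numberOfPages, inLanguage, aggregateRating.reviewCount) to the output fields is an assumption.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_book(html):
    """Return the first JSON-LD object whose @type is Book, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the page
        if isinstance(data, dict) and data.get("@type") == "Book":
            return data
    return None


def to_detail_fields(book):
    """Map schema.org properties to detail-mode field names (assumed mapping)."""
    rating = book.get("aggregateRating", {})
    return {
        "isbn": book.get("isbn"),
        "bookFormat": book.get("bookFormat"),
        "numberOfPages": book.get("numberOfPages"),
        "language": book.get("inLanguage"),
        "reviewCount": rating.get("reviewCount"),
    }


# Invented sample page, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Book", "name": "Example Book", "isbn": "0000000000000",
 "bookFormat": "Paperback", "numberOfPages": 123, "inLanguage": "English",
 "aggregateRating": {"@type": "AggregateRating", "ratingValue": 4.0, "reviewCount": 7}}
</script>
</head><body></body></html>
"""

book = extract_book(SAMPLE_HTML)
print(to_detail_fields(book))
```

Reading the embedded JSON-LD is usually more stable than selecting visible page elements by CSS class, which is presumably why detail mode relies on it.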
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| searchQueries | string[] | ["science-fiction"] | Shelf/genre names to scrape. Maps to goodreads.com/shelf/show/{name}. |
| maxBooks | integer | 100 | Maximum books per shelf. Set to 0 for unlimited. |
| scrapeDetails | boolean | false | Visit each book's detail page for full data (slower). |
| proxyConfiguration | object | { useApifyProxy: false } | Proxy settings for large-scale runs. |
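The following Python sketch shows one way these parameters could translate into shelf page URLs. It assumes 50 books per page and ?page=N pagination, as described elsewhere in this README. The function names, the defaults dictionary, and the handling of maxBooks = 0 are illustrative, not the Actor's actual implementation.

```python
import math

BOOKS_PER_PAGE = 50  # per this README: each shelf page lists 50 books
BASE_URL = "https://www.goodreads.com/shelf/show/"

# Defaults copied from the input table above.
DEFAULTS = {
    "searchQueries": ["science-fiction"],
    "maxBooks": 100,
    "scrapeDetails": False,
    "proxyConfiguration": {"useApifyProxy": False},
}


def shelf_page_urls(shelf, max_books):
    """List the shelf page URLs needed to cover max_books for one shelf.

    max_books == 0 means "unlimited"; this sketch returns only the first
    page in that case and leaves further paging to the caller.
    """
    pages = 1 if max_books == 0 else math.ceil(max_books / BOOKS_PER_PAGE)
    return [f"{BASE_URL}{shelf}?page={n}" for n in range(1, pages + 1)]


def start_urls(actor_input):
    """Merge user input over the defaults and expand every shelf into URLs."""
    merged = {**DEFAULTS, **actor_input}
    urls = []
    for shelf in merged["searchQueries"]:
        urls.extend(shelf_page_urls(shelf, merged["maxBooks"]))
    return urls


for url in start_urls({"searchQueries": ["fantasy"], "maxBooks": 120}):
    print(url)
```

With maxBooks set to 120, the sketch requests three pages, and the caller would truncate the last page's results to stay within the limit.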
Popular shelf names
science-fiction, fantasy, mystery, romance, non-fiction, thriller, horror, historical-fiction, young-adult, classics, biography, self-help, poetry, graphic-novels, philosophy, true-crime, dystopian, adventure, humor, manga
Output
Shelf mode fields
| Field | Type | Description |
|---|---|---|
| title | string | Book title (may include series info) |
| author | string | Author name |
| authorUrl | string | Link to author's Goodreads page |
| rating | number | Average rating (1-5 scale) |
| ratingCount | number | Total number of ratings |
| publishedYear | number | Year of first publication |
| shelfCount | number | Times shelved under this genre |
| coverImage | string | URL of book cover image |
| bookId | string | Goodreads book ID |
| url | string | Full Goodreads URL for the book |
| searchQuery | string | The shelf/genre name used |
| scrapedAt | string | ISO 8601 timestamp |
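Because the title field may include series info, downstream code may want to split it. The sketch below is a hypothetical post-processing helper, not part of the Actor. It assumes the "Title (Series, #N)" pattern seen in titles such as "Dune (Dune, #1)", and the series and seriesNumber keys are invented for illustration.

```python
import re

# Matches "Title (Series, #N)"; titles without this suffix are left unchanged.
SERIES_PATTERN = re.compile(
    r"^(?P<title>.+?)\s*\((?P<series>[^()]+),\s*#(?P<number>[\d.\-]+)\)$"
)


def split_series(raw_title):
    """Split a scraped title into title, series name, and series number."""
    cleaned = raw_title.strip()
    match = SERIES_PATTERN.match(cleaned)
    if not match:
        return {"title": cleaned, "series": None, "seriesNumber": None}
    return {
        "title": match["title"],
        "series": match["series"],
        "seriesNumber": match["number"],  # kept as a string, e.g. "1" or "1-3"
    }


print(split_series("Dune (Dune, #1)"))
print(split_series("Neuromancer"))
```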
Additional detail mode fields
| Field | Type | Description |
|---|---|---|
description | string | Full book description |
genres | string[] | List of genres/tags |
isbn | string | ISBN number |
bookFormat | string | Format (Hardcover, Paperback, etc.) |
numberOfPages | number | Page count |
language | string | Book language |
awards | string | Awards received |
reviewCount | number | Total number of text reviews |
authors | object[] | Array of author objects with name and URL |
publicationInfo | string | Full publication details |
Example usage
Basic: Top science fiction books
{
  "searchQueries": ["science-fiction"],
  "maxBooks": 50
}
Multiple genres with details
{
  "searchQueries": ["fantasy", "mystery", "romance"],
  "maxBooks": 100,
  "scrapeDetails": true
}
Large-scale with proxy
{
  "searchQueries": ["science-fiction", "fantasy", "thriller", "horror"],
  "maxBooks": 500,
  "scrapeDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
Example output
{
  "title": "Dune (Dune, #1)",
  "author": "Frank Herbert",
  "authorUrl": "https://www.goodreads.com/author/show/58.Frank_Herbert",
  "rating": 4.29,
  "ratingCount": 1645579,
  "publishedYear": 1965,
  "shelfCount": 21643,
  "coverImage": "https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/1555447414l/44767458._SX318_.jpg",
  "bookId": "44767458",
  "url": "https://www.goodreads.com/book/show/44767458-dune",
  "searchQuery": "science-fiction",
  "scrapedAt": "2026-03-17T12:00:00.000Z"
}
Technical notes
- Uses CheerioCrawler (HTTP-only, no browser) for maximum speed
- Shelf pages (/shelf/show/) and book pages (/book/show/) are allowed by Goodreads robots.txt
- Rate-limited to avoid overloading servers (30-40 requests/minute)
- Shelf pages return 50 books per page, paginated with ?page=N
- Detail pages use Goodreads' JSON-LD @type: Book structured data for reliable extraction
- No Cloudflare or CAPTCHA protection on these endpoints
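The README does not say how the 30-40 requests/minute limit is enforced. As a standalone illustration of that kind of limit, here is a minimal Python throttle that spaces requests at least 60 / N seconds apart. The Throttle class and its injectable clock and sleep parameters are hypothetical and exist only to make the idea testable.

```python
import time


class Throttle:
    """Enforces a minimum interval between consecutive calls to wait()."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# 30 requests/minute means one request every 2 seconds; 40 means every 1.5.
print(Throttle(30).min_interval)
print(Throttle(40).min_interval)
```

Calling wait() before each HTTP request keeps a single-threaded client under the chosen rate. A concurrent crawler would need a shared limiter instead.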
Running locally
apify run --purge
Make sure to set your input in storage/key_value_stores/default/INPUT.json.
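If you prefer to generate the input file from a script, the following Python sketch writes it to the path mentioned above. The write_local_input helper is hypothetical and not part of the Apify CLI. It assumes the script runs from the Actor's project directory.

```python
import json
from pathlib import Path


def write_local_input(actor_input, project_dir="."):
    """Write actor_input as JSON to storage/key_value_stores/default/INPUT.json."""
    path = Path(project_dir) / "storage" / "key_value_stores" / "default" / "INPUT.json"
    path.parent.mkdir(parents=True, exist_ok=True)  # create the store on first use
    path.write_text(json.dumps(actor_input, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    written = write_local_input({"searchQueries": ["science-fiction"], "maxBooks": 50})
    print(f"Input written to {written}")
```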
Deploy to Apify
apify login
apify push
Related Scrapers
More marketplace scrapers and data tools by lulzasaur:
- AbeBooks Scraper — Rare and used books
- Bonanza Scraper — Online marketplace listings
- Contractor License Verifier — Multi-state license verification
- Craigslist Scraper — Classifieds and for-sale posts
- Grailed Scraper — Luxury fashion resale
- Houzz Scraper — Home improvement professionals
- IMDb Scraper — Movie and TV show data
- Nurse License Verifier — State nursing board verification
- OfferUp Scraper — Local marketplace listings
- Poshmark Scraper — Fashion resale marketplace
- PSA Population Report — Card grading data
- Redfin Scraper — Real estate listings and prices
- Reverb Scraper — Music gear marketplace
- StubHub Scraper — Event ticket prices
- Swappa Scraper — Used electronics marketplace
- TCGPlayer Scraper — Trading card prices
- ThriftBooks Scraper — Used book prices
- Thumbtack Scraper — Local service professionals