Goodreads Scraper
Pricing
$8.00/month + usage
Goodreads Scraper uses the Open Library API to collect detailed book data by query. It extracts title, author, ISBN, publisher, publish year, pages, categories, ratings, description, cover image, and preview link. Outputs structured JSON for catalogs, apps, and research use.
Rating: 0.0 (0)
Developer: Data Pilot
Actor stats
- Bookmarked: 0
- Total users: 2
- Monthly active users: 1
- Last modified: 3 days ago
Overview
The Goodreads Scraper is an Apify Actor that extracts book metadata directly from Goodreads book pages. Provide one or more Goodreads URLs and the actor returns structured data including title, author, rating, description, and cover image. Whether you're building a book database, analyzing reader sentiment, or powering a recommendation engine, this actor delivers accurate Goodreads data efficiently.
Proxy support and built-in random delays between requests reduce the chance of being blocked by Goodreads during a run.
Features
- Direct Page Scraping – Extracts data straight from Goodreads book pages using HTML parsing.
- Rich Metadata – Returns title, author, rating, description, and cover image for each book.
- Batch Processing – Processes multiple Goodreads URLs in a single run.
- Proxy Support – Optionally uses Apify residential proxies to avoid IP blocking.
- Anti-Blocking Delays – Adds random delays between requests to mimic human browsing.
- Error Handling – Logs errors and continues processing remaining URLs.
- Dataset Integration – Automatically pushes all scraped data to your Apify dataset for easy export.
How It Works
- Input – Provide a list of Goodreads book page URLs.
- Fetch Page – The actor requests each URL with browser-like headers and optional proxy.
- Parse HTML – It uses BeautifulSoup to extract book metadata from the page structure.
- Build Output – Structures all available data into a clean record and pushes it to the dataset.
- Repeat – Processes all URLs with a random delay between requests.
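The loop described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: `scrape_all`, `scrape_one`, and the delay bounds of 1 to 3 seconds are hypothetical names and values chosen for the example. The fetch-and-parse step is passed in as a function so the loop can be read (and run) without network access.

```python
import random
import time
from typing import Callable, Iterable


def scrape_all(
    urls: Iterable[str],
    scrape_one: Callable[[str], dict],
    min_delay: float = 1.0,  # assumed lower bound, in seconds
    max_delay: float = 3.0,  # assumed upper bound, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[list[dict], list[tuple[str, str]]]:
    """Process each URL in turn, pausing a random interval between requests.

    Returns the successfully scraped records and a list of (url, error) pairs.
    """
    records: list[dict] = []
    errors: list[tuple[str, str]] = []
    url_list = list(urls)

    for index, url in enumerate(url_list):
        try:
            records.append(scrape_one(url))
        except Exception as exc:
            # Record the failure and carry on with the remaining URLs.
            errors.append((url, str(exc)))

        # No pause is needed after the final URL.
        if index < len(url_list) - 1:
            sleep(random.uniform(min_delay, max_delay))

    return records, errors
```

Injecting `sleep` keeps the delay logic testable; in a real run the default `time.sleep` would be used.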
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | Array of strings | `[]` | Required. List of Goodreads book page URLs to scrape. |
| `proxyConfiguration` | Object | `{}` | Apify proxy configuration (e.g., `{ "proxyGroups": ["RESIDENTIAL"] }`). |
Example input:
    {
      "urls": [
        "https://www.goodreads.com/book/show/40121378-atomic-habits",
        "https://www.goodreads.com/book/show/865.The_Alchemist"
      ],
      "proxyConfiguration": {
        "proxyGroups": ["RESIDENTIAL"],
        "apifyProxyCountry": "US"
      }
    }
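Before starting a run, it can help to check the input locally. The sketch below validates an input object against the table above. The function name `normalize_input` is hypothetical, and the rule that each URL must be on `goodreads.com` with a path starting `/book/show/` is an assumption inferred from the example URLs, not a documented constraint of the actor.

```python
from urllib.parse import urlparse


def normalize_input(actor_input: dict) -> dict:
    """Validate the input and fill in the documented defaults."""
    urls = actor_input.get("urls") or []
    if not isinstance(urls, list) or not urls:
        raise ValueError("'urls' must be a non-empty list of strings")

    for url in urls:
        if not isinstance(url, str):
            raise ValueError(f"URL must be a string: {url!r}")
        parsed = urlparse(url)
        host = parsed.hostname or ""
        on_goodreads = host == "goodreads.com" or host.endswith(".goodreads.com")
        # Assumption: book pages live under /book/show/, as in the examples.
        if (
            parsed.scheme not in ("http", "https")
            or not on_goodreads
            or not parsed.path.startswith("/book/show/")
        ):
            raise ValueError(f"Not a Goodreads book page URL: {url!r}")

    return {
        "urls": urls,
        # Documented default for proxyConfiguration is an empty object.
        "proxyConfiguration": actor_input.get("proxyConfiguration") or {},
    }
```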
Output
Each book is pushed as a separate dataset record with the following fields:
| Field | Type | Description |
|---|---|---|
| `title` | string | Book title. |
| `authorName` | string | Author's full name. |
| `rating` | string | Average Goodreads rating (e.g., "4.37"). |
| `description` | string | Book description/synopsis. |
| `language` | string | Language code (default: "ENG"). |
| `currency` | string | Currency code (default: "USD"). |
| `cover_image` | string | Direct URL to the book's cover image. |
| `source` | string | Data source (always "Goodreads"). |
| `preview_link` | string | The original Goodreads page URL. |
| `url` | string | The original Goodreads page URL. |
Example output:
    {
      "title": "Atomic Habits",
      "authorName": "James Clear",
      "rating": "4.37",
      "description": "No matter your goals, Atomic Habits offers a proven framework for improving every day...",
      "language": "ENG",
      "currency": "USD",
      "cover_image": "https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/1655988385l/40121378.jpg",
      "source": "Goodreads",
      "preview_link": "https://www.goodreads.com/book/show/40121378-atomic-habits",
      "url": "https://www.goodreads.com/book/show/40121378-atomic-habits"
    }
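To make the record shape concrete, here is one way such a record could be assembled from already-extracted values. `build_record` is a hypothetical helper, and falling back to empty strings for missing values is an assumption. The fixed values for `language`, `currency`, and `source`, and the reuse of the page URL for both `preview_link` and `url`, follow the field table above.

```python
def build_record(
    url: str,
    title: str = "",
    author_name: str = "",
    rating: str = "",
    description: str = "",
    cover_image: str = "",
) -> dict:
    """Assemble one dataset record in the documented field order."""
    return {
        "title": title.strip(),
        "authorName": author_name.strip(),
        "rating": rating.strip(),      # kept as a string, e.g. "4.37"
        "description": description.strip(),
        "language": "ENG",             # documented default
        "currency": "USD",             # documented default
        "cover_image": cover_image,
        "source": "Goodreads",         # always "Goodreads"
        "preview_link": url,           # same value as "url"
        "url": url,
    }
```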
Use Cases
- Book Databases – Build and maintain a structured catalog with Goodreads metadata.
- Recommendation Engines – Power book recommendation systems using ratings and descriptions.
- Publishing Research – Analyze Goodreads ratings and reader trends across genres.
- E-commerce Enrichment – Enrich product listings with Goodreads descriptions and cover images.
- Academic Research – Collect structured Goodreads data for literature or data science projects.
- Content Aggregation – Aggregate Goodreads book data for blogs, apps, or reading platforms.
Quick Start
- Open on Apify – Visit the actor page and click Try for free.
- Set Input – Paste your Goodreads book page URLs into the `urls` field.
- Enable Proxy (Optional) – Configure proxy groups to avoid rate limiting.
- Run the Actor – Start the run and monitor progress in the logs.
- Download Results – Export the dataset as JSON, CSV, or Excel once finished.
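Once a JSON export has been downloaded, the records can be post-processed locally. Because `rating` is delivered as a string, it usually needs converting before any numeric comparison. The sketch below assumes the export is a JSON array of records shaped like the example output; the helper names `load_records`, `rating_as_float`, and `top_rated` are hypothetical.

```python
import json
from typing import Optional


def load_records(json_text: str) -> list[dict]:
    """Parse an exported dataset, assumed to be a JSON array of records."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


def rating_as_float(record: dict) -> Optional[float]:
    """Convert the string rating to a float, or None if missing or malformed."""
    try:
        return float(record.get("rating", ""))
    except (TypeError, ValueError):
        return None


def top_rated(records: list[dict], limit: int = 10) -> list[dict]:
    """Return up to `limit` records with a usable rating, highest first."""
    scored = [(rating_as_float(r), r) for r in records]
    scored = [(score, r) for score, r in scored if score is not None]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:limit]]
```

Records with an empty or unparseable rating are skipped rather than treated as zero, so they do not distort a ranking.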
Technical Stack
- Data Source – Goodreads (HTML scraping)
- HTML Parser – `BeautifulSoup` with `lxml` backend
- HTTP Client – `requests` with browser-like headers and optional proxy support
- Proxy – Apify Proxy (residential or datacenter)
- Platform – Apify Actor — serverless, scalable, integrated with Dataset and Key-Value Store
Related Tools
| Actor | Description |
|---|---|
| Book Metadata Scraper | Extracts rich book metadata from the Open Library database. |
| Amazon Book Scraper | Scrapes book listings, prices, and reviews from Amazon. |
| Google Books Scraper | Fetches book metadata and previews via the Google Books API. |
| ISBN Lookup Tool | Looks up detailed book info by ISBN from multiple data sources. |
| Book Price Comparator | Compares book prices across major online retailers. |
Changelog
v1.0.0 – Initial Release
- Direct HTML scraping of Goodreads book pages
- Title, author, rating, description, and cover image extraction
- Proxy configuration support
- Anti-blocking random delays
- Dataset integration with error handling
Pricing
- Free for basic usage on Apify (up to certain compute limits).
- Paid plans available for higher volume, priority support, and longer runs.
- Proxy credits consumed if residential proxies are enabled.
Support & Feedback
- Issues & Ideas – Open a ticket on the Apify Actor issue tracker.
- Documentation – Visit Apify Docs for platform guides.
- Scraping Notes – Use proxies and keep request rates low to avoid blocks from Goodreads.
Disclaimer: This actor scrapes publicly visible data from Goodreads. Please ensure your usage complies with Goodreads' terms of service. This actor is intended for research and informational purposes only.