Pricing

$8.00/month + usage

Goodreads Scraper

Goodreads Scraper collects book data from Goodreads book pages by URL. It extracts title, author, rating, description, and cover image, and outputs structured JSON for catalogs, apps, and research use.

Developer

Data Pilot

Actor stats

Last modified: 3 months ago

Overview

The Goodreads Scraper is an Apify Actor that extracts book metadata directly from Goodreads book pages. Provide one or more Goodreads URLs and the actor returns structured data including title, author, rating, description, and cover image. Whether you're building a book database, analyzing reader sentiment, or powering a recommendation engine, the actor returns one structured record per book URL.

Optional proxy support and random delays between requests reduce the chance of being blocked by Goodreads.

Features

Direct Page Scraping – Extracts data straight from Goodreads book pages using HTML parsing.
Rich Metadata – Returns title, author, rating, description, and cover image for each book.
Batch Processing – Processes multiple Goodreads URLs in a single run.
Proxy Support – Optionally uses Apify residential proxies to avoid IP blocking.
Anti-Blocking Delays – Adds random delays between requests to mimic human browsing.
Error Handling – Logs errors and continues processing remaining URLs.
Dataset Integration – Automatically pushes all scraped data to your Apify dataset for easy export.

How It Works

Input – Provide a list of Goodreads book page URLs.
Fetch Page – The actor requests each URL with browser-like headers and optional proxy.
Parse HTML – It uses BeautifulSoup to extract book metadata from the page structure.
Build Output – Structures all available data into a clean record and pushes it to the dataset.
Repeat – Processes all URLs with a random delay between requests.
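
The steps above can be sketched as a small loop. This is a minimal illustration, not the actor's actual code: `scrape_all`, `fetch`, `parse`, and the delay bounds are hypothetical names and values. The fetch and parse steps are passed in as functions, so the loop itself needs neither a network connection nor a specific HTML parser.

```python
import random
import time
from typing import Callable, Dict, Iterable, List, Tuple


def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    parse: Callable[[str, str], Dict],
    delay_range: Tuple[float, float] = (1.0, 3.0),  # assumed bounds, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[Dict], List[Tuple[str, str]]]:
    """Fetch and parse each URL, pausing a random interval between requests.

    Returns (records, errors). A failing URL is recorded in errors and the
    loop moves on to the next URL instead of aborting the run.
    """
    records: List[Dict] = []
    errors: List[Tuple[str, str]] = []
    for index, url in enumerate(urls):
        if index > 0:
            # Random pause before every request except the first one.
            sleep(random.uniform(*delay_range))
        try:
            html = fetch(url)                 # "Fetch Page" step
            records.append(parse(html, url))  # "Parse HTML" + "Build Output"
        except Exception as exc:
            errors.append((url, str(exc)))    # log and continue
    return records, errors
```

In a real run, `fetch` would wrap an HTTP request with browser-like headers and the optional proxy, and `parse` would wrap the HTML parsing step. Injecting `sleep` makes the loop easy to exercise without waiting.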

Input

Field	Type	Default	Description
`urls`	Array of strings	`[]`	Required. List of Goodreads book page URLs to scrape.
`proxyConfiguration`	Object	`{}`	Apify proxy configuration (e.g., `{ "proxyGroups": ["RESIDENTIAL"] }`).

Example input:

{
  "urls": [
    "https://www.goodreads.com/book/show/40121378-atomic-habits",
    "https://www.goodreads.com/book/show/865.The_Alchemist"
  ],
  "proxyConfiguration": {
    "proxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}
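
Because `urls` is the only required field, it can help to check the list before starting a run. The sketch below is an illustration and not part of the actor: `is_book_url` and `clean_urls` are hypothetical helpers, and the URL pattern is an assumption inferred from the two example URLs above (a numeric ID directly after `/book/show/`).

```python
import re
from urllib.parse import urlparse

# Assumed shape of a book page path: /book/show/<numeric id>...
BOOK_PATH = re.compile(r"^/book/show/\d+")
GOODREADS_HOSTS = ("goodreads.com", "www.goodreads.com")


def is_book_url(url: str) -> bool:
    """Return True if the URL looks like a Goodreads book page."""
    parsed = urlparse(url.strip())
    return (
        parsed.scheme in ("http", "https")
        and parsed.netloc.lower() in GOODREADS_HOSTS
        and BOOK_PATH.match(parsed.path) is not None
    )


def clean_urls(urls):
    """Strip whitespace, drop non-book URLs, and remove duplicates in order."""
    seen = set()
    result = []
    for url in urls:
        url = url.strip()
        if is_book_url(url) and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Filtering up front avoids spending requests and proxy traffic on URLs that cannot yield a book record.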

Output

Each book is pushed as a separate dataset record with the following fields:

Field	Type	Description
`title`	string	Book title.
`authorName`	string	Author's full name.
`rating`	string	Average Goodreads rating (e.g., `"4.37"`).
`description`	string	Book description/synopsis.
`language`	string	Language code (default: `"ENG"`).
`currency`	string	Currency code (default: `"USD"`).
`cover_image`	string	Direct URL to the book's cover image.
`source`	string	Data source (always `"Goodreads"`).
`preview_link`	string	The original Goodreads page URL (same value as `url`).
`url`	string	The original Goodreads page URL.

Example output:

{
  "title": "Atomic Habits",
  "authorName": "James Clear",
  "rating": "4.37",
  "description": "No matter your goals, Atomic Habits offers a proven framework for improving every day...",
  "language": "ENG",
  "currency": "USD",
  "cover_image": "https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/1655988385l/40121378.jpg",
  "source": "Goodreads",
  "preview_link": "https://www.goodreads.com/book/show/40121378-atomic-habits",
  "url": "https://www.goodreads.com/book/show/40121378-atomic-habits"
}
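
Since `rating` is returned as a string, downstream code usually needs to convert it before sorting or filtering. A minimal sketch, assuming records shaped like the example above: `rating_value` and `top_rated` are hypothetical helper names, and treating a missing or non-numeric rating as `None` is an assumption about how absent data might appear.

```python
from typing import Optional


def rating_value(record: dict) -> Optional[float]:
    """Convert the string rating to a float, or None if it is unusable."""
    try:
        return float(record.get("rating", ""))
    except (TypeError, ValueError):
        return None


def top_rated(records, minimum=4.0):
    """Return records rated at least `minimum`, highest rating first."""
    rated = [(rating_value(r), r) for r in records]
    kept = [(value, r) for value, r in rated
            if value is not None and value >= minimum]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept]
```

Records without a usable rating are skipped rather than raising, so one incomplete record does not break processing of a whole dataset export.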

Use Cases

Book Databases – Build and maintain a structured catalog with Goodreads metadata.
Recommendation Engines – Power book recommendation systems using ratings and descriptions.
Publishing Research – Analyze Goodreads ratings and reader trends across genres.
E-commerce Enrichment – Enrich product listings with Goodreads descriptions and cover images.
Academic Research – Collect structured Goodreads data for literature or data science projects.
Content Aggregation – Aggregate Goodreads book data for blogs, apps, or reading platforms.

Quick Start

Open on Apify – Visit the actor page and click Try for free.
Set Input – Paste your Goodreads book page URLs into the urls field.
Enable Proxy (Optional) – Configure proxy groups to avoid rate limiting.
Run the Actor – Start the run and monitor progress in the logs.
Download Results – Export the dataset as JSON, CSV, or Excel once finished.
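
The last step can also be done programmatically. A minimal sketch, assuming the run's dataset ID and an API token are already known (both are placeholders here) and that the Apify API's dataset items endpoint is used for export; `dataset_export_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
# Formats named in the Quick Start; Excel corresponds to "xlsx".
FORMATS = {"json", "csv", "xlsx"}


def dataset_export_url(dataset_id, fmt="json", token=None):
    """Build a download URL for a dataset's items in the given format."""
    if fmt not in FORMATS:
        raise ValueError("unsupported format: " + fmt)
    params = {"format": fmt}
    if token:
        params["token"] = token  # placeholder; keep real tokens out of logs
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR_DATASET_ID" is a placeholder, not a real dataset.
    print(dataset_export_url("YOUR_DATASET_ID", "csv"))
```

The resulting URL can be opened in a browser or passed to any HTTP client to fetch the file.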

Technical Stack

Data Source – Goodreads (HTML scraping)
HTML Parser – BeautifulSoup with lxml backend
HTTP Client – requests with browser-like headers and optional proxy support
Proxy – Apify Proxy (residential or datacenter)
Platform – Apify Actor — serverless, scalable, integrated with Dataset and Key-Value Store

Related Actors

Actor	Description
Book Metadata Scraper	Extracts rich book metadata from the Open Library database.
Amazon Book Scraper	Scrapes book listings, prices, and reviews from Amazon.
Google Books Scraper	Fetches book metadata and previews via the Google Books API.
ISBN Lookup Tool	Looks up detailed book info by ISBN from multiple data sources.
Book Price Comparator	Compares book prices across major online retailers.

Changelog

v1.0.0 – Initial Release

Direct HTML scraping of Goodreads book pages
Title, author, rating, description, and cover image extraction
Proxy configuration support
Anti-blocking random delays
Dataset integration with error handling

Pricing

Free for basic usage on Apify (up to certain compute limits).
Paid plans available for higher volume, priority support, and longer runs.
Proxy credits consumed if residential proxies are enabled.

Support & Feedback

Issues & Ideas – Open a ticket on the Apify Actor issue tracker.
Documentation – Visit Apify Docs for platform guides.
Scraping Notes – Use proxies and keep request rates low to avoid blocks from Goodreads.

Disclaimer: This actor scrapes publicly visible data from Goodreads. Please ensure your usage complies with Goodreads' terms of service. This actor is intended for research and informational purposes only.
