Goodreads Scraper

Pricing

$8.00/month + usage

Goodreads Scraper collects book data from Goodreads book pages by URL. It extracts title, author, rating, description, cover image, and preview link. Outputs structured JSON for catalogs, apps, and research use.

Rating

0.0 (0)

Developer

Data Pilot

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 3 days ago

Overview

The Goodreads Scraper is an Apify Actor that extracts book metadata directly from Goodreads book pages. Provide one or more Goodreads URLs and the actor returns structured data including title, author, rating, description, and cover image. Whether you're building a book database, analyzing reader sentiment, or powering a recommendation engine, this actor delivers accurate Goodreads data efficiently.

Proxy support and built-in random delays between requests reduce the chance of being blocked while fetching Goodreads pages.


Features

  • Direct Page Scraping – Extracts data straight from Goodreads book pages using HTML parsing.
  • Rich Metadata – Returns title, author, rating, description, and cover image for each book.
  • Batch Processing – Processes multiple Goodreads URLs in a single run.
  • Proxy Support – Optionally uses Apify residential proxies to avoid IP blocking.
  • Anti-Blocking Delays – Adds random delays between requests to mimic human browsing.
  • Error Handling – Logs errors and continues processing remaining URLs.
  • Dataset Integration – Automatically pushes all scraped data to your Apify dataset for easy export.

How It Works

  1. Input – Provide a list of Goodreads book page URLs.
  2. Fetch Page – The actor requests each URL with browser-like headers and optional proxy.
  3. Parse HTML – It uses BeautifulSoup to extract book metadata from the page structure.
  4. Build Output – Structures all available data into a clean record and pushes it to the dataset.
  5. Repeat – Processes all URLs with a random delay between requests.
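
The five steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the actor's source code. The actor is described as using requests and BeautifulSoup; this sketch swaps in the Python standard library so it runs without dependencies. The names `fetch_page`, `parse_book`, and `scrape`, the User-Agent string, the delay bounds, and the choice of Open Graph `<meta>` tags as the extraction target are all hypothetical.

```python
import random
import time
import urllib.request
from html.parser import HTMLParser

# Assumed browser-like header; the actor's real headers are not documented.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"}


class MetaCollector(HTMLParser):
    """Collects <meta property="og:*"> values (assumed markup, not confirmed)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop, content = attr.get("property"), attr.get("content")
        if prop and prop.startswith("og:") and content is not None:
            self.meta[prop] = content


def fetch_page(url, timeout=30):
    """Step 2: request the page with browser-like headers (no proxy here)."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_book(html, url):
    """Steps 3 and 4: parse the HTML and build one output record."""
    collector = MetaCollector()
    collector.feed(html)
    return {
        "title": collector.meta.get("og:title"),
        "description": collector.meta.get("og:description"),
        "cover_image": collector.meta.get("og:image"),
        "source": "Goodreads",
        "url": url,
    }


def scrape(urls, fetch=fetch_page, delay_range=(1.0, 3.0), sleep=time.sleep):
    """Step 5: process every URL, logging failures and pausing in between."""
    urls = list(urls)
    records = []
    for index, url in enumerate(urls):
        try:
            records.append(parse_book(fetch(url), url))
        except Exception as exc:  # log the error and continue with the rest
            print(f"Failed to scrape {url}: {exc}")
        if index < len(urls) - 1:
            sleep(random.uniform(*delay_range))  # random anti-blocking delay
    return records
```

Passing `fetch` and `sleep` as parameters keeps the loop testable offline: a stub fetcher can return canned HTML, and a no-op sleep avoids real waiting.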

Input

Field              | Type             | Default | Description
urls               | Array of strings | []      | Required. List of Goodreads book page URLs to scrape.
proxyConfiguration | Object           | {}      | Apify proxy configuration (e.g., { "proxyGroups": ["RESIDENTIAL"] }).

Example input:

{
  "urls": [
    "https://www.goodreads.com/book/show/40121378-atomic-habits",
    "https://www.goodreads.com/book/show/865.The_Alchemist"
  ],
  "proxyConfiguration": {
    "proxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}
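
Before starting a run, it can help to check the input locally. The following is a minimal sketch, assuming that valid book URLs live on goodreads.com under the `/book/show/` path, which is inferred only from the two example URLs above. The function name `validate_input` is hypothetical and is not part of the actor.

```python
from urllib.parse import urlparse

# Assumed from the example URLs; the actor may accept other URL shapes.
ALLOWED_HOSTS = {"goodreads.com", "www.goodreads.com"}
BOOK_PATH_PREFIX = "/book/show/"


def validate_input(actor_input: dict) -> list[str]:
    """Return the URLs that look like Goodreads book pages, or raise ValueError."""
    urls = actor_input.get("urls")
    if not isinstance(urls, list) or not urls:
        raise ValueError("'urls' must be a non-empty list of strings")

    valid = []
    for url in urls:
        if not isinstance(url, str):
            continue  # skip non-string entries instead of failing the whole run
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if (
            parsed.scheme in ("http", "https")
            and host in ALLOWED_HOSTS
            and parsed.path.startswith(BOOK_PATH_PREFIX)
        ):
            valid.append(url)

    if not valid:
        raise ValueError("no Goodreads book page URLs found in 'urls'")
    return valid
```

Filtering rather than rejecting the whole list mirrors the actor's described behavior of continuing past individual failures.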

Output

Each book is pushed as a separate dataset record with the following fields:

Field        | Type   | Description
title        | string | Book title.
authorName   | string | Author's full name.
rating       | string | Average Goodreads rating (e.g., "4.37").
description  | string | Book description/synopsis.
language     | string | Language code (default: "ENG").
currency     | string | Currency code (default: "USD").
cover_image  | string | Direct URL to the book's cover image.
source       | string | Data source (always "Goodreads").
preview_link | string | The original Goodreads page URL.
url          | string | The original Goodreads page URL.

Example output:

{
  "title": "Atomic Habits",
  "authorName": "James Clear",
  "rating": "4.37",
  "description": "No matter your goals, Atomic Habits offers a proven framework for improving every day...",
  "language": "ENG",
  "currency": "USD",
  "cover_image": "https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/1655988385l/40121378.jpg",
  "source": "Goodreads",
  "preview_link": "https://www.goodreads.com/book/show/40121378-atomic-habits",
  "url": "https://www.goodreads.com/book/show/40121378-atomic-habits"
}
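
Because `rating` is returned as a string, downstream code usually needs to convert it before sorting or filtering. The sketch below shows one way to do that on records shaped like the example above. The helper names `rating_as_float` and `top_rated` and the 4.0 threshold are hypothetical.

```python
from typing import Optional


def rating_as_float(record: dict) -> Optional[float]:
    """Convert the string rating to a float; return None if missing or malformed."""
    try:
        return float(record.get("rating"))
    except (TypeError, ValueError):
        return None


def top_rated(records: list, minimum: float = 4.0) -> list:
    """Return titles whose rating is at least `minimum`, highest first."""
    scored = [(rating_as_float(record), record) for record in records]
    kept = [(score, record) for score, record in scored
            if score is not None and score >= minimum]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [record.get("title") for _, record in kept]
```

Records with an empty or missing rating are skipped rather than treated as zero, so an incomplete scrape does not distort the ranking.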

Use Cases

  • Book Databases – Build and maintain a structured catalog with Goodreads metadata.
  • Recommendation Engines – Power book recommendation systems using ratings and descriptions.
  • Publishing Research – Analyze Goodreads ratings and reader trends across genres.
  • E-commerce Enrichment – Enrich product listings with Goodreads descriptions and cover images.
  • Academic Research – Collect structured Goodreads data for literature or data science projects.
  • Content Aggregation – Aggregate Goodreads book data for blogs, apps, or reading platforms.

Quick Start

  1. Open on Apify – Visit the actor page and click Try for free.
  2. Set Input – Paste your Goodreads book page URLs into the urls field.
  3. Enable Proxy (Optional) – Configure proxy groups to avoid rate limiting.
  4. Run the Actor – Start the run and monitor progress in the logs.
  5. Download Results – Export the dataset as JSON, CSV, or Excel once finished.
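
The same steps can also be driven from a script instead of the web console. The sketch below assumes the `apify-client` Python package is installed and that you know the actor's ID; the `actor_id` argument is a placeholder, since this page does not state the ID. `build_run_input` and `run_actor` are hypothetical helper names, and the input shape follows the example input above.

```python
def build_run_input(urls, residential=False, country=None):
    """Step 2 and 3: assemble the actor input, optionally with a proxy block."""
    run_input = {"urls": list(urls)}
    if residential:
        proxy = {"proxyGroups": ["RESIDENTIAL"]}
        if country:
            proxy["apifyProxyCountry"] = country
        run_input["proxyConfiguration"] = proxy
    return run_input


def run_actor(token, actor_id, run_input):
    """Steps 4 and 5: start the run, wait for it, and return the dataset items.

    `actor_id` is a placeholder such as "<username>/<actor-name>".
    """
    # Imported lazily so the input helper works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call might look like `run_actor(token, actor_id, build_run_input(urls, residential=True, country="US"))`, with the token read from an environment variable rather than hard-coded.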

Technical Stack

  • Data Source – Goodreads (HTML scraping)
  • HTML Parser – BeautifulSoup with lxml backend
  • HTTP Client – requests with browser-like headers and optional proxy support
  • Proxy – Apify Proxy (residential or datacenter)
  • Platform – Apify Actor: serverless, scalable, integrated with Dataset and Key-Value Store

Actor                 | Description
Book Metadata Scraper | Extracts rich book metadata from the Open Library database.
Amazon Book Scraper   | Scrapes book listings, prices, and reviews from Amazon.
Google Books Scraper  | Fetches book metadata and previews via the Google Books API.
ISBN Lookup Tool      | Looks up detailed book info by ISBN from multiple data sources.
Book Price Comparator | Compares book prices across major online retailers.

Changelog

v1.0.0 – Initial Release

  • Direct HTML scraping of Goodreads book pages
  • Title, author, rating, description, and cover image extraction
  • Proxy configuration support
  • Anti-blocking random delays
  • Dataset integration with error handling

Pricing

  • Free for basic usage on Apify (up to certain compute limits).
  • Paid plans available for higher volume, priority support, and longer runs.
  • Proxy credits are consumed when residential proxies are enabled.

Support & Feedback

  • Issues & Ideas – Open a ticket on the Apify Actor issue tracker.
  • Documentation – Visit Apify Docs for platform guides.
  • Scraping Notes – Use proxies and keep request rates low to avoid blocks from Goodreads.

Disclaimer: This actor scrapes publicly visible data from Goodreads. Please ensure your usage complies with Goodreads' terms of service. This actor is intended for research and informational purposes only.