Pricing

Pay per event

Penguin Random House Publisher Catalog Scraper

Scrape the official Penguin Random House publisher catalog for book metadata: title, author, ISBN, imprint, format, publication date, price, description, praise blurbs, and more. A search-seeded crawl into detail pages returns primary-source data that is not available from consumer aggregators.

Developer

BowTiedRaccoon

Last modified

23 days ago

What data does it collect?

Each record is one book edition (one canonical detail page):

Field	Type	Description
`prh_id`	string	Penguin Random House work ID (numeric, from URL)
`title`	string	Book title
`subtitle`	string	Subtitle, if present
`author`	string	Primary author name
`contributors`	string	All contributors as JSON array: `[{"name":"...", "role":"..."}]`
`imprint`	string	Publisher imprint (e.g. Random House, Crown, Dial Press)
`format`	string	Format: Hardcover, Paperback, Ebook, or Audiobook
`isbn`	string	Primary ISBN-13
`pages`	integer	Page count
`publication_date`	string	Publication date (ISO 8601, e.g. `2024-10-01`)
`price`	number	List price in USD
`category`	string	Genre/category as JSON array of strings
`description`	string	Publisher description (about the book)
`about_the_author`	string	Author biography from the publisher
`praise`	string	Praise/endorsement blurbs as JSON array of strings
`series`	string	Series name if the book is part of a series
`related_titles`	string	Related edition ISBNs as JSON array
`cover_url`	string	Cover image URL
`product_url`	string	Full URL of the book detail page
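
Because several fields in the table above arrive as JSON text inside a string, a consumer usually needs to decode them before use. The following is a minimal sketch of that step, assuming records are loaded as plain Python dictionaries. The helper name `decode_record` and the sample record are hypothetical and exist only for illustration; they are not part of the actor.

```python
import json

# Fields the table above describes as JSON arrays stored in strings.
JSON_STRING_FIELDS = ("contributors", "category", "praise", "related_titles")


def decode_record(record: dict) -> dict:
    """Return a copy of a record with its JSON-string fields parsed."""
    decoded = dict(record)
    for field in JSON_STRING_FIELDS:
        raw = decoded.get(field)
        if isinstance(raw, str) and raw.strip():
            try:
                decoded[field] = json.loads(raw)
            except json.JSONDecodeError:
                # Leave the raw string in place if it is not valid JSON.
                pass
    return decoded


# Made-up sample record for illustration only; not real catalog data.
sample = {
    "title": "Example Title",
    "contributors": '[{"name": "Jane Doe", "role": "Author"}]',
    "category": '["Mystery", "Thriller"]',
    "praise": "[]",
    "related_titles": '["9780000000002"]',
}

book = decode_record(sample)
print(book["contributors"][0]["name"])
print(book["category"])
```

Fields that are missing, empty, or not valid JSON are left untouched, so the helper can be applied to every row of an export without special-casing.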

How to use it

Search by keyword

Provide one or more search queries. The scraper paginates through search results, visits each book detail page, and saves the metadata. Queries can be genre names, author names, topics, or any other search terms the PRH catalog supports.

{
  "queries": ["mystery", "science fiction"],
  "maxItems": 50,
  "sp_intended_usage": "catalog research"
}

Small focused run

{
  "queries": ["romance"],
  "maxItems": 10,
  "sp_intended_usage": "spot check"
}

Input parameters

Parameter	Type	Required	Default	Description
`queries`	array	Yes	`["fiction"]`	Search terms to scrape. Each query seeds an independent paginated search.
`maxItems`	integer	Yes	5	Maximum total book records to collect across all queries.
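
The interaction between the two parameters can be pictured as a loop in which each query paginates on its own while a single counter enforces the `maxItems` cap across all of them. The sketch below is an illustration under stated assumptions, not the actor's actual code: `crawl`, `search_page`, and `fetch_book` are hypothetical names, the network calls are replaced by stand-in functions, and skipping URLs already seen under another query is an assumption about how duplicates might be handled.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set


def crawl(
    queries: Iterable[str],
    max_items: int,
    search_page: Callable[[str, int], List[str]],
    fetch_book: Callable[[str], Dict],
) -> Iterator[Dict]:
    """Yield book records until max_items is reached across all queries.

    search_page(query, page) returns detail-page URLs for one results page
    (an empty list means no more results); fetch_book(url) returns a record.
    Both are hypothetical callables supplied by the caller.
    """
    seen: Set[str] = set()
    collected = 0
    for query in queries:
        page = 1
        while collected < max_items:
            urls = search_page(query, page)
            if not urls:
                break  # this query is exhausted; move on to the next one
            for url in urls:
                if collected >= max_items:
                    return  # global cap reached
                if url in seen:
                    continue  # assumption: skip books already collected
                seen.add(url)
                collected += 1
                yield fetch_book(url)
            page += 1


# Stand-in data for illustration; a real run would issue HTTP requests.
FAKE_RESULTS = {
    "mystery": [["/books/1", "/books/2"], ["/books/3"]],
    "romance": [["/books/3", "/books/4"]],
}


def fake_search(query: str, page: int) -> List[str]:
    pages = FAKE_RESULTS.get(query, [])
    return pages[page - 1] if page <= len(pages) else []


def fake_fetch(url: str) -> Dict:
    return {"product_url": url}


records = list(crawl(["mystery", "romance"], 3, fake_search, fake_fetch))
print([r["product_url"] for r in records])
```

With the cap set to 3, the first query alone fills the quota and the second query is never searched, which is why a low `maxItems` combined with several queries may return results from only the first few.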

Notes

Extraction uses the structured JSON-LD Book schema embedded on each detail page, the same data the publisher uses for SEO. This yields authoritative `isbn`, `publisherImprint`, `datePublished`, and `offers.price` values without scraping fragile HTML.
Praise blurbs, author bios, and categories are extracted from the HTML where JSON-LD does not carry them.
The `contributors`, `category`, `praise`, and `related_titles` fields are serialised as JSON strings so they remain compatible with spreadsheet and CSV exports.
The Penguin Random House catalog covers an estimated 120,000+ titles across all imprints (Random House, Crown, Knopf, Bantam, Viking, Penguin, and many more).
No proxy is required: the catalog is publicly accessible without bot protection.
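
To make the JSON-LD note above concrete, here is a minimal sketch of how a Book object could be read out of a detail page using only the Python standard library. It is an illustration, not the actor's implementation: the class `JsonLdCollector`, the function `extract_book`, and the sample HTML are hypothetical, and the sketch assumes the page carries a single JSON-LD object with `"@type": "Book"` whose `offers` entry is an object or a list of objects.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_book(html):
    """Return selected fields from the first JSON-LD Book object, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # ignore malformed blocks
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict) or item.get("@type") != "Book":
                continue
            offers = item.get("offers") or {}
            if isinstance(offers, list):
                offers = offers[0] if offers else {}
            imprint = item.get("publisherImprint")
            if isinstance(imprint, dict):  # may be an Organization object
                imprint = imprint.get("name")
            price = offers.get("price")
            return {
                "title": item.get("name"),
                "isbn": item.get("isbn"),
                "imprint": imprint,
                "publication_date": item.get("datePublished"),
                "price": float(price) if price is not None else None,
            }
    return None


# Made-up page for illustration only; not copied from the real catalog.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Book", "name": "Example Title", "isbn": "9780000000002",
 "publisherImprint": "Example Imprint", "datePublished": "2024-10-01",
 "offers": {"@type": "Offer", "price": "28.00", "priceCurrency": "USD"}}
</script>
</head><body></body></html>
"""

book = extract_book(SAMPLE_HTML)
print(book)
```

Reading the embedded JSON-LD avoids depending on CSS selectors, which is the robustness benefit the note describes; fields such as praise blurbs would still need separate HTML parsing.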
