Penguin Random House Publisher Catalog Scraper

Pricing

Pay per event

Scrape the official Penguin Random House publisher catalog for book metadata: title, author, ISBN, imprint, format, publication date, price, description, praise blurbs, and more. A search-seeded crawl into detail pages returns primary-source data that is not available from consumer aggregators.

Developer

BowTiedRaccoon

Maintained by Community

Scrape the official Penguin Random House publisher catalog from penguinrandomhouse.com. Extracts authoritative book metadata: title, author, ISBN, imprint, format, publication date, price, description, praise blurbs, and series information — primary-source data not available from consumer review aggregators.

What data does it collect?

Each record is one book edition (one canonical detail page):

| Field | Type | Description |
| --- | --- | --- |
| `prh_id` | string | Penguin Random House work ID (numeric, from URL) |
| `title` | string | Book title |
| `subtitle` | string | Subtitle, if present |
| `author` | string | Primary author name |
| `contributors` | string | All contributors as JSON array: `[{"name":"...", "role":"..."}]` |
| `imprint` | string | Publisher imprint (e.g. Random House, Crown, Dial Press) |
| `format` | string | Format: Hardcover, Paperback, Ebook, or Audiobook |
| `isbn` | string | Primary ISBN-13 |
| `pages` | integer | Page count |
| `publication_date` | string | Publication date (ISO 8601, e.g. 2024-10-01) |
| `price` | number | List price in USD |
| `category` | string | Genre/category as JSON array of strings |
| `description` | string | Publisher description (about the book) |
| `about_the_author` | string | Author biography from the publisher |
| `praise` | string | Praise/endorsement blurbs as JSON array of strings |
| `series` | string | Series name if the book is part of a series |
| `related_titles` | string | Related edition ISBNs as JSON array |
| `cover_url` | string | Cover image URL |
| `product_url` | string | Full URL of the book detail page |
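
Several of the fields above (`contributors`, `category`, `praise`, `related_titles`) hold JSON arrays stored as strings, so a consumer usually needs to decode them before use. The following is a minimal sketch of that step, not part of the actor itself. The helper name `decode_record`, the sample record, and the choice to treat a missing or empty value as an empty list are all assumptions made for illustration.

```python
import json

# Fields the table above describes as JSON arrays serialised into strings.
JSON_STRING_FIELDS = ("contributors", "category", "praise", "related_titles")


def decode_record(record: dict) -> dict:
    """Return a copy of a dataset record with JSON-string fields parsed into lists."""
    decoded = dict(record)
    for field in JSON_STRING_FIELDS:
        raw = decoded.get(field)
        if raw is None or (isinstance(raw, str) and not raw.strip()):
            # Assumption: a missing or blank field means "no entries".
            decoded[field] = []
        elif isinstance(raw, str):
            try:
                decoded[field] = json.loads(raw)
            except json.JSONDecodeError:
                # Leave the original string untouched if it is not valid JSON.
                pass
    return decoded


# Made-up sample record, for illustration only.
sample = {
    "title": "Example Title",
    "contributors": '[{"name": "Jane Doe", "role": "Author"}]',
    "category": '["Fiction", "Mystery"]',
    "praise": "",
}
book = decode_record(sample)
print(book["contributors"][0]["name"])
print(book["category"])
```

The function copies the record rather than mutating it, so the raw strings remain available if the same item is later written to CSV.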

How to use it

Search by keyword

Provide one or more search queries. The scraper paginates through search results, visits each book detail page, and saves the metadata. Queries can be genre names, author names, topics, or any other search terms the PRH catalog supports.

{
  "queries": ["mystery", "science fiction"],
  "maxItems": 50,
  "sp_intended_usage": "catalog research"
}

Small focused run

{
  "queries": ["romance"],
  "maxItems": 10,
  "sp_intended_usage": "spot check"
}
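
The same inputs can be assembled and submitted from a script. The sketch below builds the input object programmatically and, optionally, starts a run with the `apify-client` Python package. The helper names `build_run_input` and `run_scraper`, the `APIFY_TOKEN` environment variable, and the actor ID you pass in are assumptions; the actor's ID is not stated on this page, so it is left as a parameter.

```python
import os


def build_run_input(queries, max_items=5, intended_usage=None):
    """Assemble an input object shaped like the JSON examples above."""
    cleaned = [q.strip() for q in queries if q and q.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty query is required")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    run_input = {"queries": cleaned, "maxItems": int(max_items)}
    if intended_usage:
        run_input["sp_intended_usage"] = intended_usage
    return run_input


def run_scraper(actor_id, run_input):
    """Start the actor, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed variable name
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


print(build_run_input(["romance"], max_items=10, intended_usage="spot check"))
```

A run would then look like `run_scraper("<username>/<actor-name>", build_run_input(["romance"], 10))`, with the placeholder replaced by the actor's real ID from the Apify Console.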

Input parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `queries` | array | Yes | `["fiction"]` | Search terms to scrape. Each query seeds an independent paginated search. |
| `maxItems` | integer | Yes | `5` | Maximum total book records to collect across all queries. |

Notes

  • Extraction uses the structured JSON-LD Book schema embedded on each detail page — the same data the publisher uses for SEO. This gives authoritative isbn, publisherImprint, datePublished, and offers.price without scraping fragile HTML.
  • Praise blurbs, author bios, and categories are extracted from the HTML where JSON-LD does not carry them.
  • The contributors, category, praise, and related_titles fields are serialised as JSON strings so they remain compatible with spreadsheet and CSV exports.
  • The Penguin Random House catalog covers roughly 120,000 or more titles across all imprints (Random House, Crown, Knopf, Bantam, Viking, Penguin, and many more).
  • No proxy required — the catalog is publicly accessible without bot protection.
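
To illustrate the JSON-LD approach described in the notes above, here is a minimal sketch that pulls a `Book` object out of a page's `<script type="application/ld+json">` element using only the Python standard library. It is not the actor's code. The names `JsonLdCollector` and `extract_book` are hypothetical, the sample HTML and its values are made up, and the real page markup has not been verified here; the property names follow the ones listed in the first note.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_book(html):
    """Return selected fields from the first JSON-LD Book object, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "Book":
                offers = item.get("offers") or {}
                if isinstance(offers, list):
                    offers = offers[0] if offers else {}
                return {
                    "title": item.get("name"),
                    "isbn": item.get("isbn"),
                    "imprint": item.get("publisherImprint"),
                    "publication_date": item.get("datePublished"),
                    "price": offers.get("price"),
                }
    return None


# Made-up page fragment, for illustration only.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@type": "Book", "name": "Example Title", "isbn": "9780000000002",
 "publisherImprint": "Example Imprint", "datePublished": "2024-10-01",
 "offers": {"price": "18.00"}}
</script>
</head><body></body></html>
"""
print(extract_book(sample_html))
```

Fields that JSON-LD does not carry, such as praise blurbs and author bios, would still need HTML selectors, as the second note says.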