Book & Product Metadata Scraper Pro: Amazon, GBooks, OpenLib

Pricing

$10.99/month + usage

Scrape complete book data from Amazon, Google Books, Open Library and WorldCat. Accepts ISBN, ASIN, Amazon URL or keyword. Returns price, rating, reviews, description, cover image and all metadata. Exports CSV and Excel.

Rating

0.0

(0)

Developer

Scrape Pilot

Maintained by Community

Actor stats

0

Bookmarked

3

Total users

1

Monthly active users

6 days ago

Last modified

📚 Book & Product Metadata Scraper Pro: Amazon Scrape, Google Books & Open Library

The most complete book and product metadata scraper on Apify. Scrape Amazon search results and product pages and extract ISBNs, prices, ratings, descriptions, cover images, and more from Amazon, Google Books, and Open Library in a single run. No login required.

🔍 What Is This Actor?

Book & Product Metadata Scraper Pro is a production-grade Apify actor that scrapes Amazon book and product listings in full while simultaneously pulling enriched metadata from Google Books and Open Library, giving you the most complete book data record possible from a single run.

It accepts any combination of inputs (Amazon product URLs, Amazon search URLs, ASINs, ISBNs, author names, book titles, or keyword queries) and returns unified, clean, structured records containing everything from pricing and ratings to ISBN numbers, page counts, cover images, and genre categories.

Whether you need to scrape a specific Amazon product page for its price and rating, extract all results from an Amazon search for a keyword, or enrich a list of ISBNs with full metadata from multiple libraries, this actor handles all of it through one unified interface with CSV, XLSX, and JSON export.


🚀 Why Use This Scraper?

| Feature | This Actor | Amazon API | Google Books API | Manual Research |
| --- | --- | --- | --- | --- |
| Amazon scrape (no API key) | ✅ | ❌ Requires approval | ❌ N/A | ⚠️ Manual |
| ISBN → full metadata | ✅ | ❌ | ✅ Limited | ⚠️ Manual |
| Multi-source enrichment | ✅ Auto | ❌ | ❌ | ❌ |
| Cover image (high-res) | ✅ | ❌ | ✅ Limited | ❌ |
| CSV / XLSX export | ✅ Built-in | ❌ | ❌ | ⚠️ Manual |
| Bulk ISBN list processing | ✅ | ❌ | ⚠️ Quota | ❌ |
| Author search | ✅ | ❌ | ✅ | ⚠️ Manual |
| Price + rating data | ✅ Amazon scrape | ❌ | ❌ | ⚠️ Manual |
| Residential proxy | ✅ Built-in | ❌ N/A | ❌ N/A | ❌ |
| No login required | ✅ | ❌ | ❌ | ✅ |

Bottom line: If you need rich, structured book or product metadata at scale, with live Amazon pricing, ISBNs, ratings, descriptions, and cover images, this actor delivers it all in one run.


🎯 Use Cases

📦 eCommerce & Price Monitoring

  • Scrape competitor product listings on Amazon to monitor price changes daily
  • Extract rating and review counts across hundreds of product ASINs for marketplace analysis
  • Build a price comparison tool that pulls live Amazon pricing alongside full book metadata

📚 Library & Publishing Industry

  • Convert a list of ISBNs into fully enriched book records with publisher, page count, genre, and cover image
  • Scrape Amazon bestseller lists to track which titles are trending and at what price points
  • Aggregate book data from Google Books and Open Library for catalog management systems

🎓 Academic & Research

  • Build research datasets of book metadata for studies on publishing trends, author output, or genre classification
  • Extract structured metadata for bibliographic reference management tools
  • Scrape Amazon search results for specific academic subjects to compile reading lists with ratings

🛒 Bookstore & Marketplace Sellers

  • Bulk process ISBNs to auto-populate product listings with titles, descriptions, authors, and cover images
  • Scrape Amazon search results to identify pricing gaps and underserved niches in the book market
  • Track rating changes and review velocity for books in your catalog

🤖 AI & Machine Learning

  • Build NLP training datasets using book descriptions, genres, and metadata from thousands of titles
  • Create recommendation system training data with ratings, page counts, and category labels
  • Extract cover image URLs for computer vision classification datasets

📰 Journalism & Content Creation

  • Scrape Amazon author pages and book listings for fact-checking and publishing industry reporting
  • Extract book metadata for automated content pipelines generating book review articles
  • Monitor new releases across categories by scraping Amazon search results weekly

🔗 Supported Input Types

This actor accepts seven distinct input types in the same book_inputs field, mixed freely on separate lines:

1. Amazon Search URL

Scrape all product cards from an Amazon search results page.

https://www.amazon.com/s?k=python+programming+books
https://www.amazon.com/s?k=machine+learning&i=stripbooks
https://www.amazon.com/s?field-keywords=data+science

2. Amazon Product URL (ASIN)

Scrape a single Amazon product detail page for full metadata.

https://www.amazon.com/dp/B08N5WRWNW
https://www.amazon.com/gp/product/0134685997
https://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882

3. Bare ASIN

Just the 10-character Amazon product ID; the actor builds the URL automatically.

B08N5WRWNW
0134685997
0132350882

4. ISBN-13 or ISBN-10

International Standard Book Numbers, with or without hyphens.

978-0-13-468599-1
9780134685991
0132350882

5. Author Search

Prefix with author: to search all sources for a specific author's works.

author:Robert C. Martin
author:Yuval Noah Harari
author:Malcolm Gladwell

6. Title Search

Prefix with title: to search by exact or partial book title.

title:Clean Code
title:Sapiens: A Brief History
title:The Pragmatic Programmer

7. Keyword Query

Any free-text query, searched across Google Books and Open Library.

best python books for beginners 2024
machine learning textbook
history of ancient rome

Tip: Mix and match input types freely. Enter one per line. The actor auto-detects each type and routes it to the correct data source.
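The auto-detection described in the tip can be pictured with a small classifier. This is only a sketch under stated assumptions, not the actor's actual routing code: the function name `classify_input`, the returned labels, and the rule that a bare non-book ASIN starts with `B0` are all hypothetical. A 10-digit code such as `0132350882` is both a valid ISBN-10 and a valid book ASIN, so this sketch arbitrarily labels it `isbn`.

```python
import re

# Assumption: product URLs carry the ASIN after /dp/ or /gp/product/.
AMAZON_PRODUCT_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})")
# Assumption: bare non-book ASINs look like "B0" plus 8 alphanumerics.
BARE_ASIN_RE = re.compile(r"B0[A-Z0-9]{8}")


def classify_input(line: str) -> str:
    """Return a hypothetical label for one line of book_inputs."""
    text = line.strip()
    lower = text.lower()
    if lower.startswith("author:"):
        return "author"
    if lower.startswith("title:"):
        return "title"
    if lower.startswith(("http://", "https://")) and "amazon." in lower:
        if AMAZON_PRODUCT_RE.search(text):
            return "amazon_product_url"
        return "amazon_search_url"
    # ISBNs may be written with hyphens or spaces; strip them first.
    digits = text.replace("-", "").replace(" ", "")
    if re.fullmatch(r"97[89]\d{10}", digits):
        return "isbn"
    if re.fullmatch(r"\d{9}[\dXx]", digits):
        return "isbn"  # also a valid book ASIN; labelled isbn here
    if BARE_ASIN_RE.fullmatch(text):
        return "asin"
    return "keyword"


for sample in ["B08N5WRWNW", "978-0-13-468599-1", "title:Clean Code"]:
    print(sample, "->", classify_input(sample))
```

Checking prefixes and URLs before numeric patterns keeps a title such as `title:1984` from being mistaken for an identifier.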


🌐 Data Sources

🛒 Amazon

The actor performs a live Amazon scrape using curl_cffi with browser fingerprint impersonation (Chrome 110) and residential proxy support. No Amazon API key is required.

  • Product pages: Extracts title, author/brand, price, rating, review count, cover image, publisher, publication year, page count, language, ISBN-13, ISBN-10, genre breadcrumb, description, and ASIN.
  • Search results pages: Extracts all product cards visible on the page, including title, ASIN, price, rating, review count, cover thumbnail, author/brand, and direct product link.
  • Fallback variants: If a product URL returns 404, the actor automatically tries up to 4 URL variants in total before giving up.

📗 Google Books

Free API; no key required for basic usage. Returns rich bibliographic metadata including descriptions, categories, language, page count, and cover images at multiple resolutions.

📖 Open Library

Open-access library catalog from the Internet Archive. Excellent source for older and academic titles, Goodreads links, and ISBN cross-references.


📋 Output Fields (Full Reference)

Every record returned by the scraper contains the following fields:

📝 Core Bibliographic Fields

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| title | string | Full book/product title (max 300 chars) | "Clean Code: A Handbook of Agile Software Craftsmanship" |
| author | string | Author(s) or brand (semicolon-separated, max 3) | "Robert C. Martin" |
| isbn_13 | string | ISBN-13 (13-digit, no hyphens) | "9780132350884" |
| isbn_10 | string | ISBN-10 (10-digit) | "0132350882" |
| asin | string | Amazon Standard Identification Number | "0132350882" |
| publisher | string | Publisher name | "Pearson Education" |
| year | string | Publication year | "2008" |
| pages | integer | Page count | 431 |
| language | string | Full language name | "English" |
| genre | string | Genre/category (semicolon-separated) | "Computers; Programming; Software Engineering" |

💰 Commerce & Engagement Fields

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| price | string | Listed price (Amazon scrape) | "$34.99" |
| rating | string | Average rating out of 5 | "4.7/5" |
| reviews | string | Review/rating count | "5,432 reviews" |
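Because price, rating, and reviews are returned as display strings, downstream analysis usually needs them as numbers. A minimal sketch of that conversion, assuming the string shapes shown in the examples above ("$34.99", "4.7/5", "5,432 reviews"); the helper names are hypothetical and not part of the actor:

```python
import re
from typing import Optional


def parse_price(price: str) -> Optional[float]:
    """Pull the first number out of a price string such as "$34.99"."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price or "")
    return float(match.group().replace(",", "")) if match else None


def parse_rating(rating: str) -> Optional[float]:
    """Read the score from a rating string such as "4.7/5"."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*/\s*5", rating or "")
    return float(match.group(1)) if match else None


def parse_review_count(reviews: str) -> Optional[int]:
    """Read the count from a string such as "5,432 reviews"."""
    match = re.search(r"\d[\d,]*", reviews or "")
    return int(match.group().replace(",", "")) if match else None


record = {"price": "$34.99", "rating": "4.7/5", "reviews": "5,432 reviews"}
print(parse_price(record["price"]),
      parse_rating(record["rating"]),
      parse_review_count(record["reviews"]))
```

Each helper returns `None` for an empty or unparseable value, so records that lack Amazon data pass through without raising.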

🖼️ Media Fields

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| cover_url | string | Highest-resolution cover image URL | "https://images-na.ssl-images-amazon.com/..." |
| description | string | Book description/synopsis (max 800–1000 chars) | "Even bad code can function..." |

🔗 URL Fields

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| amazon_url | string | Direct Amazon product or search URL | "https://www.amazon.com/dp/0132350882" |
| goodreads_url | string | Goodreads book page URL | "https://www.goodreads.com/book/show/3735293" |
| google_books_url | string | Google Books page URL | "https://books.google.com/books?id=..." |
| openlibrary_url | string | Open Library book page URL | "https://openlibrary.org/isbn/9780132350884" |

🔧 Meta Fields

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| source | string | Data source for this record | "Amazon", "Google Books", "Open Library", "Amazon Search" |
| fetched_at | string | ISO timestamp of scrape | "2024-11-01T10:30:00Z" |

⚙️ Input Parameters

{
  "book_inputs": "9780132350884\nauthor:Robert C. Martin\nhttps://www.amazon.com/s?k=clean+code",
  "max_items": 10,
  "use_amazon": true,
  "use_google_books": true,
  "use_openlibrary": true,
  "export_csv": true,
  "export_xlsx": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| book_inputs | string | required | One input per line: ISBN, ASIN, Amazon URL, author:Name, title:Name, or keyword |
| max_items | integer | 10 | Maximum results to return per input line |
| use_amazon | boolean | true | Enable Amazon scrape (product pages and search results) |
| use_google_books | boolean | true | Enable Google Books API as a data source |
| use_openlibrary | boolean | true | Enable Open Library API as a data source |
| export_csv | boolean | true | Export results as results.csv in Key-Value Store |
| export_xlsx | boolean | true | Export results as results.xlsx in Key-Value Store |
| proxyConfiguration | object | Residential | Apify proxy config. Recommended for Amazon scrape. |
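Since book_inputs is a single newline-separated string, it is often easier to assemble the input from a list in code. The sketch below is one way to do that; `build_actor_input` is a hypothetical helper, and it only builds the JSON payload using the parameter names and defaults from the table above. How the payload is submitted (Apify Console, API, or a client library) is left out because the actor ID is not given here.

```python
import json


def build_actor_input(lines, max_items=10, use_amazon=True):
    """Join input lines into the newline-separated book_inputs string."""
    cleaned = [ln.strip() for ln in lines if ln and ln.strip()]
    if not cleaned:
        raise ValueError("book_inputs needs at least one non-empty line")
    return {
        "book_inputs": "\n".join(cleaned),
        "max_items": max_items,
        "use_amazon": use_amazon,
        "use_google_books": True,
        "use_openlibrary": True,
        "export_csv": True,
        "export_xlsx": True,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }


# Blank lines are dropped so they are not treated as inputs.
run_input = build_actor_input(["9780132350884", "author:Robert C. Martin", "  "])
print(json.dumps(run_input, indent=2))
```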

📦 Example Inputs & Outputs

Example 1: Amazon Scrape, Product Page by ASIN

Input:

B08N5WRWNW

Output:

{
  "title": "Designing Data-Intensive Applications",
  "author": "Martin Kleppmann",
  "isbn_13": "9781449373320",
  "isbn_10": "1449373321",
  "asin": "B08N5WRWNW",
  "publisher": "O'Reilly Media",
  "year": "2017",
  "pages": 616,
  "language": "English",
  "genre": "Books > Computers & Technology > Databases & Big Data",
  "rating": "4.7/5",
  "reviews": "2,841 reviews",
  "price": "$54.99",
  "description": "Data is at the center of many challenges in system design today...",
  "cover_url": "https://images-na.ssl-images-amazon.com/images/I/...",
  "amazon_url": "https://www.amazon.com/dp/B08N5WRWNW",
  "goodreads_url": "https://www.goodreads.com/book/show/23463279",
  "google_books_url": "https://books.google.com/books?id=...",
  "openlibrary_url": "https://openlibrary.org/isbn/9781449373320",
  "source": "Amazon",
  "fetched_at": "2024-11-01T10:30:00Z"
}

Example 2: Amazon Scrape, Search Results Page

Input:

https://www.amazon.com/s?k=python+for+beginners

Output (2 of 10 results shown):

[
  {
    "title": "Python Crash Course, 3rd Edition",
    "author": "Eric Matthes",
    "asin": "1718502702",
    "price": "$24.49",
    "rating": "4.7/5",
    "reviews": "8,920 reviews",
    "cover_url": "https://images-na.ssl-images-amazon.com/images/I/...",
    "amazon_url": "https://www.amazon.com/dp/1718502702",
    "source": "Amazon Search",
    "fetched_at": "2024-11-01T10:31:00Z"
  },
  {
    "title": "Automate the Boring Stuff with Python, 2nd Edition",
    "author": "Al Sweigart",
    "asin": "1593279922",
    "price": "$29.99",
    "rating": "4.7/5",
    "reviews": "6,103 reviews",
    "source": "Amazon Search"
  }
]

Example 3: Bulk ISBN List

Input:

9780132350884
9780201633610
9780596517748
9781491950357

Output: 4 fully enriched records, each combining data from Amazon, Google Books, and Open Library, with ISBNs, descriptions, ratings, cover images, and cross-source URLs.

Example 4: Author Search

Input:

author:Malcolm Gladwell

Output: Up to max_items books by Malcolm Gladwell from Google Books and Open Library, with titles, ISBNs, publishers, page counts, and cover images.


Example 5: Mixed Batch

Input (one per line):

9780134685991
author:Yuval Noah Harari
https://www.amazon.com/dp/0735224137
title:Atomic Habits
B09H3BXKR5
python machine learning books

Output: Each line is processed independently, auto-detected, routed to the correct source(s), and results are merged into a single unified dataset.


🛒 Amazon Scrape: How It Works

Product Page Scraping (ASIN / Product URL)

When an ASIN or product URL is provided, the actor performs a live Amazon scrape using the following process:

Step 1: URL Variant Generation. The actor generates up to 4 URL variants for every product to handle 404s and regional redirects:

https://www.amazon.com/dp/{ASIN}
https://www.amazon.com/gp/product/{ASIN}
https://www.amazon.com/dp/{ASIN}?th=1&psc=1
https://www.amazon.com/s?k={ASIN} ← last resort
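The four variants listed above can be generated with a few lines of code. This is a sketch rather than the actor's own implementation; the function name `amazon_url_variants` and the `domain` parameter are assumptions made for illustration.

```python
from typing import List
from urllib.parse import quote


def amazon_url_variants(asin: str, domain: str = "www.amazon.com") -> List[str]:
    """Build the URL variants in the order they would be tried."""
    asin = asin.strip().upper()
    if len(asin) != 10 or not asin.isalnum():
        raise ValueError(f"not a 10-character ASIN: {asin!r}")
    base = f"https://{domain}"
    return [
        f"{base}/dp/{asin}",
        f"{base}/gp/product/{asin}",
        f"{base}/dp/{asin}?th=1&psc=1",
        f"{base}/s?k={quote(asin)}",  # search page, used as the last resort
    ]


for url in amazon_url_variants("B08N5WRWNW"):
    print(url)
```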

Step 2: Browser Fingerprint Request. The actor uses curl_cffi to impersonate a real Chrome 110 browser, sending authentic headers including Sec-Fetch-*, Accept-Encoding, DNT, Upgrade-Insecure-Requests, and a real user agent. This reduces the chance of bot detection during the Amazon scrape.

Step 3: CAPTCHA Detection. If Amazon returns a CAPTCHA page, the actor detects it immediately, skips that variant, and tries the next URL. If all variants are blocked, it falls back to Google Books or Open Library automatically.
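The skip-and-try-next behaviour in Step 3 can be sketched as a loop over candidate URLs. The marker strings below are assumptions about what an Amazon CAPTCHA page contains, not values taken from the actor, and `fetch` is a hypothetical callable returning `(status_code, html)` so the logic can be exercised without network access.

```python
from typing import Callable, Iterable, Optional, Tuple

# Assumption: phrases that tend to appear on Amazon's robot-check page.
CAPTCHA_MARKERS = (
    "enter the characters you see below",
    "/errors/validatecaptcha",
    "api-services-support@amazon.com",
)


def looks_like_captcha(html: str) -> bool:
    """Case-insensitive check for any known CAPTCHA marker."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def first_usable_page(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]],
) -> Optional[Tuple[str, str]]:
    """Return (url, html) for the first variant that is neither an error nor a CAPTCHA."""
    for url in urls:
        status, html = fetch(url)
        if status != 200 or looks_like_captcha(html):
            continue  # blocked or missing: move on to the next variant
        return url, html
    return None  # caller would fall back to Google Books / Open Library
```

A `None` result corresponds to the case where every variant is blocked and another source has to supply the record.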

Step 4: HTML Parsing. Using BeautifulSoup, the actor tries multiple CSS selector patterns per field, which keeps extraction compatible with Amazon's A/B layout variants and regional page structures.

Step 5: Multi-Source Enrichment. After a successful Amazon scrape, the actor automatically queries Google Books and Open Library to fill in any missing fields (description, genre, Goodreads link, etc.) and merges results into one complete record.


Search Results Page Scraping

When an Amazon search URL is provided, the actor scrapes all product cards from the page:

  • Tries 4 different card selector patterns for compatibility with Amazon's evolving layout
  • Extracts ASIN, title, price, rating, review count, cover thumbnail, and author/brand per card
  • Returns up to max_items products from the page
  • Falls back to Google Books keyword search if Amazon returns a CAPTCHA

📗 Google Books Integration

The Google Books API provides the richest bibliographic metadata for published books. The actor uses it as:

  • Primary source for ISBN, author, title, and keyword queries
  • Enrichment source for Amazon scrape results missing descriptions or genre data
  • Fallback source when Amazon scrape fails or returns a CAPTCHA

What Google Books adds: full book descriptions (up to 800 chars), category/genre tags, page count, language, high-resolution cover images (zoom=3), publisher, publication date, rating, ratings count, and a direct Google Books URL.

Rate limit handling: If Google Books returns a 429, the actor waits 3 seconds and retries, up to 3 times per request.
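The retry rule above (wait 3 seconds on a 429, at most 3 retries) might look like the following. It is a sketch, not the actor's code: `get_with_429_retry` is a hypothetical name, and `fetch` and `sleep` are injected so the behaviour can be checked without real requests or real waiting.

```python
import time
from typing import Callable, Tuple


def get_with_429_retry(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    wait_seconds: float = 3.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(url); on HTTP 429 wait and retry up to max_retries times."""
    status, body = fetch(url)
    retries = 0
    while status == 429 and retries < max_retries:
        sleep(wait_seconds)
        retries += 1
        status, body = fetch(url)
    # Either a non-429 response or the last 429 after all retries.
    return status, body
```

With the defaults, a request is attempted at most four times: the initial call plus three retries.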


📖 Open Library Integration

Open Library (Internet Archive) is an excellent secondary source, especially for older and out-of-print titles, academic books, and non-English publications.

What Open Library adds: Goodreads URL (when available), Open Library direct page URL, subject tags from the library catalog, alternative ISBN variants (ISBN-10 ↔ ISBN-13 cross-reference), first publication year, and language data.
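The ISBN-10 ↔ ISBN-13 cross-reference follows the standard check-digit rules, so it can also be computed locally. The sketch below shows that arithmetic; the function names are hypothetical, and only 978-prefixed ISBN-13s have an ISBN-10 equivalent.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the old check digit, and compute the ISBN-13 check digit."""
    core = "978" + isbn10.replace("-", "")[:9]
    # ISBN-13 weights alternate 1, 3, 1, 3, ...
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def isbn13_to_isbn10(isbn13: str) -> str:
    """Drop the 978 prefix and compute the ISBN-10 check digit (may be X)."""
    digits = isbn13.replace("-", "")
    if not digits.startswith("978"):
        raise ValueError("only 978-prefixed ISBN-13s have an ISBN-10 form")
    core = digits[3:12]
    # ISBN-10 weights run 10, 9, ..., 2 over the first nine digits.
    total = sum(int(d) * (10 - i) for i, d in enumerate(core))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


print(isbn10_to_isbn13("0132350882"))         # 9780132350884
print(isbn13_to_isbn10("978-0-13-468599-1"))  # 0134685997
```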


🔄 Multi-Source Enrichment

The actor uses a smart merge strategy to combine data from multiple sources into the most complete possible record:

Priority order for each field:
Amazon (primary) → Google Books (fill gaps) → Open Library (fill remaining gaps)

| Field | Amazon Scrape | Google Books | Open Library |
| --- | --- | --- | --- |
| Price | ✅ Always | ❌ | ❌ |
| Rating / Reviews | ✅ Live | ✅ Historical | ❌ |
| Description | ✅ (if listed) | ✅ Rich | ❌ |
| Genre / Category | ✅ Breadcrumb | ✅ Google tags | ✅ Library subjects |
| ISBN-13 / ISBN-10 | ✅ From detail | ✅ | ✅ |
| Goodreads URL | ✅ (if linked) | ❌ | ✅ |
| Cover Image | ✅ High-res | ✅ zoom=3 | ✅ OpenLibrary CDN |
| Page Count | ✅ | ✅ | ✅ Median |
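A priority-ordered gap fill like the one described here can be sketched in a few lines. This is an illustration under assumptions, not the actor's merge code: `merge_records` is a hypothetical name, and "missing" is taken to mean `None` or an empty string.

```python
# Priority order from the text: Amazon first, then Google Books, then Open Library.
SOURCE_PRIORITY = ("Amazon", "Google Books", "Open Library")


def _is_missing(value) -> bool:
    """Assumption: a field counts as missing when it is None or an empty string."""
    return value is None or value == ""


def merge_records(records_by_source: dict) -> dict:
    """Fill each field from the highest-priority source that has a value."""
    merged = {}
    for source in SOURCE_PRIORITY:
        for field, value in records_by_source.get(source, {}).items():
            if _is_missing(merged.get(field)) and not _is_missing(value):
                merged[field] = value
    return merged


merged = merge_records({
    "Amazon": {"title": "Clean Code", "price": "$34.99", "description": ""},
    "Google Books": {"description": "Even bad code can function...", "pages": 431},
    "Open Library": {"pages": 464,
                     "goodreads_url": "https://www.goodreads.com/book/show/3735293"},
})
print(merged)
```

Lower-priority sources never overwrite a value that a higher-priority source already supplied, which is why `pages` stays at the Google Books value in the example.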

📁 Export Formats

All results are automatically exported to the actor's Key-Value Store in three formats:

JSON (results.json)

Complete structured data with all fields. Ideal for API integration, databases, and further processing.

CSV (results.csv)

UTF-8 with BOM (Excel-compatible) flat table. All 21 output fields as columns. Ideal for spreadsheet analysis in Excel, Google Sheets, or pandas.
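Writing a CSV that Excel opens cleanly mostly comes down to the byte-order mark. A minimal sketch using only the standard library, assuming records are plain dicts; `records_to_csv_bytes` is a hypothetical helper, and Python's `utf-8-sig` codec is what prepends the BOM.

```python
import csv
import io


def records_to_csv_bytes(records, fieldnames):
    """Serialize records to CSV bytes with a UTF-8 BOM for Excel."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # drop keys that are not columns
        restval="",             # leave missing fields blank
    )
    writer.writeheader()
    writer.writerows(records)
    # "utf-8-sig" prepends the BOM bytes EF BB BF when encoding.
    return buffer.getvalue().encode("utf-8-sig")


data = records_to_csv_bytes(
    [{"title": "Clean Code", "price": "$34.99"}],
    ["title", "price", "rating"],
)
print(data[:3])
```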

XLSX (results.xlsx)

Excel workbook with auto-sized column widths and a frozen header row. Ready to open directly in Microsoft Excel or Google Sheets.

To disable exports, set export_csv: false or export_xlsx: false in the input to save storage and run time.


🌐 Proxy Configuration

For maximum reliability when performing an Amazon scrape, especially at high volume, a residential proxy is strongly recommended.

{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Why Residential Proxy for Amazon Scrape?

Amazon aggressively blocks datacenter IP ranges. Residential IPs appear as normal household traffic, dramatically reducing CAPTCHA frequency. The actor automatically refreshes the proxy URL before each Amazon request.

No Proxy

For small runs (under 5–10 products), curl_cffi's browser impersonation alone is often sufficient. However, residential proxy use is strongly recommended for reliable production-scale Amazon scraping.


⚡ Performance & Rate Limits

Speed Benchmarks (with residential proxy)

| Input Type | Items | Estimated Time |
| --- | --- | --- |
| Single ISBN (Google Books) | 1 | ~3–5 seconds |
| Single Amazon scrape (product) | 1 | ~8–15 seconds |
| Amazon search results | 10 products | ~30–60 seconds |
| Bulk ISBNs (Google Books + OL) | 20 | ~1–2 minutes |
| Amazon scrape + enrichment | 20 | ~3–5 minutes |
| Mixed batch (all types) | 50 | ~8–12 minutes |

Rate Limiting Strategy

  • Amazon: 0.5–1.5s random delay between requests; 3 retries with proxy refresh; CAPTCHA detection with automatic URL variant fallback
  • Google Books: 0.2–0.5s delay; 3s wait on 429; up to 3 retries per request
  • Open Library: 0.2–0.5s delay; standard retry on failure

Deduplication

Results are automatically deduplicated across all sources using ASIN + source, ISBN-13 + source, or Title + Author + source as the deduplication key.
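The three deduplication keys can be expressed as one key function. In this sketch the precedence (ASIN first, then ISBN-13, then title plus author) and the case-insensitive title matching are assumptions; the text names the keys but not the order in which they are tried.

```python
def dedup_key(record: dict) -> tuple:
    """Pick the most specific identifier available, always paired with source."""
    source = record.get("source") or ""
    if record.get("asin"):
        return ("asin", record["asin"].upper(), source)
    if record.get("isbn_13"):
        return ("isbn13", record["isbn_13"].replace("-", ""), source)
    title = (record.get("title") or "").strip().lower()
    author = (record.get("author") or "").strip().lower()
    return ("title_author", title, author, source)


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Because the source is part of every key, the same book returned by Amazon and by Google Books is kept as two records rather than collapsed into one.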


❓ FAQ

Q: Do I need an Amazon account or API key? A: No. This actor performs a direct Amazon scrape without any authentication, account, or Amazon API key.

Q: Do I need a Google API key? A: No. The Google Books API used here works without an API key for normal usage volumes.

Q: Why does the Amazon scrape sometimes fail for certain products? A: Amazon actively fights automated access. The actor tries up to 4 URL variants and automatically falls back to Google Books / Open Library when Amazon is unavailable.

Q: Can I scrape Amazon.co.uk or Amazon.de? A: The current version targets Amazon.com. International Amazon domains may partially work but are not officially supported.

Q: What if my ISBN returns no results on Amazon? A: The actor will automatically query Google Books and Open Library using the ISBN, which have excellent global coverage for published books.

Q: How many ISBNs can I process in one run? A: There is no hard limit. For large batches (100+ ISBNs), set use_amazon: false to rely on Google Books + Open Library for faster, CAPTCHA-free processing.

Q: Are prices live or cached? A: Prices are scraped live from Amazon in real time on every run. There is no caching.

Q: Does this work for non-book Amazon products? A: Yes. Any Amazon product URL or ASIN can be scraped. Book-specific fields like ISBN and page count will simply be empty for non-book items.

Q: Why does the description sometimes come from Google Books instead of Amazon? A: Multi-source enrichment fills gaps automatically: if Amazon's product page doesn't include a description, Google Books provides it without any extra configuration.

Q: What does export_xlsx: true do? A: It creates a formatted Excel file (results.xlsx) in your Apify Key-Value Store with auto-sized columns and a frozen header row, ready to open directly in Excel or Google Sheets.


📜 Changelog

v2.0.0 (Current)

  • ✅ Amazon search URL scraping: extract all product cards from /s?k= pages
  • ✅ Amazon product page: 5-variant URL fallback system to handle 404s
  • ✅ ASIN 404 fallback → Open Library (not Google Books) for better book coverage
  • ✅ Google Books 429 rate limit → 2–3s retry gap (replaced long 15/30/45s waits)
  • ✅ Any Amazon URL format correctly auto-detected and routed
  • ✅ Keyword input → Google Books + Open Library (no unnecessary Amazon call)
  • ✅ Smart deduplication across all sources (ASIN / ISBN / Title+Author keyed)
  • ✅ XLSX export with auto-sized columns and frozen header row
  • ✅ CSV export with UTF-8 BOM for full Excel compatibility
  • ✅ author: and title: prefix search modes added
  • ✅ Zero crashes: all errors handled gracefully with partial records

v1.5.0

  • ✅ Multi-source enrichment merge strategy
  • ✅ Open Library Goodreads ID extraction
  • ✅ Google Books cover image at zoom=3 resolution
  • ✅ ISBN-10 ↔ ISBN-13 cross-referencing

v1.0.0

  • ✅ Initial release: ISBN lookup via Google Books
  • ✅ Basic Amazon product page scraping
  • ✅ JSON dataset export

This actor collects publicly accessible data in the same manner as a regular user browsing these websites.

  • This tool is intended for research, price monitoring, catalog management, and educational use
  • Amazon scrape activity should comply with Amazon's Conditions of Use
  • Google Books data is subject to Google's Terms of Service for the Books API
  • Open Library data is provided under open licenses (Creative Commons); please attribute accordingly
  • Do not use this tool to scrape private, account-specific, or paywalled content

Fair Use: Metadata aggregation (titles, authors, ISBNs, descriptions) for research, bibliographic tools, and catalog management is generally considered fair use. Always consult a legal professional for your specific use case.


🀝 Support & Feedback

  • Bug report? Open a GitHub issue or contact us via the Apify actor page
  • Feature request? Suggest it in the Apify Community forum
  • Rating: If this actor saved you time, please leave a ⭐ review on the Apify Store!

Built with ❤️ on Apify · Amazon Scrape + Google Books + Open Library
The most complete book metadata scraper on the Apify platform: no API keys, no login, no hard limit on batch size