Book & Product Metadata Scraper Pro: Amazon, GBooks, OpenLib
Pricing
$10.99/month + usage
Scrape complete book data from Amazon, Google Books, Open Library and WorldCat. Accepts ISBN, ASIN, Amazon URL or keyword. Returns price, rating, reviews, description, cover image and all metadata. Exports CSV and Excel.
Developer
Scrape Pilot
Book & Product Metadata Scraper Pro: Amazon Scrape, Google Books & Open Library
The most complete book and product metadata scraper on Apify. Scrape Amazon search results and product pages, and extract ISBNs, prices, ratings, descriptions, cover images, and more from Amazon, Google Books, and Open Library in a single run. No login required.
Table of Contents
- What Is This Actor?
- Why Use This Scraper?
- Use Cases
- Supported Input Types
- Data Sources
- Output Fields (Full Reference)
- Input Parameters
- Example Inputs & Outputs
- Amazon Scrape: How It Works
- Google Books Integration
- Open Library Integration
- Multi-Source Enrichment
- Export Formats
- Proxy Configuration
- Performance & Rate Limits
- FAQ
- Changelog
- Legal & Terms of Use
What Is This Actor?
Book & Product Metadata Scraper Pro is a production-grade Apify actor that scrapes Amazon book and product listings while also pulling enriched metadata from Google Books and Open Library, giving you the most complete book data record possible from a single run.
It accepts any combination of inputs (Amazon product URLs, Amazon search URLs, ASINs, ISBNs, author names, book titles, or keyword queries) and returns unified, clean, structured records containing everything from pricing and ratings to ISBNs, page counts, cover images, and genre categories.
Whether you need to scrape a specific Amazon product page for its price and rating, extract all results from an Amazon search for a keyword, or enrich a list of ISBNs with full metadata from multiple libraries, this actor handles all of it through one unified interface with CSV, XLSX, and JSON export.
Why Use This Scraper?
| Feature | This Actor | Amazon API | Google Books API | Manual Research |
|---|---|---|---|---|
| Amazon scrape (no API key) | ✅ | ❌ Requires approval | ❌ N/A | ⚠️ Manual |
| ISBN → full metadata | ✅ | ❌ | ✅ Limited | ⚠️ Manual |
| Multi-source enrichment | ✅ Auto | ❌ | ❌ | ❌ |
| Cover image (high-res) | ✅ | ✅ | ✅ Limited | ❌ |
| CSV / XLSX export | ✅ Built-in | ❌ | ❌ | ⚠️ Manual |
| Bulk ISBN list processing | ✅ | ❌ | ⚠️ Quota | ❌ |
| Author search | ✅ | ❌ | ✅ | ⚠️ Manual |
| Price + rating data | ✅ Amazon scrape | ✅ | ❌ | ⚠️ Manual |
| Residential proxy | ✅ Built-in | ❌ N/A | ❌ N/A | ❌ |
| No login required | ✅ | ❌ | ✅ | ✅ |
Bottom line: If you need rich, structured book or product metadata at scale, with live Amazon pricing, ISBNs, ratings, descriptions, and cover images, this is the only actor that does it all in one run.
Use Cases
eCommerce & Price Monitoring
- Scrape competitor product listings on Amazon to monitor price changes daily
- Extract rating and review counts across hundreds of product ASINs for marketplace analysis
- Build a price comparison tool that pulls live Amazon pricing alongside full book metadata
Library & Publishing Industry
- Convert a list of ISBNs into fully enriched book records with publisher, page count, genre, and cover image
- Scrape Amazon bestseller lists to track which titles are trending and at what price points
- Aggregate book data from Google Books and Open Library for catalog management systems
Academic & Research
- Build research datasets of book metadata for studies on publishing trends, author output, or genre classification
- Extract structured metadata for bibliographic reference management tools
- Scrape Amazon search results for specific academic subjects to compile reading lists with ratings
Bookstore & Marketplace Sellers
- Bulk process ISBNs to auto-populate product listings with titles, descriptions, authors, and cover images
- Scrape Amazon search results to identify pricing gaps and underserved niches in the book market
- Track rating changes and review velocity for books in your catalog
AI & Machine Learning
- Build NLP training datasets using book descriptions, genres, and metadata from thousands of titles
- Create recommendation system training data with ratings, page counts, and category labels
- Extract cover image URLs for computer vision classification datasets
Journalism & Content Creation
- Scrape Amazon author pages and book listings for fact-checking and publishing industry reporting
- Extract book metadata for automated content pipelines generating book review articles
- Monitor new releases across categories by scraping Amazon search results weekly
Supported Input Types
This actor accepts seven distinct input types in the same book_inputs field, mixed freely on separate lines:
1. Amazon Search URL
Scrape all product cards from an Amazon search results page.

    https://www.amazon.com/s?k=python+programming+books
    https://www.amazon.com/s?k=machine+learning&i=stripbooks
    https://www.amazon.com/s?field-keywords=data+science

2. Amazon Product URL (ASIN)
Scrape a single Amazon product detail page for full metadata.

    https://www.amazon.com/dp/B08N5WRWNW
    https://www.amazon.com/gp/product/0134685997
    https://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882

3. Bare ASIN
Just the 10-character Amazon product ID; the actor builds the URL automatically.

    B08N5WRWNW
    0134685997
    0132350882

4. ISBN-13 or ISBN-10
International Standard Book Numbers, with or without hyphens.

    978-0-13-468599-1
    9780134685991
    0132350882

5. Author Search
Prefix with author: to search all sources for a specific author's works.

    author:Robert C. Martin
    author:Yuval Noah Harari
    author:Malcolm Gladwell

6. Title Search
Prefix with title: to search by exact or partial book title.

    title:Clean Code
    title:Sapiens: A Brief History
    title:The Pragmatic Programmer

7. Keyword Query
Any free-text query, searched across Google Books and Open Library.

    best python books for beginners 2024
    machine learning textbook
    history of ancient rome

Tip: Mix and match input types freely. Enter one per line. The actor auto-detects each type and routes it to the correct data source.
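The auto-detection described above could look roughly like the sketch below. The actor's real detection rules are not published, so `classify_input` and its patterns are assumptions inferred from the seven input types listed in this section. Note that an all-digit 10-character ID is both a valid ISBN-10 and a book ASIN; this sketch labels it as an ISBN-10.

```python
import re


def classify_input(line: str) -> str:
    """Guess the input type of one book_inputs line (hypothetical rules)."""
    text = line.strip()
    lowered = text.lower()
    if lowered.startswith("author:"):
        return "author"
    if lowered.startswith("title:"):
        return "title"
    if lowered.startswith(("http://", "https://")) and "amazon." in lowered:
        # Search pages use the /s? path; anything else is treated as a product URL.
        return "amazon_search_url" if "/s?" in text else "amazon_product_url"
    compact = text.replace("-", "").replace(" ", "")
    if re.fullmatch(r"97[89]\d{10}", compact):
        return "isbn_13"
    if re.fullmatch(r"\d{9}[\dXx]", compact):
        # All-digit 10-character IDs double as book ASINs on Amazon.
        return "isbn_10"
    if re.fullmatch(r"(?=.*\d)[A-Z0-9]{10}", text):
        # Upper-case alphanumeric with at least one digit, e.g. B08N5WRWNW.
        return "asin"
    return "keyword"


for sample in ["B08N5WRWNW", "978-0-13-468599-1", "title:Clean Code"]:
    print(sample, "->", classify_input(sample))
```

Requiring at least one digit in the ASIN pattern keeps a 10-letter keyword from being mistaken for a product ID.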
Data Sources
Amazon
The actor performs a live Amazon scrape using curl_cffi with browser fingerprint impersonation (Chrome 110) and residential proxy support. No Amazon API key is required.
- Product pages: Extracts title, author/brand, price, rating, review count, cover image, publisher, publication year, page count, language, ISBN-13, ISBN-10, genre breadcrumb, description, and ASIN.
- Search results pages: Extracts all product cards visible on the page, including title, ASIN, price, rating, review count, cover thumbnail, author/brand, and direct product link.
- Fallback variants: If a product URL returns 404, the actor automatically tries 4 alternative URL formats before giving up.
Google Books
Free API; no key is required for basic usage. Returns rich bibliographic metadata including descriptions, categories, language, page count, and cover images at multiple resolutions.
Open Library
Open-access library catalog from the Internet Archive. Excellent source for older and academic titles, Goodreads links, and ISBN cross-references.
Output Fields (Full Reference)
Every record returned by the scraper contains the following fields:
Core Bibliographic Fields
| Field | Type | Description | Example |
|---|---|---|---|
| title | string | Full book/product title (max 300 chars) | "Clean Code: A Handbook of Agile Software Craftsmanship" |
| author | string | Author(s) or brand (semicolon-separated, max 3) | "Robert C. Martin" |
| isbn_13 | string | ISBN-13 (13-digit, no hyphens) | "9780132350884" |
| isbn_10 | string | ISBN-10 (10-digit) | "0132350882" |
| asin | string | Amazon Standard Identification Number | "0132350882" |
| publisher | string | Publisher name | "Pearson Education" |
| year | string | Publication year | "2008" |
| pages | integer | Page count | 431 |
| language | string | Full language name | "English" |
| genre | string | Genre/category (semicolon-separated) | "Computers; Programming; Software Engineering" |
Commerce & Engagement Fields
| Field | Type | Description | Example |
|---|---|---|---|
| price | string | Listed price (Amazon scrape) | "$34.99" |
| rating | string | Average rating out of 5 | "4.7/5" |
| reviews | string | Review/rating count | "5,432 reviews" |
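Because price, rating, and reviews are returned as display strings, downstream code usually has to convert them before sorting or aggregating. The helpers below are a minimal sketch of that conversion; the function names are hypothetical and the patterns assume the US-style formats shown in the examples above ("$34.99", "4.7/5", "5,432 reviews").

```python
import re
from typing import Optional


def parse_price(price: str) -> Optional[float]:
    """Extract a numeric amount from a string such as "$34.99"."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price or "")
    return float(match.group().replace(",", "")) if match else None


def parse_rating(rating: str) -> Optional[float]:
    """Extract the score from a string such as "4.7/5"."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*/\s*5", rating or "")
    return float(match.group(1)) if match else None


def parse_review_count(reviews: str) -> Optional[int]:
    """Extract the count from a string such as "5,432 reviews"."""
    match = re.search(r"\d[\d,]*", reviews or "")
    return int(match.group().replace(",", "")) if match else None


print(parse_price("$34.99"), parse_rating("4.7/5"), parse_review_count("5,432 reviews"))
```

Each helper returns None for empty or unparseable input, which matches the case where a source did not supply the field.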
Media Fields
| Field | Type | Description | Example |
|---|---|---|---|
| cover_url | string | Highest-resolution cover image URL | "https://images-na.ssl-images-amazon.com/..." |
| description | string | Book description/synopsis (max 800–1000 chars) | "Even bad code can function..." |
URL Fields
| Field | Type | Description | Example |
|---|---|---|---|
| amazon_url | string | Direct Amazon product or search URL | "https://www.amazon.com/dp/0132350882" |
| goodreads_url | string | Goodreads book page URL | "https://www.goodreads.com/book/show/3735293" |
| google_books_url | string | Google Books page URL | "https://books.google.com/books?id=..." |
| openlibrary_url | string | Open Library book page URL | "https://openlibrary.org/isbn/9780132350884" |
Meta Fields
| Field | Type | Description | Example |
|---|---|---|---|
| source | string | Data source for this record | "Amazon", "Google Books", "Open Library", "Amazon Search" |
| fetched_at | string | ISO timestamp of scrape | "2024-11-01T10:30:00Z" |
Input Parameters

    {
      "book_inputs": "9780132350884\nauthor:Robert C. Martin\nhttps://www.amazon.com/s?k=clean+code",
      "max_items": 10,
      "use_amazon": true,
      "use_google_books": true,
      "use_openlibrary": true,
      "export_csv": true,
      "export_xlsx": true,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      }
    }

| Parameter | Type | Default | Description |
|---|---|---|---|
| book_inputs | string | required | One input per line: ISBN, ASIN, Amazon URL, author:Name, title:Name, or keyword |
| max_items | integer | 10 | Maximum results to return per input line |
| use_amazon | boolean | true | Enable Amazon scrape (product pages and search results) |
| use_google_books | boolean | true | Enable Google Books API as a data source |
| use_openlibrary | boolean | true | Enable Open Library API as a data source |
| export_csv | boolean | true | Export results as results.csv in Key-Value Store |
| export_xlsx | boolean | true | Export results as results.xlsx in Key-Value Store |
| proxyConfiguration | object | Residential | Apify proxy config. Recommended for Amazon scrape. |
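When inputs come from a script rather than the Apify console, the run input can be assembled programmatically. The sketch below only builds the JSON payload from the parameters documented above; `build_run_input` is a hypothetical helper, and how the payload is submitted (console, API, or client library) is left out.

```python
import json


def build_run_input(lines, max_items=10, use_amazon=True):
    """Assemble a run input dict using the parameter names documented above."""
    return {
        # book_inputs is one newline-separated string, one input per line.
        "book_inputs": "\n".join(line.strip() for line in lines if line.strip()),
        "max_items": max_items,
        "use_amazon": use_amazon,
        "use_google_books": True,
        "use_openlibrary": True,
        "export_csv": True,
        "export_xlsx": True,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }


run_input = build_run_input([
    "9780132350884",
    "author:Robert C. Martin",
    "https://www.amazon.com/s?k=clean+code",
])
print(json.dumps(run_input, indent=2))
```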
Example Inputs & Outputs
Example 1: Amazon Scrape, Product Page by ASIN
Input:

    B08N5WRWNW

Output:

    {
      "title": "Designing Data-Intensive Applications",
      "author": "Martin Kleppmann",
      "isbn_13": "9781449373320",
      "isbn_10": "1449373321",
      "asin": "B08N5WRWNW",
      "publisher": "O'Reilly Media",
      "year": "2017",
      "pages": 616,
      "language": "English",
      "genre": "Books > Computers & Technology > Databases & Big Data",
      "rating": "4.7/5",
      "reviews": "2,841 reviews",
      "price": "$54.99",
      "description": "Data is at the center of many challenges in system design today...",
      "cover_url": "https://images-na.ssl-images-amazon.com/images/I/...",
      "amazon_url": "https://www.amazon.com/dp/B08N5WRWNW",
      "goodreads_url": "https://www.goodreads.com/book/show/23463279",
      "google_books_url": "https://books.google.com/books?id=...",
      "openlibrary_url": "https://openlibrary.org/isbn/9781449373320",
      "source": "Amazon",
      "fetched_at": "2024-11-01T10:30:00Z"
    }

Example 2: Amazon Scrape, Search Results Page
Input:

    https://www.amazon.com/s?k=python+for+beginners

Output (2 of 10 results shown):

    [
      {
        "title": "Python Crash Course, 3rd Edition",
        "author": "Eric Matthes",
        "asin": "1718502702",
        "price": "$24.49",
        "rating": "4.7/5",
        "reviews": "8,920 reviews",
        "cover_url": "https://images-na.ssl-images-amazon.com/images/I/...",
        "amazon_url": "https://www.amazon.com/dp/1718502702",
        "source": "Amazon Search",
        "fetched_at": "2024-11-01T10:31:00Z"
      },
      {
        "title": "Automate the Boring Stuff with Python, 2nd Edition",
        "author": "Al Sweigart",
        "asin": "1593279922",
        "price": "$29.99",
        "rating": "4.7/5",
        "reviews": "6,103 reviews",
        "source": "Amazon Search"
      }
    ]

Example 3: Bulk ISBN List
Input:

    9780132350884
    9780201633610
    9780596517748
    9781491950357

Output: 4 fully enriched records, each combining data from Amazon, Google Books, and Open Library, with ISBNs, descriptions, ratings, cover images, and cross-source URLs.
Example 4: Author Search
Input:

    author:Malcolm Gladwell

Output: Up to max_items books by Malcolm Gladwell from Google Books and Open Library, with titles, ISBNs, publishers, page counts, and cover images.
Example 5: Mixed Batch
Input (one per line):

    9780134685991
    author:Yuval Noah Harari
    https://www.amazon.com/dp/0735224137
    title:Atomic Habits
    B09H3BXKR5
    python machine learning books

Output: Each line is processed independently, auto-detected, routed to the correct source(s), and results are merged into a single unified dataset.
Amazon Scrape: How It Works
Product Page Scraping (ASIN / Product URL)
When an ASIN or product URL is provided, the actor performs a live Amazon scrape using the following process:
Step 1: URL Variant Generation
The actor generates up to 4 URL variants for every product to handle 404s and regional redirects:

    https://www.amazon.com/dp/{ASIN}
    https://www.amazon.com/gp/product/{ASIN}
    https://www.amazon.com/dp/{ASIN}?th=1&psc=1
    https://www.amazon.com/s?k={ASIN}    (last resort)
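The four variants listed above can be generated with a few lines of Python. This is a sketch of the idea rather than the actor's own code; the function name `amazon_url_variants` is hypothetical.

```python
from typing import List


def amazon_url_variants(asin: str) -> List[str]:
    """Return the URL variants for one ASIN, in the order they would be tried."""
    asin = asin.strip().upper()
    base = "https://www.amazon.com"
    return [
        f"{base}/dp/{asin}",
        f"{base}/gp/product/{asin}",
        f"{base}/dp/{asin}?th=1&psc=1",
        f"{base}/s?k={asin}",  # last resort: search for the ASIN
    ]


for url in amazon_url_variants("B08N5WRWNW"):
    print(url)
```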
Step 2: Browser Fingerprint Request
The actor uses curl_cffi to impersonate a real Chrome 110 browser, sending authentic headers including Sec-Fetch-*, Accept-Encoding, DNT, Upgrade-Insecure-Requests, and a real user agent. This reduces the chance of bot detection during the Amazon scrape.
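A request of this kind might be issued as in the sketch below. It relies on curl_cffi's requests-style API with `impersonate="chrome110"`, which also supplies a matching user agent. Only the header names come from the text above; the header values, the `fetch_page` helper, and the timeout are assumptions.

```python
def build_headers() -> dict:
    """Browser-like headers; the values are illustrative assumptions."""
    return {
        "Accept-Encoding": "gzip, deflate, br",
        "DNT": "1",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
    }


def fetch_page(url, proxy_url=None, timeout=30):
    """Fetch one page while impersonating Chrome 110 (hypothetical helper)."""
    # Imported lazily so this module loads even where curl_cffi is not installed.
    from curl_cffi import requests

    proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None
    return requests.get(
        url,
        headers=build_headers(),
        impersonate="chrome110",
        proxies=proxies,
        timeout=timeout,
    )
```

A call such as `fetch_page("https://www.amazon.com/dp/0132350882", proxy_url=...)` would return a response object whose `status_code` and `text` can then be inspected.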
Step 3: CAPTCHA Detection
If Amazon returns a CAPTCHA page, the actor detects it immediately, skips that variant, and tries the next URL. If all variants are blocked, it falls back to Google Books or Open Library automatically.
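The detect-and-skip loop can be sketched as follows. The marker strings are assumptions about what an Amazon robot-check page typically contains, not the actor's actual checks, and `fetch` is any callable that returns the page HTML (or None on failure).

```python
# Assumed markers of an Amazon robot-check page (not taken from the actor).
CAPTCHA_MARKERS = (
    "/errors/validatecaptcha",
    "enter the characters you see below",
    "api-services-support@amazon.com",
)


def looks_like_captcha(html: str) -> bool:
    """Return True if the page body contains any known CAPTCHA marker."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def first_usable_page(urls, fetch):
    """Try each URL variant in order; return (url, html) for the first clean page."""
    for url in urls:
        html = fetch(url)
        if html and not looks_like_captcha(html):
            return url, html
    # Every variant was blocked or empty: the caller should fall back
    # to Google Books or Open Library.
    return None, None
```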
Step 4: HTML Parsing
Using BeautifulSoup, the actor extracts data from multiple CSS selector patterns per field, ensuring compatibility with Amazon's A/B layout variants and regional page structures.
Step 5: Multi-Source Enrichment
After a successful Amazon scrape, the actor automatically queries Google Books and Open Library to fill in any missing fields (description, genre, Goodreads link, etc.) and merges results into one complete record.
Search Results Page Scraping
When an Amazon search URL is provided, the actor scrapes all product cards from the page:
- Tries 4 different card selector patterns for compatibility with Amazon's evolving layout
- Extracts ASIN, title, price, rating, review count, cover thumbnail, and author/brand per card
- Returns up to max_items products from the page
- Falls back to Google Books keyword search if Amazon returns a CAPTCHA
Google Books Integration
The Google Books API provides the richest bibliographic metadata for published books. The actor uses it as:
- Primary source for ISBN, author, title, and keyword queries
- Enrichment source for Amazon scrape results missing descriptions or genre data
- Fallback source when Amazon scrape fails or returns a CAPTCHA
What Google Books adds: full book descriptions (up to 800 chars), category/genre tags, page count, language, high-resolution cover images (zoom=3), publisher, publication date, rating, ratings count, and a direct Google Books URL.
Rate limit handling: If Google Books returns a 429, the actor waits 3 seconds and retries, up to 3 times per request.
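The 429 handling described above amounts to a fixed-wait retry loop. The sketch below shows one way to write it against the public Google Books volumes endpoint; `get_with_retry` and its injected `fetch` callable (which must return a `(status_code, body)` pair) are hypothetical names chosen to keep the example independent of any HTTP library.

```python
import time
import urllib.parse

GOOGLE_BOOKS_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"


def google_books_url(isbn: str, max_results: int = 10) -> str:
    """Build a volumes query URL for an ISBN lookup."""
    query = urllib.parse.urlencode({"q": f"isbn:{isbn}", "maxResults": max_results})
    return f"{GOOGLE_BOOKS_ENDPOINT}?{query}"


def get_with_retry(url, fetch, retries=3, wait_seconds=3.0, sleep=time.sleep):
    """Call fetch(url); on HTTP 429 wait and retry, up to `retries` times."""
    for attempt in range(retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < retries:
            sleep(wait_seconds)
    # Still rate-limited after the final retry: return the last response.
    return status, body
```

Passing `sleep` as a parameter makes the wait easy to stub out in tests.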
Open Library Integration
Open Library (Internet Archive) is an excellent secondary source, especially for older and out-of-print titles, academic books, and non-English publications.
What Open Library adds: Goodreads URL (when available), Open Library direct page URL, subject tags from the library catalog, alternative ISBN variants (ISBN-10 ↔ ISBN-13 cross-reference), first publication year, and language data.
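The ISBN-10 and ISBN-13 cross-reference can also be computed locally with the standard check-digit rules, which is handy for validating what a source returns. This is a generic sketch of those rules, not the actor's code; only ISBN-13s with the 978 prefix have an ISBN-10 equivalent.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13 by adding the 978 prefix and a new check digit."""
    core = "978" + isbn10.replace("-", "")[:9]
    # ISBN-13 weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def isbn13_to_isbn10(isbn13: str) -> str:
    """Convert a 978-prefixed ISBN-13 back to ISBN-10."""
    digits = isbn13.replace("-", "")
    if not digits.startswith("978"):
        raise ValueError("only 978-prefixed ISBN-13s have an ISBN-10 equivalent")
    core = digits[3:12]
    # ISBN-10 weights run 10 down to 2 over the first 9 digits, modulo 11.
    total = sum(int(d) * (10 - i) for i, d in enumerate(core))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


print(isbn10_to_isbn13("0132350882"))
print(isbn13_to_isbn10("978-0-13-468599-1"))
```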
Multi-Source Enrichment
The actor uses a smart merge strategy to combine data from multiple sources into the most complete possible record:
Priority order for each field: Amazon (primary) → Google Books (fill gaps) → Open Library (fill remaining gaps)
| Field | Amazon Scrape | Google Books | Open Library |
|---|---|---|---|
| Price | ✅ Always | ❌ | ❌ |
| Rating / Reviews | ✅ Live | ✅ Historical | ❌ |
| Description | ✅ (if listed) | ✅ Rich | ❌ |
| Genre / Category | ✅ Breadcrumb | ✅ Google tags | ✅ Library subjects |
| ISBN-13 / ISBN-10 | ✅ From detail | ✅ | ✅ |
| Goodreads URL | ✅ (if linked) | ❌ | ✅ |
| Cover Image | ✅ High-res | ✅ zoom=3 | ✅ OpenLibrary CDN |
| Page Count | ✅ | ✅ | ✅ Median |
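The priority order above can be expressed as a first-non-empty merge: each field takes its value from the highest-priority source that has one. The sketch below assumes one partial record per source, keyed by the source names used in the output's source field; `merge_records` is a hypothetical name.

```python
SOURCE_PRIORITY = ("Amazon", "Google Books", "Open Library")


def merge_records(records_by_source: dict) -> dict:
    """Merge per-source records, keeping the first non-empty value per field."""
    merged = {}
    for source in SOURCE_PRIORITY:
        for field, value in records_by_source.get(source, {}).items():
            # Later sources only fill fields that are still missing or empty.
            if value not in (None, "", []) and field not in merged:
                merged[field] = value
    return merged


merged = merge_records({
    "Amazon": {"title": "Clean Code", "price": "$34.99", "description": ""},
    "Google Books": {"title": "Clean Code (GB)", "description": "Even bad code can function..."},
    "Open Library": {"goodreads_url": "https://www.goodreads.com/book/show/3735293"},
})
print(merged)
```

Here the title and price come from Amazon, the description from Google Books, and the Goodreads link from Open Library.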
Export Formats
All results are automatically exported to the actor's Key-Value Store in three formats:
JSON (results.json)
Complete structured data with all fields. Ideal for API integration, databases, and further processing.
CSV (results.csv)
UTF-8 with BOM (Excel-compatible) flat table. All 21 output fields as columns. Ideal for spreadsheet analysis in Excel, Google Sheets, or pandas.
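For readers producing the same kind of file from the JSON output, a BOM-prefixed CSV is what Python's `utf-8-sig` codec writes. The sketch below is an illustration with a hypothetical `records_to_csv_bytes` helper, not the actor's exporter.

```python
import csv
import io


def records_to_csv_bytes(records, fieldnames):
    """Serialize records to CSV bytes with a UTF-8 BOM so Excel detects the encoding."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # drop keys that are not listed as columns
        restval="",             # blank cell for missing fields
    )
    writer.writeheader()
    writer.writerows(records)
    # The utf-8-sig codec prepends the byte order mark EF BB BF.
    return buffer.getvalue().encode("utf-8-sig")


data = records_to_csv_bytes(
    [{"title": "Clean Code", "price": "$34.99", "extra": 1}],
    ["title", "price"],
)
```

Writing `data` to a file opened in binary mode gives a CSV that opens in Excel without garbled non-ASCII characters.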
XLSX (results.xlsx)
Excel workbook with auto-sized column widths and a frozen header row. Ready to open directly in Microsoft Excel or Google Sheets.
To disable exports, set export_csv: false or export_xlsx: false in the input to save storage and run time.
Proxy Configuration
For maximum reliability when performing an Amazon scrape, especially at high volume, a residential proxy is strongly recommended.
Recommended Setup

    {
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      }
    }

Why Residential Proxy for Amazon Scrape?
Amazon aggressively blocks datacenter IP ranges. Residential IPs appear as normal household traffic, dramatically reducing CAPTCHA frequency. The actor automatically refreshes the proxy URL before each Amazon request.
No Proxy
For small runs (under 5–10 products), curl_cffi's browser impersonation alone is often sufficient. However, residential proxy use is strongly recommended for reliable production-scale Amazon scraping.
Performance & Rate Limits
Speed Benchmarks (with residential proxy)
| Input Type | Items | Estimated Time |
|---|---|---|
| Single ISBN (Google Books) | 1 | ~3–5 seconds |
| Single Amazon scrape (product) | 1 | ~8–15 seconds |
| Amazon search results | 10 products | ~30–60 seconds |
| Bulk ISBNs (Google Books + OL) | 20 | ~1–2 minutes |
| Amazon scrape + enrichment | 20 | ~3–5 minutes |
| Mixed batch (all types) | 50 | ~8–12 minutes |
Rate Limiting Strategy
- Amazon: 0.5–1.5s random delay between requests; 3 retries with proxy refresh; CAPTCHA detection with automatic URL variant fallback
- Google Books: 0.2–0.5s delay; 3s wait on 429; up to 3 retries per request
- Open Library: 0.2–0.5s delay; standard retry on failure
Deduplication
Results are automatically deduplicated across all sources using ASIN + source, ISBN-13 + source, or Title + Author + source as the deduplication key.
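The key selection described above can be sketched as a small function that prefers ASIN, then ISBN-13, then title plus author, always scoped by source. The function names and the lower-casing of title and author are assumptions made for this example.

```python
def dedup_key(record: dict) -> tuple:
    """Pick the strongest available identifier, scoped by source."""
    source = record.get("source", "")
    if record.get("asin"):
        return ("asin", record["asin"], source)
    if record.get("isbn_13"):
        return ("isbn_13", record["isbn_13"], source)
    # Assumed normalization: trim and lower-case title and author.
    title = (record.get("title") or "").strip().lower()
    author = (record.get("author") or "").strip().lower()
    return ("title_author", title, author, source)


def deduplicate(records):
    """Keep the first record seen for each deduplication key."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Because the source is part of the key, the same book found on Amazon and on Google Books is kept as two records, one per source.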
FAQ
Q: Do I need an Amazon account or API key?
A: No. This actor performs a direct Amazon scrape without any authentication, account, or Amazon API key.
Q: Do I need a Google API key?
A: No. The Google Books API used here works without an API key for normal usage volumes.
Q: Why does the Amazon scrape sometimes fail for certain products?
A: Amazon actively fights automated access. The actor tries up to 4 URL variants and automatically falls back to Google Books / Open Library when Amazon is unavailable.
Q: Can I scrape Amazon.co.uk or Amazon.de?
A: The current version targets Amazon.com. International Amazon domains may partially work but are not officially supported.
Q: What if my ISBN returns no results on Amazon?
A: The actor will automatically query Google Books and Open Library using the ISBN, which have excellent global coverage for published books.
Q: How many ISBNs can I process in one run?
A: There is no hard limit. For large batches (100+ ISBNs), set use_amazon: false to rely on Google Books + Open Library for faster, CAPTCHA-free processing.
Q: Are prices live or cached?
A: Prices are scraped live from Amazon in real time on every run. There is no caching.
Q: Does this work for non-book Amazon products?
A: Yes. Any Amazon product URL or ASIN can be scraped. Book-specific fields like ISBN and page count will simply be empty for non-book items.
Q: Why does the description sometimes come from Google Books instead of Amazon?
A: Multi-source enrichment fills gaps automatically. If Amazon's product page doesn't include a description, Google Books provides it without any extra configuration.
Q: What does export_xlsx: true do?
A: It creates a formatted Excel file (results.xlsx) in your Apify Key-Value Store with auto-sized columns and a frozen header row, ready to open directly in Excel or Google Sheets.
Changelog
v2.0.0 (Current)
- ✅ Amazon search URL scraping: extract all product cards from /s?k= pages
- ✅ Amazon product page: 5-variant URL fallback system to handle 404s
- ✅ ASIN 404 fallback → Open Library (not Google Books) for better book coverage
- ✅ Google Books 429 rate limit → 2–3s retry gap (replaced long 15/30/45s waits)
- ✅ Any Amazon URL format correctly auto-detected and routed
- ✅ Keyword input → Google Books + Open Library (no unnecessary Amazon call)
- ✅ Smart deduplication across all sources (ASIN / ISBN / Title+Author keyed)
- ✅ XLSX export with auto-sized columns and frozen header row
- ✅ CSV export with UTF-8 BOM for full Excel compatibility
- ✅ author: and title: prefix search modes added
- ✅ Zero crashes: all errors handled gracefully with partial records
v1.5.0
- ✅ Multi-source enrichment merge strategy
- ✅ Open Library Goodreads ID extraction
- ✅ Google Books cover image at zoom=3 resolution
- ✅ ISBN-10 ↔ ISBN-13 cross-referencing
v1.0.0
- ✅ Initial release: ISBN lookup via Google Books
- ✅ Basic Amazon product page scraping
- ✅ JSON dataset export
Legal & Terms of Use
This actor collects publicly accessible data in the same manner as a regular user browsing these websites.
- This tool is intended for research, price monitoring, catalog management, and educational use
- Amazon scrape activity should comply with Amazon's Conditions of Use
- Google Books data is subject to Google's Terms of Service for the Books API
- Open Library data is provided under open licenses (Creative Commons); please attribute accordingly
- Do not use this tool to scrape private, account-specific, or paywalled content
Fair Use: Metadata aggregation (titles, authors, ISBNs, descriptions) for research, bibliographic tools, and catalog management is generally considered fair use. Always consult a legal professional for your specific use case.
Support & Feedback
- Bug report? Open a GitHub issue or contact us via the Apify actor page
- Feature request? Suggest it in the Apify Community forum
- Rating: If this actor saved you time, please leave a review on the Apify Store!
Built with ❤️ on Apify · Amazon Scrape + Google Books + Open Library
The most complete book metadata scraper on the Apify platform: no API keys, no login, no limits