Amazon book scraper
Pricing
$8.00/month + usage
Amazon Book Scraper uses residential proxies to extract book details from Amazon product pages. It collects title, author, price, rating, reviews, ASIN, publisher, publication date, pages, language, description, and image. Outputs structured JSON for e-commerce analysis and research.
Developer
Data Pilot
Overview
The Amazon Book Scraper is an Apify Actor that extracts detailed book metadata directly from Amazon product pages using browser automation. Provide one or more Amazon book URLs and the actor returns structured data including title, author, price, rating, description, publisher, and cover image. Whether you're building an Amazon book database, monitoring Amazon book prices, or conducting publishing research, this actor delivers accurate Amazon book data efficiently.
With residential proxy support and anti-detection techniques, the Amazon Book Scraper is designed to reach Amazon book pages reliably, even on slow or restricted networks.
Features
- Full Amazon Book Extraction – Scrapes detailed Amazon book data including title, author, price, rating, reviews count, description, publisher, and more.
- Playwright Browser Automation – Uses a real Chromium browser to render JavaScript-heavy Amazon book pages accurately.
- Anti-Detection – Rotates user agents and disables automation fingerprints to avoid Amazon bot detection.
- Retry Logic – Automatically retries up to 3 times with multiple navigation strategies for reliable Amazon book scraping.
- ASIN Extraction – Automatically extracts the Amazon ASIN from the book URL.
- Proxy Support – Uses Apify residential proxies to bypass Amazon IP restrictions.
- Anti-Blocking Delays – Adds random delays between requests to mimic human browsing behavior.
- Error Handling – Logs errors per URL and continues processing remaining Amazon book pages.
- Dataset Integration – Automatically pushes all Amazon book data to your Apify dataset for easy export.
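The ASIN extraction feature can be illustrated with a short sketch. The actor's actual pattern is not documented here, so the regular expression below is an assumption: it looks for a 10-character alphanumeric ID after `/dp/` or `/gp/product/`, which are common Amazon product URL shapes. The name `extract_asin` is hypothetical.

```python
import re
from typing import Optional

# Assumed pattern (not taken from the actor's source): a 10-character
# ASIN that follows /dp/ or /gp/product/ in the URL path.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?=[/?#]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN embedded in an Amazon product URL, or None if absent."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


print(extract_asin("https://www.amazon.com/dp/0735211299"))  # prints 0735211299
```

URLs that do not contain a recognizable product path return `None`, which a caller could treat as invalid input before spending a browser session on them.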
How It Works
- Input – Provide a list of Amazon book page URLs.
- Browser Launch – The actor launches a headless Chromium browser with anti-detection settings.
- Page Navigation – It navigates to each Amazon book URL using multiple fallback strategies (domcontentloaded → load → commit).
- Data Extraction – Extracts Amazon book fields using targeted CSS selectors.
- Build Output – Structures all extracted data into a clean record and pushes it to the dataset.
- Repeat – Processes all URLs with random delays between requests.
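The navigation step above can be sketched as a small fallback loop. This is an illustration under stated assumptions, not the actor's code: `navigate_with_fallback` and its `goto` parameter are hypothetical names, and how the actor combines its three attempts with its three wait strategies is not documented, so the sketch simply tries every strategy once per attempt. The strategy names are valid `wait_until` values for Playwright's `page.goto`.

```python
from typing import Callable, Optional, Sequence, Tuple

# Wait conditions in the order listed above.
WAIT_STRATEGIES = ("domcontentloaded", "load", "commit")


def navigate_with_fallback(
    goto: Callable[..., object],
    url: str,
    strategies: Sequence[str] = WAIT_STRATEGIES,
    max_attempts: int = 3,
) -> Tuple[int, str]:
    """Try each wait strategy in order, for up to max_attempts rounds.

    Returns (attempt_number, strategy_that_succeeded). Raises RuntimeError
    if every strategy fails in every round.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        for wait_until in strategies:
            try:
                goto(url, wait_until=wait_until)
                return attempt, wait_until
            except Exception as exc:  # e.g. a navigation timeout
                last_error = exc
    raise RuntimeError("All navigation strategies timed out") from last_error
```

With Playwright's sync API, the call would look like `navigate_with_fallback(page.goto, url)`. Passing the navigation function in as a parameter keeps the retry logic testable without launching a browser.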
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | String or Array | Required | Amazon book page URLs, one per line or as a JSON array. |
| `useApifyProxy` | Boolean | true | Whether to use Apify proxy for Amazon book scraping. |
| `apifyProxyGroups` | Array of strings | ["RESIDENTIAL"] | Proxy groups to use (e.g., ["RESIDENTIAL"]). |
Example input:
{"urls": ["https://www.amazon.com/dp/0735211299","https://www.amazon.com/dp/0062316117"],"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}
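Because `urls` accepts either newline-separated text or a JSON array, a caller preparing input may want to normalize both shapes into one list. The helper below is a minimal sketch of that idea; `normalize_urls` is a hypothetical name, and the de-duplication step is an added convenience, not documented actor behavior.

```python
import json
from typing import List, Union


def normalize_urls(urls: Union[str, List[str]]) -> List[str]:
    """Accept a list, a JSON array string, or newline-separated text."""
    if isinstance(urls, str):
        text = urls.strip()
        # A leading "[" is treated as a JSON array; anything else as lines.
        items = json.loads(text) if text.startswith("[") else text.splitlines()
    else:
        items = list(urls)

    seen = set()
    result = []
    for item in items:
        url = str(item).strip()
        # Skip blank lines and keep only the first occurrence of each URL.
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```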
Output
Each Amazon book record is pushed as a separate dataset item with the following fields:
| Field | Type | Description |
|---|---|---|
| `url` | string | Original Amazon book page URL. |
| `asin` | string | Amazon Standard Identification Number (ASIN). |
| `title` | string | Book title. |
| `author` | string | Author name. |
| `rating` | string | Average customer rating (e.g., "4.7 out of 5 stars"). |
| `reviews_count` | string | Total number of customer ratings or reviews (e.g., "112,000 ratings"). |
| `price` | string | Listed price (e.g., "$14.99"). |
| `image` | string | Direct URL to the cover image. |
| `description` | string | Book description or synopsis. |
| `publisher` | string | Publisher name. |
| `pub_date` | string | Publication date. |
| `pages_count` | string | Number of pages. |
| `language` | string | Language of the book. |
| `status` | string | "success" or "error". |
| `message` | string | Error description; present only when `status` is "error". |
| `attempt` | integer | Number of attempts made to scrape this page. |
| `scraped_at` | string | ISO 8601 UTC timestamp of when the page was scraped. |
Example output (success):
{"url": "https://www.amazon.com/dp/0735211299","asin": "0735211299","title": "Atomic Habits","author": "James Clear","rating": "4.8 out of 5 stars","reviews_count": "112,000 ratings","price": "$14.99","image": "https://m.media-amazon.com/images/I/513Y5o-DYtL.jpg","description": "No matter your goals, Atomic Habits offers a proven framework...","publisher": "Avery","pub_date": "October 16, 2018","pages_count": "320","language": "English","status": "success","attempt": 1,"scraped_at": "2025-03-22T12:34:56Z"}
Example output (error):
{"url": "https://www.amazon.com/dp/XXXXXXXXXX","status": "error","message": "All navigation strategies timed out","attempt": 3}
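Since `rating`, `reviews_count`, `price`, and `pages_count` arrive as display strings, downstream analysis usually needs them converted to numbers. The following is a minimal post-processing sketch, assuming the string formats shown in the examples above (US-style thousands separators and decimal points). `normalize_record` and `_first_number` are hypothetical helpers that run on exported dataset items; they are not part of the actor.

```python
import re
from typing import Any, Dict, Optional


def _first_number(text: Optional[str]) -> Optional[float]:
    """Return the first number found in text, ignoring thousands separators."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group(0).replace(",", "")) if match else None


def normalize_record(item: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Convert a successful record's string fields to numbers.

    Returns None for error records so callers can filter them out.
    """
    if item.get("status") != "success":
        return None
    reviews = _first_number(item.get("reviews_count"))
    pages = _first_number(item.get("pages_count"))
    return {
        "asin": item.get("asin"),
        "title": item.get("title"),
        "rating": _first_number(item.get("rating")),
        "reviews_count": int(reviews) if reviews is not None else None,
        "price": _first_number(item.get("price")),
        "pages_count": int(pages) if pages is not None else None,
    }
```

Prices in other currencies or locales (for example a comma as the decimal separator) would need different parsing, so treat the numeric output as valid only for the format shown in the example.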
Use Cases
- Amazon Book Databases – Build and maintain a structured catalog of Amazon book listings.
- Price Monitoring – Track Amazon book prices over time for deals and trends.
- Publishing Research – Analyze Amazon book ratings, reviews, and publisher data.
- E-commerce Enrichment – Enrich product listings with Amazon book descriptions and cover images.
- Competitor Analysis – Monitor Amazon book ratings and pricing for competitive intelligence.
- Academic Research – Collect structured Amazon book data for literature or data science projects.
- Recommendation Engines – Power Amazon book recommendation systems using ratings and metadata.
Quick Start
- Open on Apify – Visit the actor page and click Try for free.
- Set Input – Paste your Amazon book page URLs into the `urls` field.
- Enable Proxy – Keep `useApifyProxy` enabled for reliable Amazon book scraping.
- Run the Actor – Start the run and monitor progress in the logs.
- Download Results – Export the Amazon book dataset as JSON, CSV, or Excel once finished.
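The same steps can also be driven from code with the `apify-client` Python package. The sketch below is illustrative: `ACTOR_ID` is a placeholder because the actor's identifier is not stated in this document, and `build_run_input` and `run_scraper` are hypothetical helper names.

```python
from typing import Any, Dict, Iterator, List

# Placeholder: copy the real identifier from the actor's page on Apify.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(urls: List[str], use_proxy: bool = True) -> Dict[str, Any]:
    """Assemble the input object described in the Input section."""
    return {
        "urls": urls,
        "useApifyProxy": use_proxy,
        "apifyProxyGroups": ["RESIDENTIAL"],
    }


def run_scraper(token: str, urls: List[str]) -> Iterator[Dict[str, Any]]:
    """Start a run, wait for it to finish, and yield each dataset item."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(urls))
    if run is None:
        raise RuntimeError("Actor run did not start or did not finish")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

A caller would iterate over `run_scraper(token, ["https://www.amazon.com/dp/0735211299"])` and check each item's `status` field before using it.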
Technical Stack
- Browser Automation – Playwright (Chromium) for JavaScript-rendered Amazon book pages
- Anti-Detection – Random user agents, disabled webdriver fingerprint
- HTTP Navigation – Multi-strategy page loading (domcontentloaded, load, commit)
- Proxy – Apify Proxy (residential) for bypassing Amazon restrictions
- Platform – Apify Actor — serverless, scalable, integrated with Dataset and Key-Value Store
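The user-agent rotation and random delays listed above can be approximated with the standard library alone. This is a sketch under assumptions: the actor's real user-agent pool and delay range are not documented, so the values below are placeholders, and `pick_user_agent` and `jittered_delay` are hypothetical names.

```python
import random
from typing import Optional, Sequence

# Placeholder values; a real pool would hold full, current browser
# user-agent strings.
USER_AGENTS = (
    "example-user-agent-1",
    "example-user-agent-2",
    "example-user-agent-3",
)


def pick_user_agent(
    pool: Sequence[str] = USER_AGENTS,
    rng: Optional[random.Random] = None,
) -> str:
    """Choose one user agent at random from the pool."""
    return (rng or random).choice(pool)


def jittered_delay(
    min_s: float = 2.0,
    max_s: float = 6.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a random pause length in seconds between min_s and max_s."""
    if min_s > max_s:
        raise ValueError("min_s must not exceed max_s")
    return (rng or random).uniform(min_s, max_s)
```

In a Playwright script, the chosen string could be passed as `browser.new_context(user_agent=pick_user_agent())`, with `time.sleep(jittered_delay())` between page visits. The optional `rng` argument allows a seeded `random.Random` for reproducible tests.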
Related Tools
| Actor | Description |
|---|---|
| Goodreads Scraper | Extracts book ratings, reviews, and metadata from Goodreads. |
| Book Metadata Scraper | Extracts rich book metadata from the Open Library database. |
| Google Books Scraper | Fetches book metadata and previews via the Google Books API. |
| ISBN Lookup Tool | Looks up detailed book info by ISBN from multiple data sources. |
| Book Price Comparator | Compares book prices across major online retailers. |
Changelog
v1.0.0 – Initial Release
- Playwright-based Amazon book page scraping
- Title, author, price, rating, and reviews extraction
- ASIN extraction from Amazon book URLs
- Publisher, publication date, and page count extraction
- Cover image URL extraction
- Residential proxy configuration support
- Anti-detection user agent rotation
- Retry logic with multiple navigation strategies
- Random anti-blocking delays
- Dataset integration with error handling
Pricing
- Free for basic usage on Apify (up to certain compute limits).
- Paid plans available for higher volume, priority support, and longer runs.
- Proxy credits are consumed when residential proxies are enabled.
Support & Feedback
- Issues & Ideas – Open a ticket on the Apify Actor issue tracker.
- Documentation – Visit Apify Docs for platform guides.
- Scraping Notes – Always use residential proxies when scraping Amazon book pages to avoid blocks.
Disclaimer: This actor scrapes publicly visible data from Amazon book pages. Please ensure your usage complies with Amazon's terms of service. This actor is intended for research and informational purposes only.