Amazon book scraper

Pricing

$8.00/month + usage

Amazon Book Scraper uses residential proxies to extract book details from Amazon product pages. It collects title, author, price, rating, reviews, ASIN, publisher, publication date, pages, language, description, and image. Outputs structured JSON for e-commerce analysis and research.

Developer

Data Pilot

Maintained by Community

Overview

The Amazon Book Scraper is an Apify Actor that extracts detailed book metadata directly from Amazon product pages using browser automation. Provide one or more Amazon book URLs and the actor returns structured data including title, author, price, rating, description, publisher, and cover image. It is suited to building a book database, monitoring book prices, or conducting publishing research.

With residential proxy support and anti-detection techniques, the actor is designed to reduce the chance of being blocked when loading Amazon book pages, including on slow or restricted networks.


Features

  • Full Amazon Book Extraction – Scrapes detailed Amazon book data including title, author, price, rating, reviews count, description, publisher, and more.
  • Playwright Browser Automation – Uses a real Chromium browser to render JavaScript-heavy Amazon book pages accurately.
  • Anti-Detection – Rotates user agents and disables automation fingerprints to avoid Amazon bot detection.
  • Retry Logic – Automatically retries up to 3 times with multiple navigation strategies for reliable Amazon book scraping.
  • ASIN Extraction – Automatically extracts the Amazon ASIN from the book URL.
  • Proxy Support – Uses Apify residential proxies to bypass Amazon IP restrictions.
  • Anti-Blocking Delays – Adds random delays between requests to mimic human browsing behavior.
  • Error Handling – Logs errors per URL and continues processing remaining Amazon book pages.
  • Dataset Integration – Automatically pushes all Amazon book data to your Apify dataset for easy export.
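As an illustration of the ASIN Extraction feature, here is a minimal sketch of pulling an ASIN out of a product URL. The function name `extract_asin` and the set of URL shapes it accepts (`/dp/`, `/gp/product/`, `/product/`) are assumptions made for this example; the actor's own parsing logic is not documented here.

```python
import re
from typing import Optional

# Assumption: ASINs are 10 alphanumeric characters and follow one of these
# path segments. The actor's real pattern may differ.
_ASIN_RE = re.compile(
    r"/(?:dp|gp/product|product)/([A-Z0-9]{10})(?=[/?#]|$)",
    re.IGNORECASE,
)


def extract_asin(url: str) -> Optional[str]:
    """Return the upper-cased ASIN found in an Amazon product URL, or None."""
    match = _ASIN_RE.search(url)
    return match.group(1).upper() if match else None


print(extract_asin("https://www.amazon.com/dp/0735211299"))
print(extract_asin("https://www.amazon.com/"))
```

Returning `None` instead of raising keeps the caller free to record an error for that URL and continue with the rest of the list.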

How It Works

  1. Input – Provide a list of Amazon book page URLs.
  2. Browser Launch – The actor launches a headless Chromium browser with anti-detection settings.
  3. Page Navigation – It navigates to each Amazon book URL using multiple fallback strategies (domcontentloaded → load → commit).
  4. Data Extraction – Extracts Amazon book fields using targeted CSS selectors.
  5. Build Output – Structures all extracted data into a clean record and pushes it to the dataset.
  6. Repeat – Processes all URLs with random delays between requests.
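The navigation and retry steps above can be sketched roughly as follows. This is not the actor's source code: the helper name `navigate_with_fallback`, the choice to try every strategy within each attempt, and the 1 to 3 second delay range are assumptions. The `goto` argument is any callable that loads a URL with a given wait condition and raises on failure.

```python
import random
import time
from typing import Callable, Sequence, Tuple

# Strategy order taken from the description above.
STRATEGIES = ("domcontentloaded", "load", "commit")


def navigate_with_fallback(
    goto: Callable[[str, str], object],
    url: str,
    strategies: Sequence[str] = STRATEGIES,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Try each wait strategy in order, repeating up to max_attempts times.

    Returns (attempt, strategy) for the first successful load and raises
    RuntimeError when every strategy fails on every attempt.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        for strategy in strategies:
            try:
                goto(url, strategy)
                return attempt, strategy
            except Exception as exc:  # timeouts, network errors, and so on
                last_error = exc
        if attempt < max_attempts:
            # Assumed delay range; pauses before the next attempt.
            sleep(random.uniform(1.0, 3.0))
    raise RuntimeError(f"All navigation strategies failed for {url}") from last_error
```

With Playwright, `goto` could be `lambda u, w: page.goto(u, wait_until=w, timeout=30_000)`. Passing the loader in as a callable keeps the retry logic testable without launching a browser.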

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| urls | String or Array | Required | Amazon book page URLs, one per line or as a JSON array. |
| useApifyProxy | Boolean | true | Whether to use Apify proxy for Amazon book scraping. |
| apifyProxyGroups | Array of strings | ["RESIDENTIAL"] | Proxy groups to use (e.g., ["RESIDENTIAL"]). |

Example input:

{
  "urls": [
    "https://www.amazon.com/dp/0735211299",
    "https://www.amazon.com/dp/0062316117"
  ],
  "useApifyProxy": true,
  "apifyProxyGroups": ["RESIDENTIAL"]
}
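Because `urls` accepts either a newline-separated string or an array, a caller may want to normalize it before use. The sketch below shows one way to do that; the helper name `normalize_urls` and the decision to drop blank lines and duplicates are assumptions for illustration, not documented actor behavior.

```python
import json
from typing import List, Union


def normalize_urls(urls: Union[str, List[str]]) -> List[str]:
    """Turn a newline-separated string, JSON array string, or list into a clean list."""
    if isinstance(urls, str):
        text = urls.strip()
        # A string starting with "[" is treated as a JSON array.
        candidates = json.loads(text) if text.startswith("[") else text.splitlines()
    else:
        candidates = list(urls)

    seen = set()
    result = []
    for item in candidates:
        url = str(item).strip()
        if url and url not in seen:  # skip blanks and repeats (assumed behavior)
            seen.add(url)
            result.append(url)
    return result


print(normalize_urls("https://www.amazon.com/dp/0735211299\n\nhttps://www.amazon.com/dp/0062316117"))
```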

Output

Each Amazon book record is pushed as a separate dataset item with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| url | string | Original Amazon book page URL. |
| asin | string | Amazon Standard Identification Number (ASIN). |
| title | string | Book title. |
| author | string | Book author name. |
| rating | string | Average rating (e.g., "4.7 out of 5 stars"). |
| reviews_count | string | Total number of customer reviews. |
| price | string | Book price (e.g., "$14.99"). |
| image | string | Direct URL to the book cover image. |
| description | string | Book description/synopsis. |
| publisher | string | Publisher name. |
| pub_date | string | Publication date. |
| pages_count | string | Number of pages in the book. |
| language | string | Language of the book. |
| status | string | "success" or "error". |
| message | string | Error description; appears in error records, as in the error example. |
| attempt | integer | Number of attempts made to scrape this Amazon book page. |
| scraped_at | string | ISO 8601 UTC timestamp of when the book was scraped. |

Example output (success):

{
  "url": "https://www.amazon.com/dp/0735211299",
  "asin": "0735211299",
  "title": "Atomic Habits",
  "author": "James Clear",
  "rating": "4.8 out of 5 stars",
  "reviews_count": "112,000 ratings",
  "price": "$14.99",
  "image": "https://m.media-amazon.com/images/I/513Y5o-DYtL.jpg",
  "description": "No matter your goals, Atomic Habits offers a proven framework...",
  "publisher": "Avery",
  "pub_date": "October 16, 2018",
  "pages_count": "320",
  "language": "English",
  "status": "success",
  "attempt": 1,
  "scraped_at": "2025-03-22T12:34:56Z"
}

Example output (error):

{
  "url": "https://www.amazon.com/dp/XXXXXXXXXX",
  "status": "error",
  "message": "All navigation strategies timed out",
  "attempt": 3
}
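Most output fields are strings, so analysis usually starts by converting them to numbers and skipping error records. The following is a rough sketch of that post-processing step. The helper names are hypothetical, and the parsing assumes US-style formatting (comma thousands separator, dot decimal) as seen in the example output.

```python
import re
from typing import Any, Dict, Optional


def _first_number(text: Optional[str]) -> Optional[float]:
    """Return the first number in a string such as "4.8 out of 5 stars"."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group(0).replace(",", "")) if match else None


def to_numeric_record(item: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Convert a successful dataset item to numeric fields; return None for errors."""
    if item.get("status") != "success":
        return None
    reviews = _first_number(item.get("reviews_count"))
    pages = _first_number(item.get("pages_count"))
    return {
        "asin": item.get("asin"),
        "title": item.get("title"),
        "rating": _first_number(item.get("rating")),
        "reviews_count": int(reviews) if reviews is not None else None,
        "price": _first_number(item.get("price")),
        "pages_count": int(pages) if pages is not None else None,
    }
```

Prices in other currencies or locales (for example "14,99 €") would need different parsing, so treat this as a starting point only.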

Use Cases

  • Amazon Book Databases – Build and maintain a structured catalog of Amazon book listings.
  • Price Monitoring – Track Amazon book prices over time for deals and trends.
  • Publishing Research – Analyze Amazon book ratings, reviews, and publisher data.
  • E-commerce Enrichment – Enrich product listings with Amazon book descriptions and cover images.
  • Competitor Analysis – Monitor Amazon book ratings and pricing for competitive intelligence.
  • Academic Research – Collect structured Amazon book data for literature or data science projects.
  • Recommendation Engines – Power Amazon book recommendation systems using ratings and metadata.

Quick Start

  1. Open on Apify – Visit the actor page and click Try for free.
  2. Set Input – Paste your Amazon book page URLs into the urls field.
  3. Enable Proxy – Keep useApifyProxy enabled for reliable Amazon book scraping.
  4. Run the Actor – Start the run and monitor progress in the logs.
  5. Download Results – Export the Amazon book dataset as JSON, CSV, or Excel once finished.
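The same steps can also be run programmatically. Below is a minimal sketch using the `apify-client` Python package (`pip install apify-client`). The `ACTOR_ID` value is a placeholder, since the actor's real identifier is not given here, and the function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Placeholder: replace with the identifier shown on the actor's Apify page.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(urls: Iterable[str]) -> Dict[str, Any]:
    """Build the actor input using the field names from the Input section."""
    return {
        "urls": list(urls),
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    }


def run_actor(token: str, actor_id: str, urls: Iterable[str]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return all dataset items."""
    from apify_client import ApifyClient  # imported here so the sketch loads without it

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=build_run_input(urls))
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example usage (needs a valid API token and the real actor ID):
# items = run_actor("<APIFY_TOKEN>", ACTOR_ID, ["https://www.amazon.com/dp/0735211299"])
```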

Technical Stack

  • Browser Automation – Playwright (Chromium) for JavaScript-rendered Amazon book pages
  • Anti-Detection – Random user agents, disabled webdriver fingerprint
  • HTTP Navigation – Multi-strategy page loading (domcontentloaded, load, commit)
  • Proxy – Apify Proxy (residential) for bypassing Amazon restrictions
  • Platform – Apify Actor: serverless, scalable, integrated with Dataset and Key-Value Store
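To make the browser and anti-detection items more concrete, here is a hedged sketch of how a Playwright (Chromium) session with a rotated user agent and a hidden `navigator.webdriver` flag might be set up. The user agent strings, the launch flag, and the helper names are illustrative assumptions, not the actor's actual configuration. Proxy setup is omitted.

```python
import random
from typing import Any, Dict

# Example user agents for illustration only; the actor's own list is unknown.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]

# Script injected before page scripts run, hiding the automation flag.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', { get: () => undefined });"
)


def build_launch_options() -> Dict[str, Any]:
    """Launch options for headless Chromium with one assumed anti-detection flag."""
    return {
        "headless": True,
        "args": ["--disable-blink-features=AutomationControlled"],
    }


def fetch_page_title(url: str) -> str:
    """Open the URL in a fresh browser context and return the page title."""
    from playwright.sync_api import sync_playwright  # pip install playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**build_launch_options())
        try:
            context = browser.new_context(user_agent=random.choice(USER_AGENTS))
            context.add_init_script(HIDE_WEBDRIVER_JS)
            page = context.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=30_000)
            return page.title()
        finally:
            browser.close()
```

Creating a new context per URL gives each request its own user agent and cookie jar, which is one common way to implement rotation.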

Related Actors

| Actor | Description |
| --- | --- |
| Goodreads Scraper | Extracts book ratings, reviews, and metadata from Goodreads. |
| Book Metadata Scraper | Extracts rich book metadata from the Open Library database. |
| Google Books Scraper | Fetches book metadata and previews via the Google Books API. |
| ISBN Lookup Tool | Looks up detailed book info by ISBN from multiple data sources. |
| Book Price Comparator | Compares book prices across major online retailers. |

Changelog

v1.0.0 – Initial Release

  • Playwright-based Amazon book page scraping
  • Title, author, price, rating, and reviews extraction
  • ASIN extraction from Amazon book URLs
  • Publisher, publication date, and page count extraction
  • Cover image URL extraction
  • Residential proxy configuration support
  • Anti-detection user agent rotation
  • Retry logic with multiple navigation strategies
  • Random anti-blocking delays
  • Dataset integration with error handling

Pricing

  • Free for basic usage on Apify (up to certain compute limits).
  • Paid plans available for higher volume, priority support, and longer runs.
  • Proxy credits are consumed when residential proxies are enabled.

Support & Feedback

  • Issues & Ideas – Open a ticket on the Apify Actor issue tracker.
  • Documentation – Visit Apify Docs for platform guides.
  • Scraping Notes – Always use residential proxies when scraping Amazon book pages to avoid blocks.

Disclaimer: This actor scrapes publicly visible data from Amazon book pages. Please ensure your usage complies with Amazon's terms of service. This actor is intended for research and informational purposes only.