Amazon Reviews Scraper

Pricing: from $40.00 / 1,000 results

Extract customer reviews from any Amazon product with filtering by star rating, verified purchases, and sorting options. Returns structured data including review text, ratings, helpful counts, dates, sentiment hints, images, and more across 19+ Amazon domains.

Rating: 5.0 (3)

Developer: Crawler Bros (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 9 hours ago

A production-grade Amazon Reviews Scraper built for the Apify platform. This scraper extracts high-quality, structured review data from Amazon product pages with advanced filtering, pagination, and GDPR compliance features.

🚀 Features

  • Multi-product support: Scrape reviews from multiple Amazon products in a single run
  • Smart filtering: Filter by star rating, or restrict results to verified purchases
  • Flexible sorting: Sort by most helpful or most recent reviews
  • GDPR compliant: Option to exclude sensitive reviewer data
  • Multi-region: Support for 19+ Amazon domains worldwide
  • Anti-detection: Browser fingerprint randomization, proxy rotation, and stealth mode
  • Deduplication: Automatic removal of duplicate reviews
  • Pagination: Handles all review pages automatically
  • Image extraction: Optionally extract reviewer-uploaded images
  • Robust error handling: Captcha detection, retry logic, rate limiting

📦 Output Schema

Each review is returned with the following structure:

{
  "productAsin": "B0BDHWDR12",
  "ratingScore": 5,
  "reviewTitle": "Amazing product!",
  "reviewUrl": "https://www.amazon.com/gp/customer-reviews/R1234567890",
  "reviewReaction": "123 people found this helpful",
  "reviewedIn": "Reviewed in the United States on January 15, 2024",
  "reviewDescription": "This is the full review text...",
  "isVerified": true,
  "variant": "Size: Large, Color: Blue",
  "reviewImages": ["https://..."],
  "position": 1,
  "reviewId": "R1234567890",
  "helpfulCount": 123,
  "reviewDate": "2024-01-15T00:00:00",
  "reviewLocation": "United States",
  "sentimentHint": "positive",
  "wordCount": 150,
  "hasImages": true,
  "imageCount": 2,
  "scrapedAt": "2024-01-20T10:30:00Z",
  "sourceUrl": "https://www.amazon.com/dp/B0BDHWDR12"
}
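Several fields in this record look derived from others: helpfulCount from reviewReaction, reviewDate and reviewLocation from reviewedIn, and imageCount from reviewImages. The sketch below shows one way such a derivation could work. It is not the Actor's actual code; the function name derive_fields is hypothetical, and the parsing assumes the English phrasing shown in the sample above.

```python
import re
from datetime import datetime


def derive_fields(review: dict) -> dict:
    """Compute the derived fields from the raw ones (illustrative only)."""
    derived = {}

    # "123 people found this helpful" -> 123; "One person found this helpful" -> 1
    reaction = review.get("reviewReaction") or ""
    match = re.match(r"\s*([\d,]+)", reaction)
    if match:
        derived["helpfulCount"] = int(match.group(1).replace(",", ""))
    elif reaction.strip().lower().startswith("one"):
        derived["helpfulCount"] = 1
    else:
        derived["helpfulCount"] = 0

    # "Reviewed in the United States on January 15, 2024" (English format assumed)
    reviewed_in = review.get("reviewedIn") or ""
    match = re.match(r"Reviewed in (?:the )?(.+?) on (.+)$", reviewed_in)
    if match:
        derived["reviewLocation"] = match.group(1)
        parsed = datetime.strptime(match.group(2), "%B %d, %Y")
        derived["reviewDate"] = parsed.isoformat()

    images = review.get("reviewImages") or []
    derived["hasImages"] = bool(images)
    derived["imageCount"] = len(images)

    # Whitespace-separated token count; the Actor's own counting rule is not documented.
    derived["wordCount"] = len((review.get("reviewDescription") or "").split())
    return derived
```

Reviews from non-English domains use localized phrasing and date formats, so a real implementation would need per-locale patterns.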

🔧 Input Configuration

| Field | Type | Description | Default |
|---|---|---|---|
| productUrls | array | List of Amazon product URLs (required) | - |
| maxReviews | integer | Maximum reviews per product (1-500) | All available |
| sortBy | string | Sort order: helpful or recent | helpful |
| filterByStar | integer | Filter by star rating (1-5) | All stars |
| verifiedOnly | boolean | Only verified purchases | false |
| includeImages | boolean | Extract reviewer images | true |
| includeGdprSensitive | boolean | Include reviewer names/avatars | false |
| proxyMode | string | automatic or residential | automatic |
| country | string | Amazon domain | amazon.com |
| maxConcurrency | integer | Concurrent pages (1-10) | 3 |
| requestTimeout | integer | Request timeout in seconds | 60 |
| retryCount | integer | Retries for failed requests | 3 |
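To catch a bad input before spending a run on it, the documented defaults and ranges can be checked locally. The following is a minimal sketch built only from the table above; validate_input and _check_range are hypothetical helper names, and the Actor's own validation may differ.

```python
SORT_ORDERS = {"helpful", "recent"}
PROXY_MODES = {"automatic", "residential"}


def _check_range(name, value, low, high):
    """Raise ValueError unless value is an int within [low, high]."""
    if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
        raise ValueError(f"{name} must be an integer between {low} and {high}")


def validate_input(raw: dict) -> dict:
    """Fill in the documented defaults and reject out-of-range values."""
    urls = raw.get("productUrls")
    if not isinstance(urls, list) or not urls:
        raise ValueError("productUrls is required and must be a non-empty list")

    cfg = {
        "productUrls": urls,
        "maxReviews": raw.get("maxReviews"),      # None means "all available"
        "sortBy": raw.get("sortBy", "helpful"),
        "filterByStar": raw.get("filterByStar"),  # None means "all stars"
        "verifiedOnly": raw.get("verifiedOnly", False),
        "includeImages": raw.get("includeImages", True),
        "includeGdprSensitive": raw.get("includeGdprSensitive", False),
        "proxyMode": raw.get("proxyMode", "automatic"),
        "country": raw.get("country", "amazon.com"),
        "maxConcurrency": raw.get("maxConcurrency", 3),
        "requestTimeout": raw.get("requestTimeout", 60),
        "retryCount": raw.get("retryCount", 3),
    }

    if cfg["maxReviews"] is not None:
        _check_range("maxReviews", cfg["maxReviews"], 1, 500)
    if cfg["filterByStar"] is not None:
        _check_range("filterByStar", cfg["filterByStar"], 1, 5)
    _check_range("maxConcurrency", cfg["maxConcurrency"], 1, 10)
    if cfg["sortBy"] not in SORT_ORDERS:
        raise ValueError("sortBy must be 'helpful' or 'recent'")
    if cfg["proxyMode"] not in PROXY_MODES:
        raise ValueError("proxyMode must be 'automatic' or 'residential'")
    return cfg
```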

Example Input

{
  "productUrls": [
    "https://www.amazon.com/dp/B0BDHWDR12",
    "https://www.amazon.com/dp/B08N5WRWNW"
  ],
  "maxReviews": 100,
  "sortBy": "helpful",
  "filterByStar": null,
  "verifiedOnly": false,
  "includeImages": true,
  "includeGdprSensitive": false,
  "proxyMode": "automatic",
  "country": "amazon.com"
}
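An input like the one above can also be submitted programmatically. The sketch below uses only the Python standard library against the Apify API's run-sync-get-dataset-items endpoint, which starts a run and returns its dataset items in one call. The Actor ID is not given in this README, so the value passed as actor_id is a placeholder you would need to look up; build_run_request and run_actor are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that starts a run and waits for its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    actor = urllib.parse.quote(actor_id, safe="~")
    url = f"{API_BASE}/acts/{actor}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Send the request and return the parsed list of review records."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (placeholders, not real values):
# reviews = run_actor("<username>~<actor-name>", "<apiToken>", {
#     "productUrls": ["https://www.amazon.com/dp/B0BDHWDR12"],
#     "maxReviews": 100,
# })
```

Synchronous endpoints are subject to a time limit, so large runs may be better started asynchronously and read from the dataset afterwards.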

🌍 Supported Amazon Domains

| Domain | Country |
|---|---|
| amazon.com | United States |
| amazon.co.uk | United Kingdom |
| amazon.de | Germany |
| amazon.fr | France |
| amazon.it | Italy |
| amazon.es | Spain |
| amazon.ca | Canada |
| amazon.com.au | Australia |
| amazon.co.jp | Japan |
| amazon.in | India |
| amazon.com.br | Brazil |
| amazon.com.mx | Mexico |
| amazon.nl | Netherlands |
| amazon.sg | Singapore |
| amazon.ae | UAE |
| amazon.sa | Saudi Arabia |
| amazon.pl | Poland |
| amazon.se | Sweden |
| amazon.com.tr | Turkey |
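Before submitting a batch of product URLs, it can help to confirm that each one points at a supported domain and contains a recognizable ASIN. The sketch below is one possible check, not the Actor's url_parser.py; parse_product_url is a hypothetical name, and the URL path patterns it accepts (/dp/, /gp/product/, /product-reviews/) are an assumption about common Amazon link shapes.

```python
import re
from urllib.parse import urlparse

SUPPORTED_DOMAINS = {
    "amazon.com", "amazon.co.uk", "amazon.de", "amazon.fr", "amazon.it",
    "amazon.es", "amazon.ca", "amazon.com.au", "amazon.co.jp", "amazon.in",
    "amazon.com.br", "amazon.com.mx", "amazon.nl", "amazon.sg", "amazon.ae",
    "amazon.sa", "amazon.pl", "amazon.se", "amazon.com.tr",
}

# An ASIN is 10 uppercase alphanumeric characters following a known path segment.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product|product-reviews)/([A-Z0-9]{10})(?:/|$)")


def parse_product_url(url: str) -> tuple[str, str]:
    """Return (domain, asin) for a product URL, or raise ValueError."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    domain = host[4:] if host.startswith("www.") else host
    if domain not in SUPPORTED_DOMAINS:
        raise ValueError(f"unsupported Amazon domain: {domain or url!r}")
    match = ASIN_PATTERN.search(parsed.path)
    if not match:
        raise ValueError(f"no ASIN found in URL path: {parsed.path!r}")
    return domain, match.group(1)
```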

🏗️ Architecture

reviews-scraper/
├── src/
│   ├── main.py              # Apify actor entry point
│   ├── scraper.py           # Core scraping logic
│   ├── url_parser.py        # URL parsing & ASIN extraction
│   ├── review_extractor.py  # HTML parsing & extraction
│   ├── data_normalizer.py   # Data validation & normalization
│   ├── deduplication.py     # Duplicate removal
│   ├── proxy_manager.py     # Proxy & session management
│   ├── cookie_manager.py    # MongoDB cookie rotation
│   ├── cookie_admin.py      # Cookie management CLI
│   ├── email_notifier.py    # Failure email notifications
│   ├── constants.py         # Configuration constants
│   └── utils.py             # Helper functions
├── .actor/
│   └── actor.json           # Apify actor configuration
├── Dockerfile               # Docker build configuration
├── INPUT_SCHEMA.json        # Input schema for Apify
├── requirements.txt         # Python dependencies
└── README.md
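
The tree lists a deduplication.py module without showing its contents. As a rough illustration of what duplicate removal can look like on records of this shape, the sketch below keys each review on reviewId and falls back to a content tuple when the ID is missing. Both the function name and the fallback key are assumptions, not the module's real logic.

```python
def deduplicate(reviews: list) -> list:
    """Return reviews with duplicates dropped, keeping first occurrences in order."""
    seen = set()
    unique = []
    for review in reviews:
        # Prefer the stable review ID; fall back to a content-based key.
        key = review.get("reviewId") or (
            review.get("productAsin"),
            review.get("reviewTitle"),
            review.get("reviewDescription"),
        )
        if key in seen:
            continue
        seen.add(key)
        unique.append(review)
    return unique
```

Keeping the first occurrence preserves the position ordering of the original result pages.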

🔒 GDPR Compliance

When includeGdprSensitive is set to false (default), the scraper excludes:

  • Reviewer names
  • Reviewer profile links
  • Reviewer avatars

With these fields omitted, the dataset contains no direct reviewer identifiers while still providing the review content itself.
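
If you enable includeGdprSensitive for one purpose but need a redacted copy for another, the same exclusion can be applied after the fact. The output schema in this README does not name the reviewer fields, so the three keys below (reviewerName, reviewerProfileUrl, reviewerAvatar) are hypothetical and should be replaced with whatever keys the Actor actually emits.

```python
# Hypothetical key names: the README does not document the reviewer fields.
SENSITIVE_FIELDS = ("reviewerName", "reviewerProfileUrl", "reviewerAvatar")


def strip_sensitive(review: dict, include_gdpr_sensitive: bool = False) -> dict:
    """Return a copy of the review, without reviewer identifiers unless opted in."""
    if include_gdpr_sensitive:
        return dict(review)
    return {key: value for key, value in review.items() if key not in SENSITIVE_FIELDS}
```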

⚙️ Technical Details

Anti-Detection Measures

  1. Browser Fingerprinting: Randomized user agents, viewports, and locales
  2. Stealth Mode: Webdriver detection bypass, plugin spoofing
  3. Proxy Rotation: Support for datacenter and residential proxies
  4. Rate Limiting: Dynamic request throttling
  5. Session Management: Automatic session rotation on blocks
  6. Human Simulation: Mouse movements, scrolling, and realistic delays

⚠️ Important: Proxies Required

Amazon has aggressive anti-bot detection. For reliable scraping, you MUST use proxies.

  • Without proxies: Expect sign-in redirects and captchas
  • With datacenter proxies: Works for moderate volume
  • With residential proxies: Best success rate

On Apify, enable "Apify Proxy" with the residential group for best results:

{
  "proxyMode": "residential"
}

Error Handling

  • Captcha Detection: Automatically detects and reports captcha challenges
  • Retry Logic: Exponential backoff for failed requests
  • Timeout Management: Configurable request timeouts
  • Graceful Degradation: Continues with remaining products on failures
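
The retry behavior described above can be sketched as follows. This is a generic exponential backoff loop with a small random jitter, offered as an illustration rather than the Actor's implementation; the base delay, cap, and jitter fraction are assumed values, and only retry_count mirrors a documented input (retryCount, default 3).

```python
import random
import time


def retry_with_backoff(operation, retry_count=3, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call operation(); on failure wait base_delay * 2**attempt (capped) and retry."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= retry_count:
                raise  # retries exhausted: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
            attempt += 1
```

With the defaults, a request is tried at most four times (one initial attempt plus three retries), waiting roughly 1, 2, and 4 seconds between attempts. The sleep parameter exists so the delay can be stubbed out in tests.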

Limitations

  • Maximum ~100 reviews per star rating (Amazon limitation)
  • Maximum ~500 total reviews per product
  • Only reviews with text content are extracted
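
The first two limits are consistent with each other: five star ratings at roughly 100 reviews each gives roughly 500 in total. One way to work within them, assuming the per-star ceiling applies independently to each filterByStar value (an assumption this README does not confirm), is to prepare one input per star rating and merge the results afterwards. per_star_inputs is a hypothetical helper.

```python
def per_star_inputs(base_input: dict, per_star_limit: int = 100) -> list:
    """Return five copies of base_input, one per star rating from 1 to 5."""
    inputs = []
    for star in range(1, 6):
        run_input = dict(base_input)
        run_input["filterByStar"] = star
        # Never request more per star than the assumed per-star ceiling.
        requested = base_input.get("maxReviews") or per_star_limit
        run_input["maxReviews"] = min(per_star_limit, requested)
        inputs.append(run_input)
    return inputs
```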

Test Input File

Create storage/key_value_stores/default/INPUT.json:

{
  "productUrls": ["https://www.amazon.com/dp/B0BDHWDR12"],
  "maxReviews": 20,
  "sortBy": "helpful"
}
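
The file can also be generated from a script, which is convenient when testing several inputs in turn. A minimal sketch, assuming it is run from the project root; write_test_input is a hypothetical helper name.

```python
import json
from pathlib import Path


def write_test_input(run_input: dict, root: str = ".") -> Path:
    """Write run_input to storage/key_value_stores/default/INPUT.json under root."""
    path = Path(root) / "storage" / "key_value_stores" / "default" / "INPUT.json"
    path.parent.mkdir(parents=True, exist_ok=True)  # create the store directories if missing
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path


# Usage:
# write_test_input({
#     "productUrls": ["https://www.amazon.com/dp/B0BDHWDR12"],
#     "maxReviews": 20,
#     "sortBy": "helpful",
# })
```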

📊 Example Output

[
  {
    "productAsin": "B0BDHWDR12",
    "ratingScore": 5,
    "reviewTitle": "Best purchase ever!",
    "reviewUrl": "https://www.amazon.com/gp/customer-reviews/R3EXAMPLE123",
    "reviewReaction": "234 people found this helpful",
    "reviewedIn": "Reviewed in the United States on December 15, 2023",
    "reviewDescription": "I've been using this product for 3 months now and it has exceeded all my expectations. The quality is outstanding and the value for money is excellent. Highly recommend to anyone looking for a reliable solution.",
    "isVerified": true,
    "variant": "Size: Medium",
    "reviewImages": [
      "https://images-na.ssl-images-amazon.com/images/I/71example1.jpg"
    ],
    "position": 1,
    "reviewId": "R3EXAMPLE123",
    "helpfulCount": 234,
    "reviewDate": "2023-12-15T00:00:00",
    "reviewLocation": "United States",
    "sentimentHint": "positive",
    "wordCount": 42,
    "hasImages": true,
    "imageCount": 1,
    "scrapedAt": "2024-01-20T10:30:00Z",
    "sourceUrl": "https://www.amazon.com/dp/B0BDHWDR12"
  }
]
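
Once a list of records like this is loaded, a few lines of Python are enough for a first summary. The sketch below is only an example of consuming the output; summarize is a hypothetical helper that relies on the ratingScore, isVerified, and sentimentHint fields shown above.

```python
from collections import Counter
from statistics import mean


def summarize(reviews: list) -> dict:
    """Return count, average rating, verified share, and sentiment tallies."""
    if not reviews:
        return {"count": 0, "averageRating": None, "verifiedShare": None, "sentiment": {}}
    verified = sum(1 for review in reviews if review.get("isVerified"))
    return {
        "count": len(reviews),
        "averageRating": round(mean(review["ratingScore"] for review in reviews), 2),
        "verifiedShare": round(verified / len(reviews), 2),
        "sentiment": dict(Counter(review.get("sentimentHint", "unknown") for review in reviews)),
    }
```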

🔗 Integration

Export Formats

  • JSON: Native output format
  • CSV: Export via Apify dataset
  • Excel: Export via Apify dataset

API Access

# Get results via API
curl "https://api.apify.com/v2/datasets/{datasetId}/items?token={apiToken}"
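
The same request can be made from Python with the standard library. The sketch below builds the dataset items URL used in the curl command and adds the API's format query parameter, which is how the CSV and Excel exports listed above can be requested (csv and xlsx respectively). dataset_items_url and fetch_items are hypothetical helper names; the dataset ID and token are placeholders.

```python
import json
import urllib.parse
import urllib.request


def dataset_items_url(dataset_id: str, token: str, fmt: str = "json") -> str:
    """Build the dataset items URL; fmt may be e.g. 'json', 'csv', or 'xlsx'."""
    query = urllib.parse.urlencode({"token": token, "format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


def fetch_items(dataset_id: str, token: str, timeout: float = 60.0) -> list:
    """Download the dataset as JSON and return the list of review records."""
    with urllib.request.urlopen(dataset_items_url(dataset_id, token), timeout=timeout) as response:
        return json.load(response)


# Usage (placeholders, not real values):
# reviews = fetch_items("<datasetId>", "<apiToken>")
# csv_url = dataset_items_url("<datasetId>", "<apiToken>", fmt="csv")
```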