Amazon Product Details Scraper
Pricing
$19.99/month + usage
Amazon Product Details Scraper extracts rich product data from any Amazon listing. Capture titles, prices, images, ratings, reviews, variations, specs, sellers, and availability. Ideal for market research, pricing analysis, product tracking, and workflows needing structured Amazon details.
Developer: API Empire
A Python script that extracts product information from Amazon product pages, including prices, reviews, specifications, images, and variations. It reads the JSON embedded in the page first and falls back to HTML parsing when that JSON is not available.
Why Choose This Script?
- Intelligent Data Extraction: Automatically extracts embedded JSON data from Amazon pages (var text = [{...}]) for complete product information
- Comprehensive Data Extraction: Captures 30+ data points per product including prices, reviews, images, specifications, and more
- Dual Extraction Method: Primary JSON extraction with HTML parsing fallback ensures maximum data coverage
- Fast & Efficient: Direct JSON extraction is faster than pure HTML parsing
- Bulk Processing: Process multiple Amazon product URLs in a single run
- Simple Output: Clean JSON output format matching Amazon's data structure
Key Features
Comprehensive Product Data
- Product title, manufacturer, ASIN
- Current price, retail price, price range, savings
- Product rating and total review count
- Availability status and shipping information
- Main product image and additional image URLs
- Product videos, variations, and categories
- Technical specifications and product details
- Delivery messages and important information
Advanced Review Extraction
- Extract reviews directly from product pages
- Reviews include text, rating, date, username, and review IDs
- Support for global reviews from different marketplaces
- Automatic review structure parsing
Smart Data Extraction
- Primary Method: Extracts embedded JSON data from Amazon's JavaScript variables (var text = [{...}])
- Fallback Method: HTML parsing when JSON is not available
- Intelligent Mapping: Automatically maps extracted data to structured format
- Error Handling: Graceful handling of missing or incomplete data
Complete Product Information
- Product variations (size, color, style) with ASINs
- Category breadcrumbs and navigation
- Product specifications and details tables
- About product sections
- Seller information and fulfillment details
Installation
Prerequisites
- Python 3.7 or higher
- pip (Python package manager)
Setup
- Clone or download this repository
- Install required dependencies:

      pip install requests beautifulsoup4 lxml

  Or install from requirements file:

      pip install -r requirements.txt
Usage
Basic Usage
- Open temp.py in your editor
- Add your Amazon product URLs to the amazon_urls list:

      amazon_urls = [
          'https://www.amazon.com/dp/B00P8XQPY4',
          'https://www.amazon.com/dp/B08XYZ1234',
          # Add more URLs as needed
      ]

- Run the script:

      python temp.py

- Check the output: Results are saved to amazon_output.json in the same directory
Advanced Usage
You can also import and use the functions programmatically:
    from temp import fetch_and_parse_amazon, save_to_json

    # Fetch a single product
    product_data = fetch_and_parse_amazon('https://www.amazon.com/dp/B00P8XQPY4')

    # Process multiple products
    urls = [
        'https://www.amazon.com/dp/B00P8XQPY4',
        'https://www.amazon.com/dp/B08XYZ1234',
    ]
    results = []
    for url in urls:
        data = fetch_and_parse_amazon(url)
        results.append(data)

    # Save to custom filename
    save_to_json(results, 'my_products.json')
Input
Amazon Product URLs
The script accepts Amazon product URLs in various formats:
- Standard format: https://www.amazon.com/dp/B00P8XQPY4
- With parameters: https://www.amazon.com/dp/B00P8XQPY4?psc=1
- Full product URL: https://www.amazon.com/product-name/dp/B00P8XQPY4
- Different domains: Works with amazon.com, amazon.co.uk, amazon.de, etc.
URL Requirements
- Must be a valid Amazon product page URL
- Should contain /dp/ or /gp/product/ followed by a 10-character ASIN
- Product must be publicly accessible (no login required)
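The URL requirements above can be checked before any request is made. The following is a minimal sketch, not the script's own code: the names ASIN_PATTERN and extract_asin are hypothetical, and the pattern assumes ASINs consist of uppercase letters and digits.

```python
import re
from typing import Optional

# Assumption: an ASIN is exactly 10 uppercase letters or digits and follows
# either /dp/ or /gp/product/. The lookahead stops longer tokens from matching.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?=[/?#]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN found in an Amazon product URL, or None if there is none."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    for candidate in (
        "https://www.amazon.com/dp/B00P8XQPY4",
        "https://www.amazon.com/dp/B00P8XQPY4?psc=1",
        "https://www.amazon.com/product-name/dp/B00P8XQPY4",
        "https://www.amazon.com/s?k=flash+drive",
    ):
        print(candidate, "->", extract_asin(candidate))
```

URLs for which extract_asin returns None can be skipped or reported before the run starts.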
Output
The script saves scraped product data to amazon_output.json (or a custom filename). Each product record contains:
Core Product Information
- statusCode (Number): HTTP status code of the request (200 = success)
- statusMessage (String): Status message ("FOUND", "ERROR", etc.)
- url (String): Original product URL
- title (String): Product title
- manufacturer (String): Product manufacturer/brand
- asin (String): Amazon Standard Identification Number (10 characters)
- price (Number): Current product price
- retailPrice (Number): Original/retail price (if on sale)
- priceRange (String or null): Price range for variations (if applicable)
- productRating (String): Overall product rating (e.g., "4.7 out of 5 stars")
- countReview (Number): Total number of reviews
- warehouseAvailability (String): Stock status ("In Stock", "Only 2 left in stock", etc.)
- soldBy (String): Seller name
- sellerId (String): Seller ID
- fulfilledBy (String): Fulfillment method (usually seller name or "Amazon")
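Because productRating is a string such as "4.7 out of 5 stars", numeric comparisons need a conversion step. The helper below is a sketch under that assumption; parse_rating is a hypothetical name, and ratings from non-English marketplaces may use a different wording that this pattern does not match.

```python
import re
from typing import Optional


def parse_rating(product_rating: Optional[str]) -> Optional[float]:
    """Convert a string like '4.7 out of 5 stars' to 4.7, or return None."""
    if not product_rating:
        return None
    # Assumption: the English format "<number> out of 5 ..." shown in this README.
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of\s+5", product_rating)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_rating("4.7 out of 5 stars"))
    print(parse_rating(None))
```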
Media & Visual Content
- mainImage (Object): Main product image
  - imageUrl (String): Image URL
  - imageResolution (String): Image dimensions
- imageUrlList (Array): List of additional product image URLs
- videoeUrlList (Array): List of product video URLs
Product Details
- productDescription (String): Full product description text
- features (Array): List of product features/bullet points
- productDetails (Array): Detailed product information with name-value pairs
  - Each item contains: name (String), value (String)
- productSpecification (Array): Technical specifications
- variations (Array): Product variations (size, color, etc.)
  - Each variation contains:
    - variationName (String): e.g., "size_name", "color_name"
    - values (Array): Available variation options with ASINs
- bookVariations (Array): Book format variations (if applicable)
- categoriesExtended (Array): Product categories with breadcrumb navigation
  - Each category contains: name, url, node
- minimalQuantity (String): Minimum purchase quantity (usually "1")
Shipping & Delivery
- shippingPrice (Number): Shipping cost (0 for free shipping)
- priceShippingInformation (String): Shipping details and delivery options
- deliveryMessage (String or null): Delivery time estimate
- importantInformation (Array): Important notices and alerts
Reviews
- reviews (Array): List of customer reviews from the product page
  - Each review contains:
    - text (String): Review content
    - date (String): Review date
    - rating (String): Star rating (e.g., "5.0 out of 5 stars")
    - title (String): Review title
    - userName (String): Reviewer name
    - url (String): Review URL
    - reviewId (String): Unique review ID
    - profilePath (String): Reviewer profile path
    - imageUrlList (Array): Review images (if any)
    - variationList (Array): Product variations mentioned in review
    - locale (Object or null): Review locale information
- globalReviews (Array): Reviews from different Amazon marketplaces
Additional Information
- aboutProduct (Array): Comprehensive product information sections
  - Each item contains: name, value
- buyBoxUsed (Boolean or null): Whether the product has a used/refurbished option
- deal (Boolean): Whether product has special deals/badges
- prime (Boolean): Whether product is Prime eligible
- used (Boolean): Whether used options are available
- pastSales (String): Sales rank and purchase information (e.g., "500+ bought in past month")
- reviewInsights (Object): Review analysis data (if available)
Output Format
The output is a JSON array with one object per product (index [0], [1], and so on):

    [
      {
        "statusCode": 200,
        "statusMessage": "FOUND",
        "url": "https://www.amazon.com/dp/B00P8XQPY4",
        "title": "SanDisk 128GB Ultra USB 3.0 Flash Drive - SDCZ48-128G-U46, Red",
        "manufacturer": "Visit the SanDisk Store",
        "asin": "B00P8XQPY4",
        "price": 14.35,
        "productRating": "4.7 out of 5 stars",
        "countReview": 129199,
        "reviews": [...],
        "variations": [...],
        "imageUrlList": [...]
      }
    ]
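Since the output is a plain JSON array, it can be consumed with the standard json module. The sketch below filters out failed records by statusCode. The helper name successful_products is hypothetical, and the second sample record is made up purely to illustrate an error entry.

```python
import json

# Illustrative data only: the first record is trimmed from the example above,
# the second is an invented error record.
sample_output = """[
  {"statusCode": 200, "statusMessage": "FOUND", "asin": "B00P8XQPY4", "price": 14.35},
  {"statusCode": 404, "statusMessage": "ERROR", "url": "https://www.amazon.com/dp/B08XYZ1234"}
]"""


def successful_products(records):
    """Keep only the records whose request succeeded (statusCode 200)."""
    return [record for record in records if record.get("statusCode") == 200]


records = json.loads(sample_output)
for product in successful_products(records):
    print(product["asin"], product["price"])
```

To read a real run, replace json.loads(sample_output) with json.load on an open handle to amazon_output.json.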
How It Works
Data Extraction Method
- Primary Method - Embedded JSON Extraction:
  - Searches for Amazon's embedded JSON data in JavaScript variables
  - Most common pattern: var text = [{...}] where product data is at text[0]
  - Extracts complete product data structure directly from JSON
  - Fast and reliable when available
- Fallback Method - HTML Parsing:
  - When embedded JSON is not found, parses HTML elements
  - Uses BeautifulSoup to extract data from HTML structure
  - Targets the same fields, though some may be unavailable without the JSON
- Hybrid Approach:
  - Uses embedded JSON when available
  - Falls back to HTML parsing for missing fields
  - Ensures maximum data extraction
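The primary method can be sketched with the standard library alone. This is an illustration of the technique, not the script's actual implementation: extract_embedded_json and first_product are hypothetical names, and the sketch assumes the assigned value is strict JSON rather than a looser JavaScript literal.

```python
import json
import re
from typing import Any, Optional


def extract_embedded_json(html: str, var_name: str = "text") -> Optional[Any]:
    """Return the JSON value assigned to `var <var_name> = ...`, or None."""
    match = re.search(r"\bvar\s+" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (the trailing semicolon, other script code).
        value, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return value


def first_product(html: str) -> Optional[dict]:
    """Return text[0] when the embedded value is a non-empty list of objects."""
    data = extract_embedded_json(html)
    if isinstance(data, list) and data and isinstance(data[0], dict):
        return data[0]
    return None


if __name__ == "__main__":
    page = '<script>var text = [{"asin": "B00P8XQPY4", "price": 14.35}];</script>'
    print(first_product(page))
```

A None result from first_product is the signal to switch to the HTML parsing fallback.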
Technical Details
- Headers: Uses realistic browser headers to reduce the chance of being blocked
- Error Handling: Graceful handling of network errors and missing data
- ASIN Extraction: Automatically extracts ASIN from URL or page
- Data Validation: Validates and cleans extracted data
- Encoding: Handles UTF-8 encoding for international characters
Best Use Cases
- Market Research: Track competitor products, prices, and customer feedback
- Price Monitoring: Monitor product prices and availability across Amazon listings
- Product Analysis: Analyze customer sentiment and product features from reviews
- Inventory Management: Check product availability and variations
- Content Creation: Gather product information for blog posts, reviews, or comparisons
- Data Aggregation: Collect product data for analytics platforms or databases
- E-commerce Integration: Import Amazon product data into your own platform
- Price Tracking: Track price changes over time
- Product Comparison: Compare multiple products side-by-side
Frequently Asked Questions
How does the embedded JSON extraction work?
The script searches for Amazon's embedded JSON data in JavaScript variables like var text = [{...}]. Amazon stores complete product data in these variables, which the script extracts and uses as the primary data source. This is faster and more reliable than pure HTML parsing.
Can I scrape multiple products at once?
Yes! Simply add multiple Amazon product URLs to the amazon_urls list. The script will process them sequentially and save all results to the JSON file.
What happens if a product page is not accessible?
The script will return an error record with statusCode indicating the error (e.g., 404, 503) and an error field with details. The script will continue processing other URLs.
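The behavior described here (record the failure, keep going) might look like the sketch below. FetchError, scrape_all, and the injected fetch callable are hypothetical stand-ins, and the exact keys of the script's real error record may differ.

```python
from typing import Callable, Dict, Iterable, List


class FetchError(Exception):
    """Hypothetical error raised by a fetch function when a page cannot be loaded."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code


def scrape_all(urls: Iterable[str], fetch: Callable[[str], Dict]) -> List[Dict]:
    """Fetch every URL, turning failures into error records instead of stopping."""
    results = []
    for url in urls:
        try:
            results.append(fetch(url))
        except FetchError as exc:
            results.append({
                "statusCode": exc.status_code,
                "statusMessage": "ERROR",
                "url": url,
                "error": str(exc),
            })
    return results


def fake_fetch(url: str) -> Dict:
    """Stand-in for a real fetch function; fails for one URL to show the flow."""
    if url.endswith("B08XYZ1234"):
        raise FetchError(404, "Product page not found")
    return {"statusCode": 200, "statusMessage": "FOUND", "url": url}


if __name__ == "__main__":
    for record in scrape_all(
        ["https://www.amazon.com/dp/B00P8XQPY4", "https://www.amazon.com/dp/B08XYZ1234"],
        fake_fetch,
    ):
        print(record)
```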
Can I scrape Amazon products from different countries?
Yes, the script supports Amazon URLs from any Amazon domain (amazon.com, amazon.co.uk, amazon.de, amazon.fr, etc.). Just include the full URL with the appropriate domain.
How accurate is the data extraction?
The script uses Amazon's own embedded JSON data when available, which provides highly accurate data. When falling back to HTML parsing, it uses multiple selectors and validation to ensure data accuracy.
What's the difference between embedded JSON and HTML parsing?
- Embedded JSON: Faster, more complete, directly from Amazon's data structure
- HTML Parsing: More flexible, works when JSON is not available, but may miss some fields
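One way to combine the two sources is to prefer values from the embedded JSON and fill the gaps from the HTML parse. The sketch below assumes both sources produce dictionaries with the same field names; is_missing and merge_fields are hypothetical helpers, not functions exposed by the script.

```python
from typing import Any, Dict


def is_missing(value: Any) -> bool:
    """Treat None and empty strings, lists, and dicts as missing; keep 0 and False."""
    return value is None or value == "" or value == [] or value == {}


def merge_fields(from_json: Dict[str, Any], from_html: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the HTML-parsed fields, then overwrite with non-missing JSON values."""
    merged = dict(from_html)
    for key, value in from_json.items():
        if not is_missing(value):
            merged[key] = value
    return merged


if __name__ == "__main__":
    json_fields = {"title": "Example title", "price": None, "shippingPrice": 0}
    html_fields = {"title": "Example title (HTML)", "price": 14.35}
    print(merge_fields(json_fields, html_fields))
```

Keeping 0 and False as real values matters for fields such as shippingPrice and prime.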
How long does it take to scrape a product?
Typically 2-5 seconds per product, depending on:
- Network speed
- Amazon's response time
- Whether embedded JSON is available (faster) or HTML parsing is needed (slightly slower)
Does the script work with Amazon Prime-only products?
Yes, the script can scrape any publicly accessible Amazon product page, including Prime-only products (as long as you can view the page in your browser).
Can I customize the output format?
Yes, you can modify the save_to_json function or create your own output function. The data structure is returned as a dictionary, so you can format it however you need.
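As an example of a custom output function, the sketch below writes a few fields to CSV using the standard csv module. The name products_to_csv and the default column choice are assumptions; nested fields such as reviews would need flattening first.

```python
import csv
import io
from typing import Dict, Iterable, Sequence


def products_to_csv(
    products: Iterable[Dict],
    fields: Sequence[str] = ("asin", "title", "price", "productRating", "countReview"),
) -> str:
    """Render the selected fields of each product record as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields))
    writer.writeheader()
    for product in products:
        # Missing fields become empty cells rather than raising KeyError.
        writer.writerow({name: product.get(name, "") for name in fields})
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{
        "asin": "B00P8XQPY4",
        "title": "SanDisk 128GB Ultra USB 3.0 Flash Drive - SDCZ48-128G-U46, Red",
        "price": 14.35,
        "productRating": "4.7 out of 5 stars",
        "countReview": 129199,
    }]
    print(products_to_csv(sample))
```

The csv module quotes values that contain commas, so titles like the one above survive a round trip.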
What if I get blocked by Amazon?
Amazon may temporarily block requests if you make too many requests too quickly. The script includes realistic headers, but for heavy usage, consider:
- Adding delays between requests
- Using proxies (you can modify the script to add proxy support)
- Respecting Amazon's rate limits
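Adding delays can be as simple as pausing for a random interval between requests. In the sketch below, fetch_politely is a hypothetical wrapper, the 2 to 5 second range is an arbitrary assumption rather than a documented Amazon limit, and the fetch callable stands in for whatever function performs the request.

```python
import random
import time
from typing import Callable, Dict, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], Dict],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Fetch URLs one at a time, pausing a random interval between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Randomized pause so requests are not sent at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Demonstration with a stub fetch and a recording sleep, so nothing is
    # requested and the example finishes immediately.
    recorded_waits = []
    pages = fetch_politely(
        ["url-1", "url-2", "url-3"],
        fetch=lambda url: {"url": url},
        sleep=recorded_waits.append,
    )
    print(len(pages), "pages fetched with", len(recorded_waits), "pauses")
```

The sleep parameter is injectable only to keep the example testable; in normal use the default time.sleep applies.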
Troubleshooting
Common Issues
- "No module named 'requests'" or similar errors
  - Solution: Install dependencies with pip install requests beautifulsoup4 lxml
- Empty or incomplete data
  - Amazon may have changed their HTML structure
  - Try a different product URL to verify
  - Check if the product page loads correctly in your browser
- Network errors or timeouts
  - Check your internet connection
  - Amazon may be temporarily unavailable
  - Try again after a few minutes
- JSON parsing errors
  - The embedded JSON structure may have changed
  - The script will automatically fall back to HTML parsing
Getting Help
If you encounter issues:
- Check that your URLs are valid Amazon product pages
- Verify that the product page loads in your browser
- Ensure all dependencies are installed
- Check the error messages in the console output
Cautions
Important Legal and Ethical Considerations:
- This script scrapes only publicly available data from Amazon product pages
- Respect Amazon's Terms of Service when using this script
- Do not scrape private or password-protected content
- Comply with local laws regarding web scraping and data collection
- Use responsibly and avoid aggressive scraping that could impact Amazon's servers
- Respect rate limits - add delays between requests for bulk operations
- Data usage: You are responsible for how you use the scraped data and must ensure compliance with privacy laws, spam regulations, and data protection requirements
Note: This script is designed for legitimate business and research purposes. Always use it ethically and in compliance with applicable laws and Amazon's terms of service.
License
This script is provided as-is for educational and research purposes. Use responsibly and in accordance with Amazon's Terms of Service and applicable laws.