Amazon Products Scraper (Brasil)
Pricing
from $2.00 / 1,000 results
An application that automatically collects (scrapes) and structures publicly available product data from Amazon listings for analysis and monitoring.
Rating
0.0
(0)
Developer
Lucas Missalia
Actor stats
0
Bookmarked
2
Total users
1
Monthly active users
8 hours ago
Last modified
🛒 Amazon Product Scraper (Optimized) — Apify Actor
An Apify Actor that extracts structured product data from Amazon search result pages using Playwright (async Chromium), with a focus on high performance and stability on the Apify platform.
Performance-first version: this iteration adds concurrency, resource blocking, and single-pass DOM extraction to reduce Playwright overhead and avoid slowdowns or freezes under load.
📦 Features
- Scrapes Amazon search result pages for product listings.
- Extracts rich product data including prices, ratings, delivery info, stock hints, and optional specs.
- Automatically handles cookie consent banners (best-effort).
- Triggers lazy-loaded content with a lightweight scroll.
- Optimized for Apify performance:
  - Concurrent workers (multiple pages in parallel).
  - Blocks heavy resources (images/fonts/styles/media) to reduce CPU/RAM and speed up navigation.
  - Single `page.evaluate()` extraction to avoid hundreds of Playwright round-trips per page.
- Detects Kindle Unlimited info (when present) and flags international purchase availability (best-effort).
- Outputs clean, structured JSON data via Apify's dataset.
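The single-pass extraction listed above can be sketched roughly as follows. This is an illustration, not the Actor's actual script: the CSS selectors, the `EXTRACT_JS` constant, and the `extract_products` helper are assumptions, and only three fields are collected. The point is that one `page.evaluate()` call returns every card at once instead of issuing one Playwright call per field.

```python
# Hypothetical selectors: Amazon's markup changes often, so treat these as placeholders.
EXTRACT_JS = """
() => Array.from(
  document.querySelectorAll('div[data-component-type="s-search-result"]')
).map(card => {
  const text = sel => {
    const el = card.querySelector(sel);
    return el ? el.textContent.trim() : null;
  };
  return {
    asin: card.getAttribute('data-asin'),
    name: text('h2'),
    price: text('.a-price .a-offscreen'),
  };
})
"""


async def extract_products(page, source_url):
    """Collect all product cards with a single round-trip to the browser.

    `page` is expected to be a Playwright async Page (or anything exposing
    an awaitable `evaluate(expression)` method).
    """
    cards = await page.evaluate(EXTRACT_JS)
    # Tag each card with the search URL it came from, as the output model does.
    return [{"sourceUrl": source_url, **card} for card in cards]
```

Because the helper only relies on `page.evaluate()`, it can be exercised with a stub page object when no browser is available.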
🗂️ Output
Output model
This optimized version pushes one dataset item per product (in batches) and includes the originating search URL in `sourceUrl`.
Example item
{
  "sourceUrl": "https://www.amazon.com/s?k=laptop",
  "name": "Product Title",
  "asin": "B09XYZ1234",
  "rate": "4.5 out of 5 stars",
  "rateCount": "12,345 ratings",
  "description": "Short editorial or badge description",
  "price": "$499.99",
  "delivery": "Monday, Apr 22",
  "shippingInformation": "Free delivery Monday, Apr 22",
  "fastestDelivery": "Fastest delivery: Tomorrow",
  "internationalPurchase": false,
  "thumbnail": "https://m.media-amazon.com/images/...",
  "link": "https://www.amazon.com/dp/B09XYZ1234",
  "stockDetails": "Only 20 left in stock - order soon.",
  "details": {
    "screenSize": "15.6 Inches",
    "ram": "16 GB",
    "kindleUnlimited": "https://www.amazon.com/...",
    "message": "Read for free with Kindle Unlimited"
  }
}
Notes
- `stockDetails` and `details` are optional and appear only when the product listing includes those blocks.
- Only products that have both a name and a price are included in the output.
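The two rules in these notes can be expressed as a small post-processing step. The sketch below is an assumption about how such filtering might look; `clean_item` and `clean_items` are hypothetical names, not functions from the Actor's source.

```python
OPTIONAL_KEYS = ("stockDetails", "details")


def clean_item(raw):
    """Return a dataset-ready item, or None when name or price is missing."""
    if not raw.get("name") or not raw.get("price"):
        return None
    item = dict(raw)
    # Optional blocks are omitted entirely when the listing did not provide them.
    for key in OPTIONAL_KEYS:
        if not item.get(key):
            item.pop(key, None)
    return item


def clean_items(raw_items):
    """Apply clean_item to every raw card and drop the rejected ones."""
    cleaned = (clean_item(raw) for raw in raw_items)
    return [item for item in cleaned if item is not None]
```

With this shape, a card that has a title but no price never reaches the dataset, and an empty `details` object is removed rather than stored as `{}`.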
⚙️ Input Configuration
Configure the Actor via the Apify input UI or by passing JSON.
Fields
- `start_urls` (array): List of Amazon search URLs to scrape.
  - Default: `[{"url": "https://www.amazon.com/s?k=spider+man+comics"}]`
- `max_concurrency` (number): Number of parallel workers/pages.
  - Default: `4`
- `max_retries` (number): Retries per URL on timeout/error.
  - Default: `2`
- `navigation_timeout_ms` (number): Navigation timeout in milliseconds.
  - Default: `30000`
Example Input
{
  "start_urls": [
    { "url": "https://www.amazon.com/s?k=mechanical+keyboard" },
    { "url": "https://www.amazon.com/s?k=gaming+headset" }
  ],
  "max_concurrency": 4,
  "max_retries": 2,
  "navigation_timeout_ms": 30000
}
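As a rough illustration of how the documented defaults could be applied to a partial input, the sketch below merges user-supplied values over the defaults listed under Fields. `DEFAULTS` and `resolve_input` are hypothetical names; in a real Actor the raw dictionary would typically come from the Apify SDK's `Actor.get_input()`, which is not called here so the example stays self-contained.

```python
DEFAULTS = {
    "start_urls": [{"url": "https://www.amazon.com/s?k=spider+man+comics"}],
    "max_concurrency": 4,
    "max_retries": 2,
    "navigation_timeout_ms": 30000,
}


def resolve_input(actor_input):
    """Merge user input over the documented defaults and clamp numeric fields."""
    supplied = {
        key: value
        for key, value in (actor_input or {}).items()
        if value is not None
    }
    merged = {**DEFAULTS, **supplied}
    # Assumption: at least one worker, and retries can never be negative.
    merged["max_concurrency"] = max(1, int(merged["max_concurrency"]))
    merged["max_retries"] = max(0, int(merged["max_retries"]))
    merged["navigation_timeout_ms"] = int(merged["navigation_timeout_ms"])
    return merged
```

Passing only `{"max_concurrency": 2}` would keep the default search URL, retries, and timeout while lowering the worker count.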
🛠️ Tech Stack
- Apify SDK (Python): Actor lifecycle, request queue, dataset output.
- Playwright (async): Headless Chromium browser automation.
🔍 How It Works
- Initialization: The Actor reads `start_urls` from the input and seeds a request queue.
- Browser + Context: A headless Chromium browser is launched with a realistic user-agent and viewport.
- Performance routing: Requests are intercepted and heavy resources are blocked (fonts, stylesheets, media, images).
- Concurrent processing: Multiple workers reuse pages and pull URLs from the queue concurrently.
- Page processing: For each URL, the Actor:
  - Navigates to the page and accepts cookie banners if present (best-effort).
  - Waits for the Amazon search result grid to appear.
  - Performs a light scroll to trigger lazy-loaded product cards.
- Data extraction: Extracts all product cards in a single `page.evaluate()` call and post-processes fields (e.g., Kindle Unlimited link).
- Output: Structured product items are pushed to the Apify dataset (batched).
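The concurrent-processing and retry steps above might be structured along these lines. This is a minimal sketch using a plain `asyncio.Queue` rather than the Apify request queue, and it takes the per-URL work as an injected coroutine; `run_workers` and `process_url` are hypothetical names, and the real Actor may organize its workers differently.

```python
import asyncio


async def run_workers(urls, process_url, max_concurrency=4, max_retries=2):
    """Process URLs with a fixed pool of workers, retrying failed URLs.

    `process_url(url)` must be a coroutine returning a list of items.
    Returns (items, failed_urls).
    """
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait((url, 0))  # (url, attempts made so far)
    results, failed = [], []

    async def worker():
        while True:
            try:
                url, attempt = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left for this worker
            try:
                results.extend(await process_url(url))
            except Exception:
                if attempt < max_retries:
                    # Requeue; this worker keeps looping, so the retry is not lost.
                    queue.put_nowait((url, attempt + 1))
                else:
                    failed.append(url)

    await asyncio.gather(*(worker() for _ in range(max_concurrency)))
    return results, failed
```

In a Playwright-based Actor, `process_url` would wrap navigation, the light scroll, and extraction for one search page, and `results` would be pushed to the dataset in batches.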
🚀 Performance Tips
- Start with `max_concurrency = 2–4` and increase gradually based on your Actor's CPU/RAM.
- If you observe timeouts, increase `navigation_timeout_ms` and/or reduce concurrency.
- Blocking heavy resources improves speed, but if you need higher fidelity (e.g., guaranteed thumbnails), consider allowing images by removing `image` from the blocked types.
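For the last tip, a route handler that blocks by resource type could look like the sketch below. The set name `BLOCKED_RESOURCE_TYPES` and the factory `make_route_handler` are assumptions for illustration; the handler relies only on Playwright's `route.request.resource_type`, `route.abort()`, and `route.continue_()`.

```python
BLOCKED_RESOURCE_TYPES = frozenset({"image", "font", "stylesheet", "media"})


def make_route_handler(blocked_types=BLOCKED_RESOURCE_TYPES):
    """Build an async Playwright route handler that aborts heavy resources."""
    blocked = frozenset(blocked_types)

    async def handle_route(route):
        if route.request.resource_type in blocked:
            await route.abort()
        else:
            await route.continue_()

    return handle_route


# Usage with a Playwright browser context (not executed here):
#   await context.route("**/*", make_route_handler())
# To keep thumbnails, allow images again:
#   await context.route("**/*", make_route_handler(BLOCKED_RESOURCE_TYPES - {"image"}))
```

Keeping the blocked set as a parameter makes it easy to trade speed for fidelity per run without editing the handler itself.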
⚠️ Limitations & Notes
- Amazon actively detects and blocks scrapers. This Actor uses a realistic user-agent and scroll behavior to mitigate this, but results may vary depending on the region and Amazon's current anti-bot policies.
- Amazon page layout and selectors change frequently; you may need to update selectors inside the extraction script.
- Intended for search result pages (e.g., `/s?k=...`), not product detail pages.
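Since the Actor targets search result pages only, it can help to check URLs before queuing them. The helper below is a hypothetical pre-flight check, not part of the Actor: it assumes a search URL has an Amazon hostname, the path `/s`, and a non-empty `k` query parameter.

```python
from urllib.parse import parse_qs, urlparse


def is_search_url(url):
    """Return True for URLs shaped like an Amazon search page (/s?k=...)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Assumption: regional storefronts look like amazon.<tld> or www.amazon.<tld>.
    on_amazon = host.startswith(("amazon.", "www.amazon."))
    has_keyword = bool(parse_qs(parsed.query).get("k"))
    return on_amazon and parsed.path.rstrip("/") == "/s" and has_keyword
```

Product detail URLs such as `/dp/B09XYZ1234` fail this check and could be skipped or reported before any browser work starts.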
📄 License / Compliance
This project is intended for personal and educational use. Always comply with Amazon's Terms of Service and applicable laws when scraping.