Amazon Products Scraper (Brasil)

Pricing

from $2.00 / 1,000 results

An app that automatically collects (scrapes) and structures publicly available product data from Amazon listings for analysis and monitoring.

Developer

Lucas Missalia

Maintained by Community

🛒 Amazon Product Scraper (Optimized) — Apify Actor

An Apify Actor that extracts structured product data from Amazon search result pages using Playwright (async Chromium), with a focus on performance and stability on the Apify platform.

Performance-first version: this iteration adds concurrency, resource blocking, and single-pass DOM extraction to reduce Playwright overhead and prevent slowdowns and freezes under load.


📦 Features

  • Scrapes Amazon search result pages for product listings.
  • Extracts rich product data including prices, ratings, delivery info, stock hints, and optional specs.
  • Automatically handles cookie consent banners (best-effort).
  • Triggers lazy-loaded content with a lightweight scroll.
  • Optimized for Apify performance:
    • Concurrent workers (multiple pages in parallel).
    • Blocks heavy resources (images, fonts, stylesheets, media) to reduce CPU/RAM usage and speed up navigation.
    • Single page.evaluate() extraction to avoid hundreds of Playwright round-trips per page.
  • Detects Kindle Unlimited info (when present) and flags international purchase availability (best-effort).
  • Outputs clean, structured JSON data via Apify's dataset.

🗂️ Output

Output model

This optimized version pushes one dataset item per product (in batches) and includes the originating search URL in sourceUrl.

Example item

{
  "sourceUrl": "https://www.amazon.com/s?k=laptop",
  "name": "Product Title",
  "asin": "B09XYZ1234",
  "rate": "4.5 out of 5 stars",
  "rateCount": "12,345 ratings",
  "description": "Short editorial or badge description",
  "price": "$499.99",
  "delivery": "Monday, Apr 22",
  "shippingInformation": "Free delivery Monday, Apr 22",
  "fastestDelivery": "Fastest delivery: Tomorrow",
  "internationalPurchase": false,
  "thumbnail": "https://m.media-amazon.com/images/...",
  "link": "https://www.amazon.com/dp/B09XYZ1234",
  "stockDetails": "Only 20 left in stock - order soon.",
  "details": {
    "screenSize": "15.6 Inches",
    "ram": "16 GB",
    "kindleUnlimited": "https://www.amazon.com/...",
    "message": "Read for free with Kindle Unlimited"
  }
}

Notes

  • stockDetails and details are optional and appear only when the product listing includes those blocks.
  • Only products that have both a name and a price are included in the output.
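
The inclusion rule and the batched output described above could look roughly like the sketch below. It is an illustration only, not the Actor's actual code: the helper names keep_product, prepare_items, and batched, and the batch size of 50, are assumptions introduced here.

```python
from typing import Iterable, Iterator


def keep_product(item: dict) -> bool:
    """Keep only products that have both a non-empty name and a non-empty price."""
    return bool(item.get("name")) and bool(item.get("price"))


def prepare_items(raw_items: Iterable[dict], source_url: str) -> list:
    """Filter raw product cards, drop empty optional blocks, and add sourceUrl."""
    prepared = []
    for raw in raw_items:
        if not keep_product(raw):
            continue
        item = dict(raw)
        item["sourceUrl"] = source_url
        # stockDetails and details are optional: omit them when empty.
        for optional_key in ("stockDetails", "details"):
            if not item.get(optional_key):
                item.pop(optional_key, None)
        prepared.append(item)
    return prepared


def batched(items: list, size: int = 50) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items (hypothetical batch size)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch could then be handed to the dataset in one call; in the Apify SDK for Python, Actor.push_data accepts a list of items, though whether the Actor batches exactly this way is not stated in this README.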

⚙️ Input Configuration

Configure the Actor via the Apify input UI or by passing JSON.

Fields

  • start_urls (array): List of Amazon search URLs to scrape.
    • Default: [{"url": "https://www.amazon.com/s?k=spider+man+comics"}]
  • max_concurrency (number): Number of parallel workers/pages.
    • Default: 4
  • max_retries (number): Retries per URL on timeout/error.
    • Default: 2
  • navigation_timeout_ms (number): Navigation timeout in milliseconds.
    • Default: 30000

Example Input

{
  "start_urls": [
    { "url": "https://www.amazon.com/s?k=mechanical+keyboard" },
    { "url": "https://www.amazon.com/s?k=gaming+headset" }
  ],
  "max_concurrency": 4,
  "max_retries": 2,
  "navigation_timeout_ms": 30000
}
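
As a rough sketch of how the documented defaults might be applied to a partial input, consider the helper below. The function name resolve_input and the clamping of values to sane minimums are assumptions for illustration; the README does not say how the Actor validates its input.

```python
from typing import Optional

# Defaults as documented in the Fields list above.
DEFAULT_INPUT = {
    "start_urls": [{"url": "https://www.amazon.com/s?k=spider+man+comics"}],
    "max_concurrency": 4,
    "max_retries": 2,
    "navigation_timeout_ms": 30000,
}


def resolve_input(raw: Optional[dict]) -> dict:
    """Merge user input over the defaults and flatten start_urls to plain strings."""
    merged = {**DEFAULT_INPUT, **(raw or {})}
    urls = [entry["url"] for entry in merged["start_urls"] if entry.get("url")]
    return {
        "urls": urls,
        # Clamping is an assumption: at least one worker, no negative retries.
        "max_concurrency": max(1, int(merged["max_concurrency"])),
        "max_retries": max(0, int(merged["max_retries"])),
        "navigation_timeout_ms": int(merged["navigation_timeout_ms"]),
    }
```

In an Actor built on the Apify SDK for Python, the raw dictionary would typically come from `await Actor.get_input()`.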

🛠️ Tech Stack

  • Apify SDK (Python): Actor lifecycle, request queue, dataset output.
  • Playwright (async): Headless Chromium browser automation.

🔍 How It Works

  1. Initialization: The Actor reads start_urls from the input and seeds a request queue.
  2. Browser + Context: A headless Chromium browser is launched with a realistic user-agent and viewport.
  3. Performance routing: Requests are intercepted and heavy resources are blocked (fonts, stylesheets, media, images).
  4. Concurrent processing: Multiple workers reuse pages and pull URLs from the queue concurrently.
  5. Page processing: For each URL, the Actor:
    • Navigates to the page and accepts cookie banners if present (best-effort).
    • Waits for the Amazon search result grid to appear.
    • Performs a light scroll to trigger lazy-loaded product cards.
  6. Data extraction: Extracts all product cards in a single page.evaluate() call and post-processes fields (e.g., Kindle Unlimited link).
  7. Output: Structured product items are pushed to the Apify dataset (batched).
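
The concurrent-processing and retry behavior in steps 4 and 5 can be sketched with a plain asyncio worker pool. This is a simplified stand-in, not the Actor's implementation: run_workers and process_url are hypothetical names, and process_url represents whatever coroutine navigates one search URL with a reused Playwright page and returns its product items.

```python
import asyncio
from typing import Awaitable, Callable


async def run_workers(
    urls: list,
    process_url: Callable[[str], Awaitable[list]],
    max_concurrency: int = 4,
    max_retries: int = 2,
) -> list:
    """Process URLs with a fixed number of workers, retrying each URL on failure."""
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results = []

    async def worker() -> None:
        while True:
            try:
                url = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to do for this worker
            # One initial attempt plus up to max_retries retries.
            for _attempt in range(max_retries + 1):
                try:
                    results.extend(await process_url(url))
                    break
                except Exception:
                    continue  # retry; the URL is skipped once attempts run out

    await asyncio.gather(*(worker() for _ in range(max(1, max_concurrency))))
    return results
```

A fixed pool of workers pulling from one queue keeps the number of open pages bounded by max_concurrency, which is what makes memory use predictable under load.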

🚀 Performance Tips

  • Start with max_concurrency = 2–4 and increase gradually based on the CPU and memory allocated to the Actor run.
  • If you observe timeouts, increase navigation_timeout_ms and/or reduce concurrency.
  • Blocking heavy resources improves speed, but if you need higher fidelity (e.g., guaranteed thumbnails), consider allowing images by removing image from the blocked types.
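
To make the last tip concrete, here is a minimal sketch of a configurable Playwright route handler. The names DEFAULT_BLOCKED_TYPES and make_route_handler are assumptions for illustration; the Actor's own blocked-type list may be defined differently. The handler relies only on Playwright's Route API (route.request.resource_type, route.abort(), route.continue_()).

```python
# Resource types named in this README as blocked by default.
DEFAULT_BLOCKED_TYPES = frozenset({"image", "font", "stylesheet", "media"})


def make_route_handler(blocked_types=DEFAULT_BLOCKED_TYPES):
    """Build an async handler that aborts requests for the given resource types."""

    async def handle(route):
        # `route` is expected to be a Playwright Route object.
        if route.request.resource_type in blocked_types:
            await route.abort()
        else:
            await route.continue_()

    return handle
```

To allow images while still blocking the rest, the handler could be registered as `await page.route("**/*", make_route_handler(DEFAULT_BLOCKED_TYPES - {"image"}))`.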

⚠️ Limitations & Notes

  • Amazon actively detects and blocks scrapers. This Actor uses a realistic user-agent and scroll behavior to mitigate this, but results may vary depending on the region and Amazon's current anti-bot measures.
  • Amazon page layout and selectors change frequently; you may need to update selectors inside the extraction script.
  • Intended for search result pages (e.g., /s?k=...), not product detail pages.
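
Because only search result pages are supported, it can help to check start URLs before a run. The sketch below is a loose, hypothetical pre-check (is_amazon_search_url is not part of the Actor). It assumes a search URL has the path /s and a non-empty k query parameter, as in the examples in this README.

```python
from urllib.parse import parse_qs, urlparse


def is_amazon_search_url(url: str) -> bool:
    """Loosely check that a URL looks like an Amazon search page (/s?k=...)."""
    parts = urlparse(url)
    host_labels = (parts.hostname or "").split(".")
    if parts.scheme not in ("http", "https") or "amazon" not in host_labels:
        return False
    query = parse_qs(parts.query)  # blank values are dropped by default
    return parts.path.rstrip("/") == "/s" and bool(query.get("k"))
```

This accepts regional domains such as www.amazon.com.br and rejects product detail links such as /dp/B09XYZ1234. It is not a security check, since any hostname containing an "amazon" label passes.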

📄 License / Compliance

This project is intended for personal and educational use. Always comply with Amazon’s Terms of Service and applicable laws when scraping.