Amazon Keyword and Product Scraper Pro

Pricing

Pay per usage

Developed by

Neuro Scraper

Maintained by Community

🛍️ Amazon Search Scraper: Collect real-time product data from Amazon by just entering keywords 🔎 or product URLs 🔗! Get title, price, ratings ⭐, stock info & images 🖼️ in clean structured format. Perfect for price tracking 💰, market research 📊 & competitor analysis 🚀.

Last modified

3 days ago

🎯 Amazon Search Keywords and Products Scraper

Effortlessly extract structured product data from Amazon search results or directly from product URLs, including pricing, ratings, availability, and product metadata.


📖 Summary

This Apify Actor can extract data from Amazon in two ways:

  1. By providing search keywords (it collects all products listed in search results).
  2. By providing product URLs (it fetches details directly from each page).

Structured data is stored in the default Dataset.


💡 Use cases

  • 🛍️ E-commerce price monitoring and comparison
  • 📊 Market trend and keyword research
  • 🔍 Product catalog enrichment for Amazon listings
  • 🧠 Competitor intelligence automation

⚡ Quick Start (Apify Console)

  1. Open this Actor in the Apify Console.
  2. Click Run → Input tab.
  3. Paste JSON input such as:
     {
       "queries": ["wireless earbuds", "gaming mouse"],
       "urls": ["https://www.amazon.com/dp/B0D1234XYZ"],
       "concurrency": 8
     }
  4. Optionally configure a proxy (see 🌍 Proxy Configuration below).
  5. Click Run; the data will appear in the default Dataset.

⚡ Quick Start (CLI + API)

CLI

$ apify call <ACTOR_ID> --input-file=input.json

Where input.json contains:

{
  "queries": ["laptop stand"],
  "urls": ["https://www.amazon.com/dp/B0D1234XYZ"],
  "concurrency": 5
}

API (Python)

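from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient('<APIFY_TOKEN>')

# Start the Actor and wait for the run to finish.
run = client.actor('username~amazon-search-scraper').call(run_input={
    'queries': ['mechanical keyboard'],
    'urls': ['https://www.amazon.com/dp/B0D5678ABC'],
    'concurrency': 8,
})

# ID of the default Dataset that holds the scraped items.
print(run['defaultDatasetId'])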
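The client example above only prints the Dataset ID. A minimal sketch of reading the scraped items back, assuming the `run` dictionary returned by `.call()` and an `ApifyClient` instance, could look like this. The helper name `fetch_items` and its `limit` parameter are hypothetical and not part of the Actor.

```python
from typing import Any, Dict, List


def fetch_items(client: Any, run: Dict[str, Any], limit: int = 100) -> List[Dict[str, Any]]:
    """Return up to `limit` items from the run's default Dataset.

    `client` is expected to behave like apify_client.ApifyClient:
    client.dataset(id).iterate_items() yields one dict per scraped product.
    """
    dataset = client.dataset(run["defaultDatasetId"])
    items: List[Dict[str, Any]] = []
    for item in dataset.iterate_items():
        items.append(item)
        if len(items) >= limit:
            break  # stop early so large Datasets are not read in full
    return items
```

With the client and run from the example above, `fetch_items(client, run, limit=10)` would return the first ten product records as dictionaries.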

📝 Inputs

| 🔑 Name | 📝 Type | ❓ Required | ⚙️ Default | 📌 Example | 📝 Notes |
| --- | --- | --- | --- | --- | --- |
| queries | array / string | ❌ No | null | ["wireless earbuds"] | Amazon search keywords to collect listings |
| urls | array / string | ❌ No | null | ["https://www.amazon.com/dp/B0D1234XYZ"] | Direct product page URLs |
| concurrency | integer | ❌ No | 8 | 5 | Max concurrent product fetch tasks |
| proxyConfig | object | ⚙️ Optional | { "useApifyProxy": true } | { "useApifyProxy": true } | Configure proxy (see below) |

💡 Example: Paste into the Console input editor:

{"urls": ["https://www.amazon.com/dp/B0D5678ABC"], "concurrency": 4}
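Because `queries` and `urls` each accept either a string or an array, it can help to normalize input before sending it. The sketch below is one way to do that; `build_input` is a hypothetical helper, and the rule that at least one query or URL must be present is an assumption based on the run instructions, not confirmed Actor behavior.

```python
from typing import Any, Dict, List, Optional, Union

StrOrList = Optional[Union[str, List[str]]]


def _as_list(value: StrOrList) -> List[str]:
    """Accept None, a single string, or a list; return a clean list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        value = [value]
    return [v.strip() for v in value if v and v.strip()]


def build_input(queries: StrOrList = None, urls: StrOrList = None,
                concurrency: int = 8) -> Dict[str, Any]:
    """Build a run input dict using the field names from the Inputs table."""
    query_list = _as_list(queries)
    url_list = _as_list(urls)
    if not query_list and not url_list:
        # Assumption: the Actor needs at least one keyword or product URL.
        raise ValueError("Provide at least one query or product URL")
    if concurrency < 1:
        raise ValueError("concurrency must be a positive integer")
    run_input: Dict[str, Any] = {"concurrency": concurrency}
    if query_list:
        run_input["queries"] = query_list
    if url_list:
        run_input["urls"] = url_list
    return run_input
```

The resulting dictionary can be passed as `run_input` to the Python client or dumped to `input.json` with `json.dumps`.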

⚙️ Configuration

| 🔑 Name | 📝 Type | ❓ Required | ⚙️ Default | 📌 Example | 📝 Notes |
| --- | --- | --- | --- | --- | --- |
| OUTPUT_FILE | string | ❌ No | amazon.search.result.json | output.json | Internal output file for backup |
| REQUEST_TIMEOUT | integer | ❌ No | 30 | 45 | Timeout in seconds per request |
| APIFY_TOKEN | string | ✅ Yes | (none) | <APIFY_TOKEN> | Required for Apify client/API use |
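The source does not show how these settings are read. As a hedged illustration, assuming they are supplied as environment variables with the defaults listed in the table, a loader might look like the following. `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    output_file: str
    request_timeout: int
    apify_token: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the three settings, falling back to the defaults from the table."""
    env = os.environ if env is None else env
    token = env.get("APIFY_TOKEN", "")
    if not token:
        # The table marks APIFY_TOKEN as required.
        raise RuntimeError("APIFY_TOKEN is required")
    return Settings(
        output_file=env.get("OUTPUT_FILE", "amazon.search.result.json"),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "30")),  # seconds per request
        apify_token=token,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching the real environment.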

📤 Outputs

Results are stored in the default Dataset.

Example Output Item

{
  "asin": "B0D1234XYZ",
  "title": "Wireless Earbuds with Noise Cancellation",
  "url": "https://www.amazon.com/dp/B0D1234XYZ",
  "price": "$49.99",
  "currency": "$",
  "brand_name": "SoundMagic",
  "availability": "In Stock",
  "stars": 4.5,
  "number_of_reviews_text": "1,234 ratings",
  "categories": "Electronics > Audio > Headphones",
  "images": ["https://m.media-amazon.com/images/I/xyz.jpg"]
}
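In the example item, `price`, `number_of_reviews_text`, and `categories` are strings. For price tracking or analysis it is often convenient to derive numeric and list fields from them. The sketch below assumes US-style formatting as shown in the example ("$49.99", "1,234 ratings"); the added keys `price_value`, `review_count`, and `category_path` are hypothetical and not produced by the Actor.

```python
import re
from typing import Any, Dict, Optional


def parse_price(price: Optional[str]) -> Optional[float]:
    """Extract a number from a price string such as "$49.99" or "$1,299.00"."""
    if not price:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price)
    return float(match.group(0).replace(",", "")) if match else None


def parse_review_count(text: Optional[str]) -> Optional[int]:
    """Extract the leading integer from text such as "1,234 ratings"."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    return int(match.group(0).replace(",", "")) if match else None


def normalize(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a Dataset item with derived numeric and list fields."""
    out = dict(item)
    out["price_value"] = parse_price(item.get("price"))
    out["review_count"] = parse_review_count(item.get("number_of_reviews_text"))
    categories = item.get("categories") or ""
    out["category_path"] = [part.strip() for part in categories.split(">") if part.strip()]
    return out
```

Prices in other locales (for example, a comma as the decimal separator) would need different parsing rules.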

🔑 Environment variables

| Name | Description |
| --- | --- |
| APIFY_TOKEN | Your Apify API token for running via CLI or client |
| HTTP_PROXY | (Optional) Custom HTTP proxy endpoint |
| HTTPS_PROXY | (Optional) Custom HTTPS proxy endpoint |

▶️ How to Run

Apify Console

  1. Go to Actor → Run.
  2. Paste JSON input containing either queries or urls.
  3. Enable proxy under Proxy tab (recommended).
  4. Click Start and monitor logs.

Apify CLI

$ apify call username~amazon-search-scraper --input-file=input.json

Apify Client (Python)

See Quick Start (API) example above.


⏰ Scheduling & Webhooks

  • Use the Schedule tab in the Apify Console to run daily/weekly.
  • Add a Webhook under the Webhooks tab to trigger external automation (e.g., send results to Slack or Google Sheets).
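On the receiving side of a webhook, the external service usually needs the Dataset ID of the finished run. The sketch below assumes Apify's default webhook payload, where `eventType` names the event and `resource` holds the run object; if you customize the payload template, the field names will differ. The function name is hypothetical.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the default Dataset ID for a succeeded run, else None."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or timed-out runs
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

The returned ID can then be used to fetch items and forward them to Slack, Google Sheets, or another destination.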

🐞 Logs & Troubleshooting

| Issue | Cause | Fix |
| --- | --- | --- |
| Empty results | Amazon blocked request | Enable Apify Proxy or rotate proxies |
| Timeout errors | Network latency or blocking | Increase REQUEST_TIMEOUT or reduce concurrency |
| Missing product details | Page layout changed | Report issue or rerun after 24h |
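The "reduce concurrency" advice can be automated on the caller's side. The following is a rough sketch, not something the Actor does itself: it reruns with half the concurrency whenever a run returns no items. `run_with_backoff` and the `run_once` callable are hypothetical; `run_once` stands for whatever function starts the Actor and returns its Dataset items.

```python
from typing import Any, Callable, Dict, List

RunFn = Callable[[Dict[str, Any]], List[Dict[str, Any]]]


def run_with_backoff(run_once: RunFn, run_input: Dict[str, Any],
                     max_attempts: int = 3) -> List[Dict[str, Any]]:
    """Retry an empty run with halved concurrency, up to max_attempts times."""
    current = dict(run_input)
    current.setdefault("concurrency", 8)  # default from the Inputs table
    for _ in range(max_attempts):
        items = run_once(dict(current))
        if items:
            return items
        # Empty results often indicate blocking, so slow down before retrying.
        current["concurrency"] = max(1, current["concurrency"] // 2)
    return []
```

Keep `max_attempts` small, since each attempt is a full pay-per-usage run.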

🔒 Permissions & Storage

  • Uses the default Dataset for structured data.
  • Temporary files saved in Actor local storage.
  • Secure credentials (tokens, proxies) should be stored as Secrets in the Apify Console.

🆕 Changelog / Versioning

  • v1.1.0: Added support for scraping from direct product URLs.
  • v1.0.0: Initial public release.

📌 Notes / TODOs

  • TODO: Confirm supported Amazon domains (currently assumes amazon.com).
  • TODO: Add optional input for country_code or domain selection.

🌍 Proxy Configuration

Because this Actor sends requests to Amazon, proxy use is highly recommended.

  1. Open the Run page → Proxy tab.
  2. Check Use Apify Proxy.
  3. Select a proxy group (e.g., RESIDENTIAL or SHADER).
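When starting runs through the API instead of the Console, the same choice can be expressed in the `proxyConfig` input field. The sketch below uses the standard Apify proxy input keys `useApifyProxy` and `apifyProxyGroups`; whether this Actor honors `apifyProxyGroups` is an assumption, since the Inputs table only documents `useApifyProxy`. The helper name is hypothetical.

```python
from typing import Any, Dict, List, Optional


def with_apify_proxy(run_input: Dict[str, Any],
                     groups: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return a copy of run_input with Apify Proxy enabled."""
    proxy: Dict[str, Any] = {"useApifyProxy": True}
    if groups:
        # Assumption: the Actor passes proxy groups through to Apify Proxy.
        proxy["apifyProxyGroups"] = list(groups)
    merged = dict(run_input)
    merged["proxyConfig"] = proxy
    return merged
```

For example, `with_apify_proxy({"queries": ["laptop stand"]}, ["RESIDENTIAL"])` yields an input that requests the RESIDENTIAL group.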

Custom Proxy Configuration

If you prefer your own proxy, go to Actor → Settings → Environment variables and set:

HTTP_PROXY=http://<PROXY_USER>:<PROXY_PASS>@<HOST>:<PORT>
HTTPS_PROXY=http://<PROXY_USER>:<PROXY_PASS>@<HOST>:<PORT>

🔒 Always store proxy credentials securely as Secrets.
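Proxy usernames and passwords that contain characters such as `@` or `:` must be percent-encoded before they go into the URL format shown above, otherwise the URL will be parsed incorrectly. A small sketch, with a hypothetical helper name and example host:

```python
from urllib.parse import quote


def build_proxy_url(user: str, password: str, host: str, port: int,
                    scheme: str = "http") -> str:
    """Build a proxy URL with percent-encoded credentials."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"{scheme}://{safe_user}:{safe_password}@{host}:{port}"
```

The result can be used as the value of `HTTP_PROXY` and `HTTPS_PROXY`.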

TODO

Implement proxy rotation per request for improved anti-blocking resilience.



🧐 What I inferred from main.py

  • Actor collects Amazon product listings via search keywords and direct product URLs.
  • Network activity detected, so a proxy section is included.
  • Outputs JSON list of structured product data.
  • Domain is assumed to be amazon.com; marked as TODO for domain parameterization.