Amazon Product Scraper avatar
Amazon Product Scraper

Pricing

$0.50 / 1,000 results

Go to Apify Store
Amazon Product Scraper

Amazon Product Scraper

Developed by

Neuro Scraper

Neuro Scraper

Maintained by Community

πŸ›’ **Amazon Product Scraper** β€” ⚑ Instantly fetch product data from Amazon by keyword! Get πŸ“¦ titles, πŸ’² prices, ⭐ reviews, πŸ–ΌοΈ images & more β€” all in seconds. Perfect for πŸ“Š market research, 🧠 trend tracking, and πŸ’Ό pricing analysis. **Fast, secure & ready to run!**

0.0 (0)

Pricing

$0.50 / 1,000 results

0

1

1

Last modified

9 hours ago

One-line hero: Run a production-ready actor that searches Amazon by keywords and delivers clean product records β€” instant insights, secure by design.


πŸ“– Short summary

A plug-and-play Actor that performs keyword searches on Amazon, enriches the results with product page details, and exports a clean dataset ready for analysis. Designed for fast, repeatable runs in the Apify Console β€” no developer setup required.


πŸ’‘ Use cases / When to use

  • Market research: compare prices, reviews, and availability across search results.
  • Competitor monitoring: track product listings and price changes for a set of queries.
  • Lead collection for product feeds: gather thumbnails, product URLs, and metadata.
  • Quick audits: get a snapshot of top search results for given keywords.

⚑ Quick Start (Console β€” one-click)

  1. Open the Actor in Apify Console.
  2. Paste a JSON input containing queries (single string or array). See input.example.json for a minimal example.
  3. Click Run β€” the Actor runs and saves results to the default dataset and an output file.

βš™οΈ Quick Start (CLI + API)

CLI

$apify run --actor <OWNER>/<ACTOR_NAME> --input input.example.json

API (Python β€” apify-client)

from apify_client import ApifyClient
client = ApifyClient('<APIFY_TOKEN>')
run = client.actor_runs().start('<OWNER>/<ACTOR_NAME>', input={
'queries': ['wireless earbuds', 'bluetooth speaker'],
'concurrency': 8,
})
print('Started run:', run['id'])

Replace <APIFY_TOKEN> and <OWNER>/<ACTOR_NAME> with your values.


πŸ“ Inputs (fields & schema)

{
"queries": ["string"] ,
"concurrency": 8
}
  • queries β€” required. A single search string or an array of search keywords (e.g. "wireless earbuds" or ["wireless earbuds","bluetooth speaker"]).
  • concurrency β€” optional. Number of concurrent product page fetches. Defaults to 8.

βš™οΈ Configuration

πŸ”‘ NameπŸ“ Type❓ Requiredβš™οΈ DefaultπŸ“Œ Example🧠 Notes
queriesarray / stringβœ… YesNone["wireless earbuds"]One or more search keywords
concurrencyintegerβš™οΈ Optional84Controls concurrent product fetches
outputFilestringβš™οΈ Optionalamazon.search.result.json"amazon.search.result.json"Local file written in run (pushed to dataset)

Example Console setup: Paste ["wireless earbuds"] into the Input field and click Run Actor.


πŸ“„ Outputs (Dataset / KV examples)

The Actor pushes a JSON array of product objects to the default dataset and also writes a local file named amazon.search.result.json. Each item includes fields such as:

  • asin, title, url, thumbnail, images, price, currency, stars, review_count, brand_name, availability, description, categories, search_keyword

Example output (single record)

{
"asin": "B07XXXXX",
"title": "Wireless Earbuds β€” Example Model",
"url": "https://www.amazon.com/...","
"thumbnail": "https://...jpg",
"images": ["https://...jpg"],
"price": "$59.99",
"currency": "$",
"stars": 4.5,
"review_count": 1234,
"brand_name": "ExampleBrand",
"availability": "In Stock",
"description": "Key bullets...",
"categories": "Electronics > Headphones",
"search_keyword": "wireless earbuds"
}

πŸ”‘ Environment variables

  • <APIFY_TOKEN> β€” required when starting runs via API/CLI. Do not hardcode tokens; store them as secrets in your environment/Console.

The Actor itself does not require custom environment variables to run in Console. The file amazon.search.result.json is produced inside the run and its contents are pushed to the default dataset.


▢️ How to Run

Console

  • Paste input, click Run. Results available under Dataset after completion.

CLI

$apify run --actor <OWNER>/<ACTOR_NAME> --input input.example.json

API

  • Use the snippet above (Python). Use <APIFY_TOKEN> stored securely.

⏰ Scheduling & Webhooks

  • Schedule periodic runs directly in the Console (cron-style schedules available).
  • To automate post-run workflows, subscribe a webhook on run:finished to receive dataset IDs or download links.

πŸ•ΎοΈ Logs & Troubleshooting

  • If the run returns zero results, try broader keywords or confirm the search term is supported in the target marketplace.
  • If product pages fail intermittently, reduce concurrency or add delays between requests.
  • Check run logs in Console for page-timeout or selector changes β€” adjust keyword strategy accordingly.

Common fixes:

  • Increase concurrency for faster runs (careful: may trigger rate limits).
  • Lower concurrency or add retries/backoff if you see frequent fetch failures.

πŸ”’ Permissions & Storage Notes

  • Results are pushed to the Actor default dataset and written to a run-local JSON file named amazon.search.result.json.
  • Keep API tokens and proxy credentials in Console secrets β€” never in actor input plain text.

πŸ”Ÿ Changelog / Versioning

  • v1.0.0 β€” Initial release: keyword search + product enrichment; dataset push and local JSON output.

πŸ–Œ Notes / TODOs

  • TODO: Consider making headless and request_delay configurable via inputs (keeps Console simple). β€” Reason: expose throttling/stealth knobs for advanced users.
  • TODO: Consider proxy rotation for large-scale scraping.

🌍 Proxy Configuration

This Actor performs network requests. If you need to route traffic through proxies, use either the built-in Apify Proxy via the Console or supply custom proxy credentials.

Enable Apify Proxy (Console)

  • In the Actor configuration panel β†’ toggle Use Apify Proxy.

Custom proxy (example)

  • Supply proxy credentials as environment-level secrets and configure the run to export them to the Actor. Use the following placeholders in your run environment when needed:
HTTP_PROXY=<PROXY_USER:PASS@HOST:PORT>
HTTPS_PROXY=<PROXY_USER:PASS@HOST:PORT>

Notes

  • Store <PROXY_USER:PASS@HOST:PORT> as a secret in Console β€” never place credentials inside input JSON.
  • TODO: Consider proxy rotation for large-scale scraping.

πŸ“š References


πŸ€” What I inferred from main.py

  • The Actor accepts queries (single or array) and an optional concurrency value.
  • For each keyword it renders the search results page (headless browser) to gather initial hits, then fetches individual product pages (HTTP client) to enrich details.
  • Outputs are pushed to the default dataset and a run-local JSON file named amazon.search.result.json.
  • Built-in delays, retries, and concurrency controls are used to reduce failures. Some tuning may be necessary for large-scale runs.

input.example.json

{
"queries": [
"wireless earbuds",
"bluetooth speaker"
],
"concurrency": 8
}

CONFIG.md

Optional configuration notes

  • concurrency β€” lower values reduce request load and retry pressure.
  • queries β€” can be a single string or an array; use arrays for batch runs.
  • APIFY_TOKEN β€” always set as a secret in your environment when using CLI/API.

Placeholders

  • Replace secrets with <APIFY_TOKEN> and <PROXY_USER:PASS@HOST:PORT>.
  • Fields marked TODO: indicate inferred options that were not explicitly exposed as inputs in the source.