Amazon Product Scraper
Turn raw Amazon pages into winning insights. Instantly collect ASINs, product titles, prices, images, reviews, and availability at scale. Perfect for sellers, researchers, and marketers who want faster decisions, smarter pricing, and data that converts: clean output, ready for automation and growth.

Pricing: $5.99 / 1,000 results
Rating: 5.0 (2 reviews)
Developer: Neuro Scraper
Maintained by Community

Actor stats

Bookmarked: 1
Total users: 18
Monthly active users: 5
Last modified: 14 days ago


Run a production-ready Actor that searches Amazon by keywords and delivers clean product records: instant insights, secure by design.


📖 Short summary

A plug-and-play Actor that performs keyword searches on Amazon, enriches the results with product page details, and exports a clean dataset ready for analysis. Designed for fast, repeatable runs in the Apify Console, with no developer setup required.


💡 Use cases / When to use

  • Market research: compare prices, reviews, and availability across search results.
  • Competitor monitoring: track product listings and price changes for a set of queries.
  • Lead collection for product feeds: gather thumbnails, product URLs, and metadata.
  • Quick audits: get a snapshot of top search results for given keywords.

⚡ Quick Start (Console, one-click)

  1. Open the Actor in Apify Console.
  2. Paste a JSON input containing queries (single string or array). See input.example.json for a minimal example.
  3. Click Run; the Actor runs and saves results to the default dataset and an output file.

βš™οΈ Quick Start (CLI + API)

CLI

$ apify call <OWNER>/<ACTOR_NAME> --input-file input.example.json

API (Python, apify-client)

from apify_client import ApifyClient

client = ApifyClient('<APIFY_TOKEN>')
run = client.actor('<OWNER>/<ACTOR_NAME>').start(run_input={
    'queries': ['wireless earbuds', 'bluetooth speaker'],
    'concurrency': 8,
})
print('Started run:', run['id'])

Replace <APIFY_TOKEN> and <OWNER>/<ACTOR_NAME> with your values.
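After a run completes, the results can be pulled back from its default dataset. A minimal sketch: the `pick_fields` helper and its field list are illustrative additions, while the dataset calls follow the `apify-client` API used above.

```python
def pick_fields(items, fields=('asin', 'title', 'price', 'stars')):
    """Keep only the listed keys from each dataset item (missing keys become None)."""
    return [{field: item.get(field) for field in fields} for item in items]

def fetch_results(token, run_id):
    """Wait for a run to finish, then download its default dataset items."""
    from apify_client import ApifyClient  # imported here so pick_fields stays dependency-free
    client = ApifyClient(token)
    run = client.run(run_id).wait_for_finish()
    return list(client.dataset(run['defaultDatasetId']).iterate_items())

# Example (needs a real token and run ID):
# rows = pick_fields(fetch_results('<APIFY_TOKEN>', '<RUN_ID>'))
```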


πŸ“ Inputs (fields & schema)

{
  "queries": ["string"],
  "concurrency": 8
}
  • queries: required. A single search string or an array of search keywords (e.g. "wireless earbuds" or ["wireless earbuds","bluetooth speaker"]).
  • concurrency: optional. Number of concurrent product page fetches. Defaults to 8.
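Because `queries` accepts either a single string or an array, it can help to normalize on the caller side before starting a run. A small illustrative helper; `build_run_input` is not part of the Actor, it just assembles the input shape described above:

```python
def build_run_input(queries, concurrency=8):
    """Normalize `queries` to a list and assemble the Actor input dict."""
    if isinstance(queries, str):
        queries = [queries]
    return {'queries': list(queries), 'concurrency': concurrency}
```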

βš™οΈ Configuration

| 🔑 Name | 📝 Type | ❓ Required | ⚙️ Default | 📌 Example | 🧠 Notes |
| --- | --- | --- | --- | --- | --- |
| queries | array / string | ✅ Yes | None | ["wireless earbuds"] | One or more search keywords |
| concurrency | integer | ⚙️ Optional | 8 | 4 | Controls concurrent product fetches |
| outputFile | string | ⚙️ Optional | amazon.search.result.json | "amazon.search.result.json" | Local file written during the run (pushed to dataset) |

Example Console setup: Paste ["wireless earbuds"] into the Input field and click Run Actor.


📄 Outputs (Dataset / KV examples)

The Actor pushes a JSON array of product objects to the default dataset and also writes a local file named amazon.search.result.json. Each item includes fields such as:

  • asin, title, url, thumbnail, images, price, currency, stars, review_count, brand_name, availability, description, categories, search_keyword

Example output (single record)

{
  "asin": "B07XXXXX",
  "title": "Wireless Earbuds - Example Model",
  "url": "https://www.amazon.com/...",
  "thumbnail": "https://...jpg",
  "images": ["https://...jpg"],
  "price": "$59.99",
  "currency": "$",
  "stars": 4.5,
  "review_count": 1234,
  "brand_name": "ExampleBrand",
  "availability": "In Stock",
  "description": "Key bullets...",
  "categories": "Electronics > Headphones",
  "search_keyword": "wireless earbuds"
}
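Note that `price` is a display string rather than a number, so downstream analysis usually needs a parsing step. A hedged sketch for US-style prices; the helper name is ours, not part of the output schema:

```python
import re

def parse_price(price_str):
    """Extract a float from a display price like '$59.99' or '$1,234.56'.
    Returns None when no numeric amount is present."""
    if not price_str:
        return None
    match = re.search(r'\d+(?:\.\d{1,2})?', price_str.replace(',', ''))
    return float(match.group()) if match else None
```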

🔑 Environment variables

  • <APIFY_TOKEN>: required when starting runs via API/CLI. Do not hardcode tokens; store them as secrets in your environment/Console.

The Actor itself does not require custom environment variables to run in Console. The file amazon.search.result.json is produced inside the run and its contents are pushed to the default dataset.


▶️ How to Run

Console

  • Paste input, click Run. Results available under Dataset after completion.

CLI

$ apify call <OWNER>/<ACTOR_NAME> --input-file input.example.json

API

  • Use the snippet above (Python). Use <APIFY_TOKEN> stored securely.

⏰ Scheduling & Webhooks

  • Schedule periodic runs directly in the Console (cron-style schedules available).
  • To automate post-run workflows, subscribe a webhook on run:finished to receive dataset IDs or download links.

πŸ•ΎοΈ Logs & Troubleshooting

  • If the run returns zero results, try broader keywords or confirm the search term is supported in the target marketplace.
  • If product pages fail intermittently, reduce concurrency or add delays between requests.
  • Check run logs in the Console for page timeouts or selector changes, and adjust your keyword strategy accordingly.

Common fixes:

  • Increase concurrency for faster runs (careful: may trigger rate limits).
  • Lower concurrency or add retries/backoff if you see frequent fetch failures.
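The retries/backoff fix can be sketched as a small wrapper around any fetch callable; this is a generic client-side pattern, not the Actor's internal logic:

```python
import random
import time

def fetch_with_backoff(fetch, retries=3, base_delay=1.0):
    """Call fetch() up to retries+1 times, sleeping with exponential backoff
    plus jitter between attempts; re-raises the last error on exhaustion."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```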

🔒 Permissions & Storage Notes

  • Results are pushed to the Actor default dataset and written to a run-local JSON file named amazon.search.result.json.
  • Keep API tokens and proxy credentials in Console secrets; never in plain-text actor input.

🔟 Changelog / Versioning

  • v1.0.0: Initial release: keyword search + product enrichment; dataset push and local JSON output.

🖌 Notes / TODOs

  • TODO: Consider making headless and request_delay configurable via inputs (keeps the Console simple). Reason: expose throttling/stealth knobs for advanced users.
  • TODO: Consider proxy rotation for large-scale scraping.

🌍 Proxy Configuration

This Actor performs network requests. If you need to route traffic through proxies, use either the built-in Apify Proxy via the Console or supply custom proxy credentials.

Enable Apify Proxy (Console)

  • In the Actor configuration panel → toggle Use Apify Proxy.

Custom proxy (example)

  • Supply proxy credentials as environment-level secrets and configure the run to export them to the Actor. Use the following placeholders in your run environment when needed:
HTTP_PROXY=<PROXY_USER:PASS@HOST:PORT>
HTTPS_PROXY=<PROXY_USER:PASS@HOST:PORT>
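For clients that do not read these variables automatically, the same values can be passed explicitly. A sketch that builds a `requests`-style proxies mapping from the environment (placeholder values as above); the helper name is illustrative:

```python
import os

def proxies_from_env():
    """Build a requests-style proxies dict from HTTP_PROXY / HTTPS_PROXY.
    Returns an empty dict when neither variable is set."""
    proxies = {}
    for scheme, var in (('http', 'HTTP_PROXY'), ('https', 'HTTPS_PROXY')):
        value = os.environ.get(var)
        if value:
            proxies[scheme] = value
    return proxies

# e.g. requests.get(url, proxies=proxies_from_env())
```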

Notes

  • Store <PROXY_USER:PASS@HOST:PORT> as a secret in the Console; never place credentials inside input JSON.

🤔 What I inferred from main.py

  • The Actor accepts queries (single or array) and an optional concurrency value.
  • For each keyword it renders the search results page (headless browser) to gather initial hits, then fetches individual product pages (HTTP client) to enrich details.
  • Outputs are pushed to the default dataset and a run-local JSON file named amazon.search.result.json.
  • Built-in delays, retries, and concurrency controls are used to reduce failures. Some tuning may be necessary for large-scale runs.
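The search-then-enrich flow with a `concurrency` cap can be approximated with a thread pool; `fetch_product` below is a stand-in for whatever per-URL fetch you use (the Actor's real implementation is not exposed):

```python
from concurrent.futures import ThreadPoolExecutor

def enrich_products(urls, fetch_product, concurrency=8):
    """Fetch product pages concurrently (at most `concurrency` at a time),
    preserving the input order of `urls`."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch_product, urls))
```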

input.example.json

{
  "queries": [
    "wireless earbuds",
    "bluetooth speaker"
  ],
  "concurrency": 8
}

CONFIG.md

Optional configuration notes

  • concurrency: lower values reduce request load and retry pressure.
  • queries: can be a single string or an array; use arrays for batch runs.
  • APIFY_TOKEN: always set as a secret in your environment when using CLI/API.

Placeholders

  • Replace secrets with <APIFY_TOKEN> and <PROXY_USER:PASS@HOST:PORT>.
  • Fields marked TODO: indicate inferred options that were not explicitly exposed as inputs in the source.