Amazon Product Scraper
Pricing
$0.50 / 1,000 results
**Amazon Product Scraper**: instantly fetch product data from Amazon by keyword. Get titles, prices, reviews, images, and more in seconds. Suited to market research, trend tracking, and pricing analysis. **Fast, secure, and ready to run!**
Last modified
9 hours ago
One-line hero: Run a production-ready Actor that searches Amazon by keyword and delivers clean product records: instant insights, secure by design.
Short summary
A plug-and-play Actor that performs keyword searches on Amazon, enriches the results with product page details, and exports a clean dataset ready for analysis. Designed for fast, repeatable runs in the Apify Console β no developer setup required.
Use cases / When to use
- Market research: compare prices, reviews, and availability across search results.
- Competitor monitoring: track product listings and price changes for a set of queries.
- Lead collection for product feeds: gather thumbnails, product URLs, and metadata.
- Quick audits: get a snapshot of top search results for given keywords.
Quick Start (Console, one-click)
- Open the Actor in Apify Console.
- Paste a JSON input containing `queries` (single string or array). See `input.example.json` for a minimal example.
- Click Run. The Actor runs and saves results to the default dataset and an output file.
Quick Start (CLI + API)
CLI
$ apify call <OWNER>/<ACTOR_NAME> --input-file input.example.json
API (Python, apify-client)
from apify_client import ApifyClient

client = ApifyClient('<APIFY_TOKEN>')

# Start the run without waiting for it to finish.
run = client.actor('<OWNER>/<ACTOR_NAME>').start(run_input={
    'queries': ['wireless earbuds', 'bluetooth speaker'],
    'concurrency': 8,
})
print('Started run:', run['id'])
Replace `<APIFY_TOKEN>` and `<OWNER>/<ACTOR_NAME>` with your values.
Inputs (fields & schema)
{"queries": ["string"], "concurrency": 8}
- `queries`: required. A single search string or an array of search keywords (e.g. `"wireless earbuds"` or `["wireless earbuds", "bluetooth speaker"]`).
- `concurrency`: optional. Number of concurrent product page fetches. Defaults to `8`.
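The input rules above (a string or an array for `queries`, `concurrency` defaulting to 8) can be sketched as a small normalization step. This is an illustrative sketch, not the Actor's actual code; `normalize_input` is a hypothetical helper.

```python
from typing import Any

DEFAULT_CONCURRENCY = 8  # default stated in the input description


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Return the input with `queries` as a list of non-empty strings.

    Hypothetical helper: it mirrors the documented input rules and is
    not taken from the Actor's source.
    """
    queries = raw.get("queries")
    if isinstance(queries, str):
        queries = [queries]  # a single string becomes a one-item list
    if not isinstance(queries, list) or not queries:
        raise ValueError("`queries` is required (string or array of strings)")
    cleaned = [q.strip() for q in queries if isinstance(q, str) and q.strip()]
    if not cleaned:
        raise ValueError("`queries` must contain at least one non-empty string")

    concurrency = raw.get("concurrency", DEFAULT_CONCURRENCY)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(concurrency, bool) or not isinstance(concurrency, int) or concurrency < 1:
        raise ValueError("`concurrency` must be a positive integer")
    return {"queries": cleaned, "concurrency": concurrency}
```

Validating before a run starts makes a malformed input fail immediately instead of producing an empty dataset.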
Configuration
| Name | Type | Required | Default | Example | Notes |
|---|---|---|---|---|---|
| `queries` | array / string | Yes | None | `["wireless earbuds"]` | One or more search keywords |
| `concurrency` | integer | Optional | `8` | `4` | Controls concurrent product fetches |
| `outputFile` | string | Optional | `amazon.search.result.json` | `"amazon.search.result.json"` | Local file written during the run (contents are also pushed to the dataset) |
Example Console setup: paste `{"queries": ["wireless earbuds"]}` into the Input field and click Run Actor.
Outputs (Dataset / KV examples)
The Actor pushes a JSON array of product objects to the default dataset and also writes a local file named amazon.search.result.json. Each item includes fields such as:
`asin`, `title`, `url`, `thumbnail`, `images`, `price`, `currency`, `stars`, `review_count`, `brand_name`, `availability`, `description`, `categories`, `search_keyword`
Example output (single record)
{
  "asin": "B07XXXXX",
  "title": "Wireless Earbuds - Example Model",
  "url": "https://www.amazon.com/...",
  "thumbnail": "https://...jpg",
  "images": ["https://...jpg"],
  "price": "$59.99",
  "currency": "$",
  "stars": 4.5,
  "review_count": 1234,
  "brand_name": "ExampleBrand",
  "availability": "In Stock",
  "description": "Key bullets...",
  "categories": "Electronics > Headphones",
  "search_keyword": "wireless earbuds"
}
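Because `price` arrives as a display string (`"$59.99"` in the record above), pricing analysis usually needs a numeric value first. The following is a minimal sketch, assuming prices look like the example (currency symbol, optional thousands separators, decimal point). `parse_price` is a hypothetical helper, and formats that use a decimal comma are not handled.

```python
import re
from decimal import Decimal
from typing import Optional

# First run of digits, with optional thousands commas and decimal part.
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(price: Optional[str]) -> Optional[Decimal]:
    """Extract the first number from a price string such as "$1,299.00"."""
    if not price:
        return None
    match = _PRICE_RE.search(price)
    if match is None:
        return None  # e.g. "N/A" or "Currently unavailable"
    return Decimal(match.group(0).replace(",", ""))
```

`Decimal` is used rather than `float` so that comparisons and sums over prices do not pick up binary rounding error.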
Environment variables
- `<APIFY_TOKEN>`: required when starting runs via API/CLI. Do not hardcode tokens; store them as secrets in your environment/Console.
The Actor itself does not require custom environment variables to run in Console. The file `amazon.search.result.json` is produced inside the run and its contents are pushed to the default dataset.
How to Run
Console
- Paste the input and click Run. Results are available under Dataset after completion.
CLI
$ apify call <OWNER>/<ACTOR_NAME> --input-file input.example.json
API
- Use the Python snippet above, with `<APIFY_TOKEN>` stored securely.
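To wait for a run and read its results in one step, a sketch along these lines may help. It assumes the `apify-client` Python package (`client.actor(...).call(...)` and `client.dataset(...).iterate_items()`); `collect_items` and `main` are hypothetical names, and the placeholders are the same as in the snippet above.

```python
import os
from typing import Any


def collect_items(client: Any, actor_id: str, run_input: dict) -> list[dict]:
    """Run the Actor, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a run record")
    # Each run stores its results in its own default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    """Example entry point; requires `pip install apify-client`."""
    from apify_client import ApifyClient

    # Read the token from the environment rather than hardcoding it.
    client = ApifyClient(os.environ["APIFY_TOKEN"])
    items = collect_items(
        client, "<OWNER>/<ACTOR_NAME>", {"queries": ["wireless earbuds"]}
    )
    print(f"Fetched {len(items)} products")
```

Passing the client in as an argument keeps `collect_items` easy to exercise with a stub, without network access or a real token.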
Scheduling & Webhooks
- Schedule periodic runs directly in the Console (cron-style schedules available).
- To automate post-run workflows, subscribe a webhook to a run-finished event (for example `ACTOR.RUN.SUCCEEDED`) to receive the run details, including its default dataset ID.
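A webhook receiver typically only needs the dataset ID from the notification. The sketch below assumes Apify's default webhook payload, in which the event name is under `eventType` and the run object under `resource` carries `defaultDatasetId`; check this against a real payload before relying on it. `dataset_id_from_webhook` is a hypothetical helper.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the default dataset ID of a succeeded run, else None."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or timed-out runs
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

The returned ID can then be passed to the API client to download the items for that run.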
Logs & Troubleshooting
- If the run returns zero results, try broader keywords or confirm the search term is supported in the target marketplace.
- If product pages fail intermittently, reduce `concurrency` or add delays between requests.
- Check the run logs in Console for page timeouts or selector changes, and adjust your keyword strategy accordingly.
Common fixes:
- Increase `concurrency` for faster runs (careful: may trigger rate limits).
- Lower `concurrency` or add retries/backoff if you see frequent fetch failures.
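The retries/backoff suggestion can be sketched as follows. This is a generic exponential-backoff wrapper for your own client code, not the Actor's built-in retry logic; `with_backoff`, its parameters, and the injectable `sleep` are illustrative assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    fetch: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying on any exception with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # The delay doubles on each attempt; jitter spreads out retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2)
            sleep(delay)
    raise AssertionError("unreachable")
```

In real use you would likely catch only transient errors (timeouts, HTTP 429/503) rather than every exception.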
Permissions & Storage Notes
- Results are pushed to the Actor default dataset and written to a run-local JSON file named `amazon.search.result.json`.
- Keep API tokens and proxy credentials in Console secrets, never in plain-text Actor input.
Changelog / Versioning
`v1.0.0`: Initial release with keyword search and product enrichment; dataset push and local JSON output.
Notes / TODOs
- TODO: Consider making `headless` and `request_delay` configurable via inputs (keeps Console simple). Reason: expose throttling/stealth knobs for advanced users.
- TODO: Consider proxy rotation for large-scale scraping.
Proxy Configuration
This Actor performs network requests. If you need to route traffic through proxies, use either the built-in Apify Proxy via the Console or supply custom proxy credentials.
Enable Apify Proxy (Console)
- In the Actor configuration panel, toggle Use Apify Proxy.
Custom proxy (example)
- Supply proxy credentials as environment-level secrets and configure the run to export them to the Actor. Use the following placeholders in your run environment when needed:
HTTP_PROXY=<PROXY_USER:PASS@HOST:PORT>
HTTPS_PROXY=<PROXY_USER:PASS@HOST:PORT>
Notes
- Store `<PROXY_USER:PASS@HOST:PORT>` as a secret in Console; never place credentials inside `input` JSON.
- TODO: Consider proxy rotation for large-scale scraping.
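If you fetch pages from your own Python code, the same variables can be read into a proxy mapping. This is a sketch under assumptions: the mapping shape is the one the `requests` library accepts for its `proxies` argument, and `proxies_from_env` and `mask_credentials` are hypothetical helpers. How the Actor itself consumes these variables is not documented here.

```python
import os
from typing import Mapping
from urllib.parse import urlsplit


def proxies_from_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build a {"http": ..., "https": ...} mapping from HTTP(S)_PROXY."""
    proxies = {}
    for scheme, var in (("http", "HTTP_PROXY"), ("https", "HTTPS_PROXY")):
        value = env.get(var, "").strip()
        if value:
            # Accept a bare "user:pass@host:port" by assuming an http:// proxy.
            proxies[scheme] = value if "://" in value else f"http://{value}"
    return proxies


def mask_credentials(proxy_url: str) -> str:
    """Return the proxy URL without user:pass, safe to write to logs."""
    parts = urlsplit(proxy_url)
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{host}{port}"
```

Logging only the masked form keeps credentials out of run logs, in line with the notes above.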
What I inferred from main.py
- The Actor accepts `queries` (single or array) and an optional `concurrency` value.
- For each keyword it renders the search results page (headless browser) to gather initial hits, then fetches individual product pages (HTTP client) to enrich details.
- Outputs are pushed to the default dataset and a run-local JSON file named `amazon.search.result.json`.
- Built-in delays, retries, and concurrency controls are used to reduce failures. Some tuning may be necessary for large-scale runs.
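The enrichment stage with a `concurrency` cap can be sketched with an `asyncio.Semaphore`. This illustrates the general pattern only; `enrich_all` and the `fetch_product` callback are hypothetical stand-ins, not the functions in `main.py`.

```python
import asyncio
from typing import Awaitable, Callable


async def enrich_all(
    urls: list[str],
    fetch_product: Callable[[str], Awaitable[dict]],
    concurrency: int = 8,
) -> list[dict]:
    """Fetch every URL with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(url: str) -> dict:
        async with semaphore:  # wait here until a slot is free
            return await fetch_product(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))
```

Lowering `concurrency` in this pattern reduces the number of simultaneous requests, which is the trade-off the troubleshooting notes describe.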
input.example.json
{
  "queries": ["wireless earbuds", "bluetooth speaker"],
  "concurrency": 8
}
CONFIG.md
Optional configuration notes
- `concurrency`: lower values reduce request load and retry pressure.
- `queries`: can be a single string or an array; use arrays for batch runs.
- `APIFY_TOKEN`: always set as a secret in your environment when using CLI/API.
Placeholders
- Replace the placeholders `<APIFY_TOKEN>` and `<PROXY_USER:PASS@HOST:PORT>` with your own secrets.
- Fields marked `TODO:` indicate inferred options that were not explicitly exposed as inputs in the source.
