Amazon Scraper - US, SG, CA, GB, AU
Pricing
$1.90 / 1,000 result items
Use this Amazon scraper to collect data based on URL and country from the Amazon website. Extract product information without using the Amazon API, including reviews, prices, descriptions, and Amazon Standard Identification Numbers (ASINs). Download data in various structured formats.
Developer
kane liu
Maintained by Community
Amazon Search Results Collector - US, SG, CA, GB, AU
HTTP-based Amazon search-results scraper for US, SG, CA, GB, and AU, with normalized dataset output, market-matched Apify residential proxy rules, and run/debug summaries.
Overview
This actor collects Amazon search result listings from these storefronts:
- United States (`US`) → www.amazon.com
- Singapore (`SG`) → www.amazon.sg
- Canada (`CA`) → www.amazon.ca
- United Kingdom (`GB`) → www.amazon.co.uk
- Australia (`AU`) → www.amazon.com.au
The current implementation is an HTTP + HTML parser workflow:
- build the Amazon search URL for each keyword and page
- fetch the HTML with `httpx`
- detect and follow Amazon `bm-verify` meta-refresh pages when present
- parse search cards from HTML
- normalize rows, deduplicate by ASIN, push to dataset, and write summaries
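As a rough illustration of the first step, the sketch below builds a search URL per market, keyword, and page. The helper name `build_search_url` is hypothetical and not taken from the actor's code. The `k` and `page` query parameters are inferred from the `searchUrl` value in the example output, so treat them as an assumption about the real URL builder.

```python
from urllib.parse import urlencode

# Hosts as listed in the storefront overview above.
MARKET_HOSTS = {
    "US": "www.amazon.com",
    "SG": "www.amazon.sg",
    "CA": "www.amazon.ca",
    "GB": "www.amazon.co.uk",
    "AU": "www.amazon.com.au",
}


def build_search_url(market: str, keyword: str, page: int = 1) -> str:
    """Hypothetical helper: one search URL for a keyword and page."""
    host = MARKET_HOSTS[market]  # raises KeyError for unsupported markets
    # urlencode escapes spaces as "+", matching the documented searchUrl.
    query = urlencode({"k": keyword, "page": page})
    return f"https://{host}/s?{query}"


print(build_search_url("SG", "phone case", 1))
# https://www.amazon.sg/s?k=phone+case&page=1
```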
This actor currently targets search-result pages only. It does not crawl product detail pages, seller pages, reviews, category trees, or ads APIs.
What this actor actually supports
Based on the current code, schemas, and tests, the actor supports:
- search keywords via `keywords`
- one selected market per run: `US`, `SG`, `CA`, `GB`, or `AU`
- up to `5` pages per keyword
- global output cap via `maxItems`
- per-keyword cap via `maxItemsPerKeyword`
- deduplication by `asin`
- optional raw card HTML in dataset rows via `includeRawHtml=true`
- optional debug metadata in key-value store via `debug=true`
- local deterministic runs via `fixtureHtmlPath`
- Apify Store event charging using `result-item`
Pricing
This actor is intended for Apify Store PAY_PER_EVENT billing.
Billing unit
- Pricing model: `PAY_PER_EVENT`
- Event name: `result-item`
- Price: $0.0019 per item ($1.90 per 1,000 items)
What counts as a billable item
The runtime charges after output deduplication, using the number of dataset rows actually emitted.
In code, charging happens with:
- event name: `result-item`
- count: `len(dataset_rows)`
For local runs, the actor writes a CHARGE_SUMMARY.json record so you can verify the counted items.
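To make the billing arithmetic concrete, here is a minimal sketch of how the charged count and an estimated cost could be derived from the emitted rows. The function name `charge_summary` and the `estimatedCostUsd` key are hypothetical. The actual layout of `CHARGE_SUMMARY.json` is not documented here, so this is not a description of that file.

```python
from decimal import Decimal

PRICE_PER_ITEM_USD = Decimal("0.0019")  # from the Billing unit section
EVENT_NAME = "result-item"


def charge_summary(dataset_rows: list[dict]) -> dict:
    """Hypothetical helper: count emitted rows and estimate the charge."""
    count = len(dataset_rows)  # rows are counted after deduplication
    return {
        "chargeEventName": EVENT_NAME,
        "chargedItems": count,
        # Assumed field, added for illustration only.
        "estimatedCostUsd": PRICE_PER_ITEM_USD * count,
    }


rows = [{"asin": "B0TEST1234"}, {"asin": "B0TEST5678"}, {"asin": "B0TEST9999"}]
summary = charge_summary(rows)
print(summary["chargedItems"], summary["estimatedCostUsd"])
# 3 0.0057
```

Using `Decimal` rather than `float` keeps small per-item prices exact when multiplied by large row counts.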
Why use this actor
- Search monitoring: track titles, ASINs, prices, ratings, and rank position.
- Cross-market comparison: run the same keyword across US / SG / CA / GB / AU with one consistent schema.
- Debuggable runs: inspect `RUN_SUMMARY`, `INPUT_ECHO`, `CHARGE_SUMMARY`, and optional debug entries.
- Fixture-based QA: validate parsing offline with local HTML fixtures before remote runs.
Supported markets
| Market | Country | Host | Currency code |
|---|---|---|---|
| US | United States | www.amazon.com | USD |
| SG | Singapore | www.amazon.sg | SGD |
| CA | Canada | www.amazon.ca | CAD |
| GB | United Kingdom | www.amazon.co.uk | GBP |
| AU | Australia | www.amazon.com.au | AUD |
Input
Main input fields
| Field | Type | Runtime behavior |
|---|---|---|
| `keywords` | string[] | Required. Empty/blank values are removed. Runtime allows up to 10 keywords. |
| `market` | string | One of `US`, `SG`, `CA`, `GB`, `AU`. Runtime default is `SG`. |
| `pages` | integer | Pages per keyword. Allowed range: 1-5. Default 1. |
| `maxItems` | integer | Global cap on emitted rows, applied after parsing and deduplication. Runtime range: 1-100. Default 40. |
| `maxItemsPerKeyword` | integer | Per-keyword cap. Runtime range: 1-100. Default 40. |
| `requestTimeoutSecs` / `timeoutSecs` | integer | Request timeout in seconds. Runtime accepts either field name. Range: 5-180. Default 30. |
| `userAgent` | string | Optional override for HTTP requests. |
| `debug` | boolean | When true, writes per-keyword debug metadata to the key-value store. |
| `includeRawHtml` | boolean | When true, each dataset row may include `rawHtml` for the parsed card. |
| `proxyConfiguration` | object | Apify proxy settings for remote runs. See proxy rules below. |
| `fixtureHtmlPath` | string | Optional local HTML fixture path for deterministic offline runs. |
Example input

    {
      "keywords": ["usb hub", "wireless mouse"],
      "market": "US",
      "pages": 2,
      "maxItems": 20,
      "maxItemsPerKeyword": 10,
      "requestTimeoutSecs": 60,
      "debug": true,
      "includeRawHtml": false,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyCountry": "US"
      }
    }

Fixture-mode example

    {
      "keywords": ["phone case"],
      "market": "SG",
      "pages": 1,
      "maxItems": 10,
      "maxItemsPerKeyword": 10,
      "fixtureHtmlPath": "fixtures/amazon_search_sample.html",
      "debug": true,
      "includeRawHtml": true
    }

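The ranges and defaults in the input table can be summarized with a small normalization sketch. The function `normalize_input` is hypothetical. Whether the real runtime clamps out-of-range numbers and truncates extra keywords, as done here, or rejects them is an assumption: the documentation above only states the allowed ranges.

```python
MARKETS = {"US", "SG", "CA", "GB", "AU"}


def _clamp(value, low: int, high: int, default: int) -> int:
    """Return default when value is missing, else force it into [low, high]."""
    if value is None:
        return default
    return max(low, min(high, int(value)))


def normalize_input(raw: dict) -> dict:
    """Hypothetical sketch of the documented input rules."""
    # Blank keywords are dropped; at most 10 are kept (assumed truncation).
    keywords = [k.strip() for k in raw.get("keywords", [])
                if isinstance(k, str) and k.strip()]
    if not keywords:
        raise ValueError("keywords is required")
    market = str(raw.get("market", "SG")).upper()
    if market not in MARKETS:
        raise ValueError(f"unsupported market: {market}")
    # Either timeout field name is accepted.
    timeout = raw.get("requestTimeoutSecs", raw.get("timeoutSecs"))
    return {
        "keywords": keywords[:10],
        "market": market,
        "pages": _clamp(raw.get("pages"), 1, 5, 1),
        "maxItems": _clamp(raw.get("maxItems"), 1, 100, 40),
        "maxItemsPerKeyword": _clamp(raw.get("maxItemsPerKeyword"), 1, 100, 40),
        "requestTimeoutSecs": _clamp(timeout, 5, 180, 30),
    }


print(normalize_input({"keywords": [" usb hub ", ""], "pages": 9}))
```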
Output
The actor writes normalized rows to the Apify dataset.
Core fields
- `keyword`
- `market`
- `marketName`
- `marketHost`
- `country`
- `page`
- `rank`
- `absoluteRank`
- `itemId`
- `asin`
- `title`
- `productUrl`
- `searchUrl`
- `sourceUrl`
- `price`
- `priceValue`
- `currency`
- `currencyCode`
- `capturedAt`
- `source`
Additional fields currently produced
- `image`
- `imageUrl`
- `rating`
- `ratingScore`
- `ratingCount`
- `reviewCount`
- `sponsored`
- `isSponsored`
- `rawHtml` when `includeRawHtml=true`
Example output

    {
      "market": "SG",
      "marketName": "Singapore",
      "marketHost": "www.amazon.sg",
      "country": "SG",
      "keyword": "phone case",
      "page": 1,
      "rank": 1,
      "absoluteRank": 1,
      "itemId": "B0TEST1234",
      "asin": "B0TEST1234",
      "title": "Sample Phone Case",
      "price": "S$19.90",
      "priceValue": 19.9,
      "currency": "S$",
      "currencyCode": "SGD",
      "rating": 4.6,
      "ratingScore": 4.6,
      "ratingCount": 123,
      "reviewCount": 123,
      "image": "https://images.example.com/1.jpg",
      "imageUrl": "https://images.example.com/1.jpg",
      "productUrl": "https://www.amazon.sg/dp/B0TEST1234",
      "searchUrl": "https://www.amazon.sg/s?k=phone+case&page=1",
      "sourceUrl": "https://www.amazon.sg/dp/B0TEST1234",
      "sponsored": false,
      "isSponsored": false,
      "capturedAt": "2026-06-27T01:41:39.679709+00:00",
      "source": "amazonSearchHtml"
    }

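The example row carries the price in four related fields: `price`, `priceValue`, `currency`, and `currencyCode`. The sketch below shows one way such a split could be done. The helper `split_price` and its regular expression are hypothetical and only cover simple "symbol followed by amount" strings; the actor's real price normalization may handle more formats.

```python
import re

# Currency codes from the Supported markets table.
CURRENCY_CODES = {"US": "USD", "SG": "SGD", "CA": "CAD", "GB": "GBP", "AU": "AUD"}

# Symbol (any non-digit, non-space run) followed by an amount like 1,299.00.
_PRICE_RE = re.compile(
    r"^\s*(?P<symbol>[^\d\s]+)\s*(?P<amount>\d[\d,]*(?:\.\d+)?)\s*$"
)


def split_price(price: str, market: str) -> dict:
    """Hypothetical helper: derive the four price fields from a display price."""
    code = CURRENCY_CODES[market]
    match = _PRICE_RE.match(price)
    if match is None:
        # Unrecognized format: keep the raw text, leave parsed parts empty.
        return {"price": price, "priceValue": None,
                "currency": None, "currencyCode": code}
    amount = float(match.group("amount").replace(",", ""))
    return {"price": price, "priceValue": amount,
            "currency": match.group("symbol"), "currencyCode": code}


print(split_price("S$19.90", "SG"))
```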
Key-value store artifacts
The actor may write these keys:
- `INPUT_ECHO` - normalized public input snapshot
- `RUN_SUMMARY` - run-level counters, duplicates, and error list
- `CHARGE_SUMMARY` - local charge count record for `result-item`
- `ERROR_SUMMARY` - top-level failure summary when the run fails
- `DEBUG_<keyword>` - per-keyword page metadata when `debug=true`
RUN_SUMMARY fields
Current code writes:
- `market`
- `keywordsRequested`
- `keywordsSucceeded`
- `keywordsFailed`
- `pagesRequested`
- `pagesSucceeded`
- `rawItems`
- `dedupedItems`
- `duplicateRate`
- `startedAt`
- `finishedAt`
- `errors`
- `chargedItems`
- `chargeEventName`
Proxy rules
The runtime has strict proxy behavior for remote runs.
Remote Apify runs
For remote runs, the actor expects Apify Proxy residential routing:
- proxy group is normalized to include `RESIDENTIAL`
- proxy country is matched to the selected market country
- `proxyConfiguration.proxyUrls` is rejected
- `proxyConfiguration.useApifyProxy=false` is rejected
- if `apifyProxyCountry` is provided, it must equal the market country
Examples:
- `US` run -> proxy country `US`
- `SG` run -> proxy country `SG`
- `CA` run -> proxy country `CA`
- `GB` run -> proxy country `GB`
- `AU` run -> proxy country `AU`
Local / fixture runs
Proxy can be skipped when:
- running with `--local`
- using `fixtureHtmlPath`
In that case the actor runs with proxyMode: "none".
How it works
- Validate and normalize the input.
- Resolve market host, language, and currency metadata.
- Resolve proxy selection rules.
- Request the search page HTML.
- Follow Amazon `bm-verify` meta refresh when encountered.
- Parse `div[data-component-type="s-search-result"][data-asin]` cards.
- Extract ASIN, title, price, rating, review count, image, and URL fields.
- Deduplicate rows by `asin`.
- Push dataset rows and charge `result-item` events.
- Save run and debug summaries.
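The card-selection and deduplication steps above can be sketched with the standard library alone. This README does not say which HTML parser the actor uses, so `html.parser` here is a stand-in, and the names `CardAsinParser` and `dedupe_by_asin` are hypothetical. The sketch also skips cards with an empty `data-asin`, which is an assumption about how placeholder cards are handled.

```python
from html.parser import HTMLParser


class CardAsinParser(HTMLParser):
    """Collect ASINs from div[data-component-type="s-search-result"][data-asin]."""

    def __init__(self) -> None:
        super().__init__()
        self.asins: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attr = dict(attrs)
        # Keep only search-result cards that carry a non-empty ASIN.
        if attr.get("data-component-type") == "s-search-result" and attr.get("data-asin"):
            self.asins.append(attr["data-asin"])


def dedupe_by_asin(rows: list[dict]) -> list[dict]:
    """Keep the first row seen for each ASIN, preserving order."""
    seen: set[str] = set()
    unique = []
    for row in rows:
        if row["asin"] in seen:
            continue
        seen.add(row["asin"])
        unique.append(row)
    return unique


sample_html = (
    '<div data-component-type="s-search-result" data-asin="B0TEST1234"></div>'
    '<div data-asin="B0OTHER000"></div>'
    '<div data-component-type="s-search-result" data-asin="B0TEST5678"></div>'
    '<div data-component-type="s-search-result" data-asin="B0TEST1234"></div>'
)
parser = CardAsinParser()
parser.feed(sample_html)
rows = dedupe_by_asin([{"asin": asin} for asin in parser.asins])
print(len(parser.asins), len(rows))
# 3 2
```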
Local development
Install

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Run tests

    PYTHONPATH=. .venv/bin/pytest -q

Run locally with fixture HTML

    APIFY_LOCAL_STORAGE_DIR=./storage \
    APIFY_INPUT_FILE=./remote_run_input_sg.json \
    PYTHONPATH=. .venv/bin/python -m src.main --local

To make that local run deterministic, point `fixtureHtmlPath` to one of the HTML files in `fixtures/`.
Test and verification commands
These commands are present in the repository and are appropriate for verification:
Unit/integration tests

    PYTHONPATH=. .venv/bin/pytest -q

Single scenario gate

    python3 apify_actor_dynamic_test.py --config dynamic_test_config.json

Multi-scenario gate

    python3 apify_actor_dynamic_multiscenario.py --config dynamic_multiscenario_config.json

Quality gate

    python3 apify_actor_quality_gate.py --config quality_gate_config.json

Release gate

    python3 apify_actor_release_gate.py --config release_gate_config.json

Limitations
- This actor only parses search-result HTML cards.
- It does not visit product detail pages for extra enrichment.
- It does not support markets outside `US`, `SG`, `CA`, `GB`, `AU`.
- It depends on Amazon's current HTML structure for search cards.
- If Amazon returns a captcha/block page, the run can fail with `amazon_block_or_captcha`.
- If Amazon returns a page without parseable search cards, parsing can fail with `amazon_no_search_cards` or `amazon_no_parseable_products`.
- `maxItems` and `maxItemsPerKeyword` are runtime-limited to `100`, even though schema metadata may be more permissive.
- Runtime keyword count is limited to `10`.
Troubleshooting
Dataset is empty or run fails with no rows
Check:
- keyword is valid and not blank after trimming
- `market` is one of `US`, `SG`, `CA`, `GB`, `AU`
- fixture HTML really contains Amazon search cards
- `RUN_SUMMARY` or `ERROR_SUMMARY` for parse/block details
Remote run rejects proxy settings
Check:
- `useApifyProxy` is enabled
- `proxyUrls` is not used
- `apifyProxyCountry` matches the selected market country
- Apify Proxy group resolves to include `RESIDENTIAL`
Amazon returns block/captcha behavior
Look for errors such as:
- `amazon_block_or_captcha`
- failed fetch errors from `httpx`
Then try:
- matching the market and proxy country correctly
- using Apify residential proxy routing
- reducing run size while validating with one keyword and one page first
Debugging parser behavior
Use:
- `fixtureHtmlPath` for deterministic local parsing
- `includeRawHtml=true` to inspect row-level card HTML
- `debug=true` to save `DEBUG_<keyword>` page metadata