Amazon Scraper - US, SG, CA, GB, AU

Pricing

$1.90 / 1,000 result items

Use this Amazon scraper to collect search-result data by keyword and market from the Amazon website. Extract listing information without using the Amazon API, including titles, prices, ratings, review counts, and Amazon Standard Identification Numbers (ASINs). Download data in various structured formats.

Developer

kane liu

Maintained by Community

Amazon Search Results Collector - US, SG, CA, GB, AU

HTTP-based Amazon search-results scraper for US, SG, CA, GB, and AU, with normalized dataset output, market-matched Apify residential proxy rules, and run/debug summaries.

Overview

This actor collects Amazon search result listings from these storefronts:

  • United States (US) → www.amazon.com
  • Singapore (SG) → www.amazon.sg
  • Canada (CA) → www.amazon.ca
  • United Kingdom (GB) → www.amazon.co.uk
  • Australia (AU) → www.amazon.com.au
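
The market-to-host mapping above can be written as a lookup table and used to build a search URL. This is a minimal sketch, not the actor's actual code: the helper name `build_search_url` is hypothetical, and the `k` and `page` query parameters are taken from the `searchUrl` value shown in the example output.

```python
from urllib.parse import urlencode

# Market code -> storefront host, copied from the list above.
MARKET_HOSTS = {
    "US": "www.amazon.com",
    "SG": "www.amazon.sg",
    "CA": "www.amazon.ca",
    "GB": "www.amazon.co.uk",
    "AU": "www.amazon.com.au",
}


def build_search_url(market: str, keyword: str, page: int = 1) -> str:
    """Hypothetical helper: build the search URL for one keyword and page."""
    host = MARKET_HOSTS[market.upper()]  # raises KeyError for unsupported markets
    # urlencode uses quote_plus, so spaces in the keyword become "+".
    query = urlencode({"k": keyword.strip(), "page": page})
    return f"https://{host}/s?{query}"


print(build_search_url("SG", "phone case"))
```

An unsupported market code raises `KeyError` here; the real runtime may report that case differently.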

The current implementation is an HTTP + HTML parser workflow:

  1. build the Amazon search URL for each keyword and page
  2. fetch the HTML with httpx
  3. detect and follow Amazon bm-verify meta-refresh pages when present
  4. parse search cards from HTML
  5. normalize rows, deduplicate by ASIN, push to dataset, and write summaries
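
Step 3 can be illustrated with a small detector for meta-refresh interstitials. This sketch assumes the bm-verify page carries a tag shaped like `<meta http-equiv="refresh" content="5; URL='/?bm-verify=TOKEN'">`; that shape, and the helper name `find_bm_verify_redirect`, are assumptions for illustration and may not match what the actor checks.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _MetaRefreshFinder(HTMLParser):
    """Records the URL of the first <meta http-equiv="refresh"> tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        # Assumed content shape: "5; URL='/?bm-verify=TOKEN'"
        _, _, rest = (attr.get("content") or "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def find_bm_verify_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL to follow, or None if this is a normal page."""
    finder = _MetaRefreshFinder()
    finder.feed(html)
    if finder.target and "bm-verify" in finder.target:
        return urljoin(base_url, finder.target)
    return None
```

A caller would fetch the returned URL with the same client and cookies, then parse that response instead of the interstitial.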

This actor currently targets search-result pages only. It does not crawl product detail pages, seller pages, reviews, category trees, or ads APIs.

What this actor actually supports

Based on the current code, schemas, and tests, the actor supports:

  • search keywords via keywords
  • one selected market per run: US, SG, CA, GB, or AU
  • up to 5 pages per keyword
  • global output cap via maxItems
  • per-keyword cap via maxItemsPerKeyword
  • deduplication by asin
  • optional raw card HTML in dataset rows via includeRawHtml=true
  • optional debug metadata in key-value store via debug=true
  • local deterministic runs via fixtureHtmlPath
  • Apify Store event charging using result-item

Pricing

This actor is intended for Apify Store PAY_PER_EVENT billing.

Billing unit

  • Pricing model: PAY_PER_EVENT
  • Event name: result-item
  • Price: $0.0019 per item

What counts as a billable item

The runtime charges after output deduplication, using the number of dataset rows actually emitted.

In code, charging happens with:

  • event name: result-item
  • count: len(dataset_rows)

For local runs, the actor writes a CHARGE_SUMMARY.json record so you can verify the counted items.
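
The billing arithmetic is simple enough to check before a run. The sketch below uses the per-item price stated above; the function names are hypothetical, and the upper bound assumes the two caps behave as described in the Input section.

```python
PRICE_PER_ITEM_USD = 0.0019  # result-item price from the Billing unit list


def estimate_cost_usd(emitted_rows: int) -> float:
    """Cost of a run that emitted this many deduplicated dataset rows."""
    return round(emitted_rows * PRICE_PER_ITEM_USD, 4)


def max_billable_rows(keyword_count: int, max_items: int, max_items_per_keyword: int) -> int:
    """Upper bound on billable rows implied by the two output caps."""
    return min(max_items, keyword_count * max_items_per_keyword)


# Two keywords, maxItems=20, maxItemsPerKeyword=10 -> at most 20 rows.
rows = max_billable_rows(2, 20, 10)
print(rows, estimate_cost_usd(rows))
```

The actual charge can be lower than this bound, because duplicates are removed before rows are counted.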

Why use this actor

  • Search monitoring: track titles, ASINs, prices, ratings, and rank position.
  • Cross-market comparison: run the same keyword across US / SG / CA / GB / AU with one consistent schema.
  • Debuggable runs: inspect RUN_SUMMARY, INPUT_ECHO, CHARGE_SUMMARY, and optional debug entries.
  • Fixture-based QA: validate parsing offline with local HTML fixtures before remote runs.

Supported markets

| Market | Country | Host | Currency code |
|---|---|---|---|
| US | United States | www.amazon.com | USD |
| SG | Singapore | www.amazon.sg | SGD |
| CA | Canada | www.amazon.ca | CAD |
| GB | United Kingdom | www.amazon.co.uk | GBP |
| AU | Australia | www.amazon.com.au | AUD |

Input

Main input fields

| Field | Type | Runtime behavior |
|---|---|---|
| keywords | string[] | Required. Empty or blank values are removed. Runtime allows up to 10 keywords. |
| market | string | One of US, SG, CA, GB, AU. Runtime default is SG. |
| pages | integer | Pages per keyword. Allowed range: 1-5. Default: 1. |
| maxItems | integer | Global cap on emitted rows, applied after parsing and deduplication. Runtime range: 1-100. Default: 40. |
| maxItemsPerKeyword | integer | Per-keyword cap. Runtime range: 1-100. Default: 40. |
| requestTimeoutSecs / timeoutSecs | integer | Request timeout in seconds. Runtime accepts either field name. Range: 5-180. Default: 30. |
| userAgent | string | Optional User-Agent override for HTTP requests. |
| debug | boolean | When true, writes per-keyword debug metadata to the key-value store. |
| includeRawHtml | boolean | When true, each dataset row may include rawHtml for the parsed card. |
| proxyConfiguration | object | Apify proxy settings for remote runs. See the proxy rules below. |
| fixtureHtmlPath | string | Optional local HTML fixture path for deterministic offline runs. |
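
The ranges and defaults in the table above can be expressed as a small normalizer. This is a sketch under stated assumptions: `normalize_input` is a hypothetical name, and rejecting out-of-range values with `ValueError` is a guess, since the runtime might clamp them instead.

```python
from typing import Any, Dict

SUPPORTED_MARKETS = ("US", "SG", "CA", "GB", "AU")
MAX_KEYWORDS = 10


def _bounded_int(value: Any, name: str, low: int, high: int) -> int:
    """Parse an integer and reject it if it falls outside [low, high]."""
    number = int(value)
    if not low <= number <= high:
        raise ValueError(f"{name} must be in {low}-{high}, got {number}")
    return number


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    # Blank keywords are dropped before the count check.
    keywords = [k.strip() for k in raw.get("keywords") or []
                if isinstance(k, str) and k.strip()]
    if not 1 <= len(keywords) <= MAX_KEYWORDS:
        raise ValueError(f"keywords must hold 1-{MAX_KEYWORDS} non-blank values")
    market = str(raw.get("market") or "SG").upper()
    if market not in SUPPORTED_MARKETS:
        raise ValueError(f"unsupported market: {market}")
    # Either timeout field name is accepted; requestTimeoutSecs wins if both exist.
    timeout = raw.get("requestTimeoutSecs", raw.get("timeoutSecs", 30))
    return {
        "keywords": keywords,
        "market": market,
        "pages": _bounded_int(raw.get("pages", 1), "pages", 1, 5),
        "maxItems": _bounded_int(raw.get("maxItems", 40), "maxItems", 1, 100),
        "maxItemsPerKeyword": _bounded_int(
            raw.get("maxItemsPerKeyword", 40), "maxItemsPerKeyword", 1, 100),
        "requestTimeoutSecs": _bounded_int(timeout, "requestTimeoutSecs", 5, 180),
    }
```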

Example input

{
  "keywords": ["usb hub", "wireless mouse"],
  "market": "US",
  "pages": 2,
  "maxItems": 20,
  "maxItemsPerKeyword": 10,
  "requestTimeoutSecs": 60,
  "debug": true,
  "includeRawHtml": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyCountry": "US"
  }
}

Fixture-mode example

{
  "keywords": ["phone case"],
  "market": "SG",
  "pages": 1,
  "maxItems": 10,
  "maxItemsPerKeyword": 10,
  "fixtureHtmlPath": "fixtures/amazon_search_sample.html",
  "debug": true,
  "includeRawHtml": true
}

Output

The actor writes normalized rows to the Apify dataset.

Core fields

  • keyword
  • market
  • marketName
  • marketHost
  • country
  • page
  • rank
  • absoluteRank
  • itemId
  • asin
  • title
  • productUrl
  • searchUrl
  • sourceUrl
  • price
  • priceValue
  • currency
  • currencyCode
  • capturedAt
  • source

Additional fields currently produced

  • image
  • imageUrl
  • rating
  • ratingScore
  • ratingCount
  • reviewCount
  • sponsored
  • isSponsored
  • rawHtml when includeRawHtml=true

Example output

{
  "market": "SG",
  "marketName": "Singapore",
  "marketHost": "www.amazon.sg",
  "country": "SG",
  "keyword": "phone case",
  "page": 1,
  "rank": 1,
  "absoluteRank": 1,
  "itemId": "B0TEST1234",
  "asin": "B0TEST1234",
  "title": "Sample Phone Case",
  "price": "S$19.90",
  "priceValue": 19.9,
  "currency": "S$",
  "currencyCode": "SGD",
  "rating": 4.6,
  "ratingScore": 4.6,
  "ratingCount": 123,
  "reviewCount": 123,
  "image": "https://images.example.com/1.jpg",
  "imageUrl": "https://images.example.com/1.jpg",
  "productUrl": "https://www.amazon.sg/dp/B0TEST1234",
  "searchUrl": "https://www.amazon.sg/s?k=phone+case&page=1",
  "sourceUrl": "https://www.amazon.sg/dp/B0TEST1234",
  "sponsored": false,
  "isSponsored": false,
  "capturedAt": "2026-06-27T01:41:39.679709+00:00",
  "source": "amazonSearchHtml"
}
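
On the consumer side, rows shaped like the example above are easy to post-process. The sketch below picks the best-ranked non-sponsored row per keyword. The helper name is hypothetical, and it relies only on the `keyword`, `absoluteRank`, `sponsored`, and `isSponsored` fields listed in this section.

```python
from typing import Any, Dict, Iterable


def top_organic_by_keyword(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map each keyword to its lowest-absoluteRank row that is not sponsored."""
    best: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        if row.get("isSponsored") or row.get("sponsored"):
            continue  # skip ad placements
        current = best.get(row["keyword"])
        if current is None or row["absoluteRank"] < current["absoluteRank"]:
            best[row["keyword"]] = row
    return best
```

For example, feeding it the items downloaded from a run's dataset gives one row per keyword, which is a convenient starting point for rank tracking over time.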

Key-value store artifacts

The actor may write these keys:

  • INPUT_ECHO - normalized public input snapshot
  • RUN_SUMMARY - run-level counters, duplicates, and error list
  • CHARGE_SUMMARY - local charge count record for result-item
  • ERROR_SUMMARY - top-level failure summary when the run fails
  • DEBUG_<keyword> - per-keyword page metadata when debug=true
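
After a local run, these records can be loaded for inspection. This sketch assumes the usual Apify local storage layout, `<storage dir>/key_value_stores/default/<KEY>.json`; that path and the helper name `load_summaries` are assumptions, so adjust them if your storage directory is organized differently.

```python
import json
from pathlib import Path
from typing import Any, Dict

SUMMARY_KEYS = ("INPUT_ECHO", "RUN_SUMMARY", "CHARGE_SUMMARY", "ERROR_SUMMARY")


def load_summaries(storage_dir: str = "./storage") -> Dict[str, Any]:
    """Read whichever summary records exist in the default local key-value store."""
    # Assumed layout: <storage_dir>/key_value_stores/default/<KEY>.json
    store = Path(storage_dir) / "key_value_stores" / "default"
    found: Dict[str, Any] = {}
    for key in SUMMARY_KEYS:
        path = store / f"{key}.json"
        if path.is_file():
            found[key] = json.loads(path.read_text(encoding="utf-8"))
    return found
```

Missing keys are simply absent from the result, so a successful run would normally return no `ERROR_SUMMARY` entry.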

RUN_SUMMARY fields

Current code writes:

  • market
  • keywordsRequested
  • keywordsSucceeded
  • keywordsFailed
  • pagesRequested
  • pagesSucceeded
  • rawItems
  • dedupedItems
  • duplicateRate
  • startedAt
  • finishedAt
  • errors
  • chargedItems
  • chargeEventName

Proxy rules

The runtime has strict proxy behavior for remote runs.

Remote Apify runs

For remote runs, the actor expects Apify Proxy residential routing:

  • proxy group is normalized to include RESIDENTIAL
  • proxy country is matched to the selected market country
  • proxyConfiguration.proxyUrls is rejected
  • proxyConfiguration.useApifyProxy=false is rejected
  • if apifyProxyCountry is provided, it must equal the market country

Examples:

  • US run -> proxy country US
  • SG run -> proxy country SG
  • CA run -> proxy country CA
  • GB run -> proxy country GB
  • AU run -> proxy country AU
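
The remote-run rules above can be summarized in one function. This is a sketch rather than the actor's code: `resolve_proxy` is a hypothetical name, the error messages are invented, and `apifyProxyGroups` is assumed to be the field that carries the proxy group list.

```python
from typing import Any, Dict, Optional


def resolve_proxy(market: str, proxy_configuration: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply the remote-run proxy rules and return the normalized settings."""
    cfg = dict(proxy_configuration or {})
    if cfg.get("proxyUrls"):
        raise ValueError("proxyConfiguration.proxyUrls is not supported")
    if cfg.get("useApifyProxy") is False:
        raise ValueError("proxyConfiguration.useApifyProxy must be true")
    country = cfg.get("apifyProxyCountry")
    if country and country.upper() != market:
        raise ValueError(f"apifyProxyCountry {country} does not match market {market}")
    # Normalize the group list so it always includes RESIDENTIAL.
    groups = list(cfg.get("apifyProxyGroups") or [])
    if "RESIDENTIAL" not in groups:
        groups.append("RESIDENTIAL")
    return {
        "useApifyProxy": True,
        "apifyProxyGroups": groups,
        "apifyProxyCountry": market,
    }
```

Omitting `apifyProxyCountry` is accepted in this sketch and the market country is filled in, which matches the rule that the country only has to agree when it is provided.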

Local / fixture runs

Proxy can be skipped when:

  • running with --local
  • using fixtureHtmlPath

In that case the actor runs with proxyMode: "none".

How it works

  1. Validate and normalize the input.
  2. Resolve market host, language, and currency metadata.
  3. Resolve proxy selection rules.
  4. Request the search page HTML.
  5. Follow Amazon bm-verify meta refresh when encountered.
  6. Parse div[data-component-type="s-search-result"][data-asin] cards.
  7. Extract ASIN, title, price, rating, review count, image, and URL fields.
  8. Deduplicate rows by asin.
  9. Push dataset rows and charge result-item events.
  10. Save run and debug summaries.
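
Step 6 can be illustrated with the standard library alone. The sketch below collects ASINs from elements matching the selector in step 6, using `html.parser`; which parser the actor actually uses is not stated here, and skipping cards with an empty `data-asin` is an assumption.

```python
from html.parser import HTMLParser
from typing import List


class SearchCardAsinParser(HTMLParser):
    """Collects data-asin values from search-result card <div> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.asins: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attr = dict(attrs)
        # Equivalent to div[data-component-type="s-search-result"][data-asin],
        # except that cards with an empty data-asin are skipped (assumption).
        if attr.get("data-component-type") == "s-search-result" and attr.get("data-asin"):
            self.asins.append(attr["data-asin"])


def extract_asins(html: str) -> List[str]:
    parser = SearchCardAsinParser()
    parser.feed(html)
    return parser.asins
```

Running `extract_asins` over a fixture file is a quick way to confirm that the fixture really contains search cards before starting a full local run.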

Local development

Install

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run tests

PYTHONPATH=. .venv/bin/pytest -q

Run locally with fixture HTML

APIFY_LOCAL_STORAGE_DIR=./storage \
APIFY_INPUT_FILE=./remote_run_input_sg.json \
PYTHONPATH=. .venv/bin/python -m src.main --local

To make that local run deterministic, point fixtureHtmlPath to one of the HTML files in fixtures/.

Test and verification commands

These commands are present in the repository and are appropriate for verification:

Unit/integration tests

PYTHONPATH=. .venv/bin/pytest -q

Single scenario gate

python3 apify_actor_dynamic_test.py --config dynamic_test_config.json

Multi-scenario gate

python3 apify_actor_dynamic_multiscenario.py --config dynamic_multiscenario_config.json

Quality gate

python3 apify_actor_quality_gate.py --config quality_gate_config.json

Release gate

python3 apify_actor_release_gate.py --config release_gate_config.json

Limitations

  • This actor only parses search-result HTML cards.
  • It does not visit product detail pages for extra enrichment.
  • It does not support markets outside US, SG, CA, GB, AU.
  • It depends on Amazon's current HTML structure for search cards.
  • If Amazon returns a captcha/block page, the run can fail with amazon_block_or_captcha.
  • If Amazon returns a page without parseable search cards, parsing can fail with amazon_no_search_cards or amazon_no_parseable_products.
  • maxItems and maxItemsPerKeyword are runtime-limited to 100, even though schema metadata may be more permissive.
  • Runtime keyword count is limited to 10.

Troubleshooting

Dataset is empty or run fails with no rows

Check:

  • keywords contains at least one value that is not blank after trimming
  • market is one of US, SG, CA, GB, AU
  • fixture HTML really contains Amazon search cards
  • RUN_SUMMARY or ERROR_SUMMARY for parse/block details

Remote run rejects proxy settings

Check:

  • useApifyProxy is enabled
  • proxyUrls is not used
  • apifyProxyCountry matches the selected market country
  • Apify Proxy group resolves to include RESIDENTIAL

Amazon returns a block or captcha page

Look for errors such as:

  • amazon_block_or_captcha
  • fetch failures raised by httpx

Then try:

  • matching the market and proxy country correctly
  • using Apify residential proxy routing
  • reducing run size while validating with one keyword and one page first

Debugging parser behavior

Use:

  • fixtureHtmlPath for deterministic local parsing
  • includeRawHtml=true to inspect row-level card HTML
  • debug=true to save DEBUG_<keyword> page metadata