PricePulse AI — Cross-Marketplace Product Intelligence
Pricing
from $40.00 / 1,000 results
Cross-marketplace product intelligence: Amazon + eBay + Walmart bundled in one $0.04 snapshot. JSON-LD fallback covers Target/Best Buy/Etsy/Shopify. Change detection, webhook alerts, vs_reference compareTo, MCP-friendly.
Developer
AIDevs
Maintained by Community
One call, one product_intel_snapshot per product: aligned prices across Amazon + eBay + Walmart (plus any Schema.org storefront via JSON-LD fallback), change flags vs. your baseline, review summaries, opportunity insights, optional webhook alerts on significant price drops, and vs_reference comparison against your product. MCP-ready for AI agents. Pay per snapshot, not per row.
Disclaimer. Output is research material generated by automated extraction from public marketplace pages — NOT authoritative pricing, availability, or rating information. Verify every fact against the cited source URL before pricing decisions or customer-facing use. See Terms of Use at the bottom.
What's new in v0.2: webhook alerts ($0.005/dispatch), JSON-LD universal fallback (Target, Best Buy, Etsy, Shopify, any storefront with Schema.org Product), `reference_label` / `vs_reference` block, `dry_run` preview mode, Apify Residential proxy support, Amazon 2-retry backoff, Amazon locale support, variants + image-gallery extraction, and `keyword_research` un-gated with strict caps. See `CHANGELOG.md` for details.
Keywords this actor ranks for: cross-marketplace price tracker, multi-marketplace product intelligence, Amazon eBay Walmart price compare, competitor price tracker, MAP enforcement, Buy Box monitor, MCP price intelligence, AI agent product compare, change-detection scraper, price-spread analyzer, brand portfolio monitor, restock alerts, opportunity insights actor.
Why PricePulse AI
| Other product / price scrapers | PricePulse AI |
|---|---|
| One marketplace per actor — chain three to compare | One run, one snapshot, three marketplaces aligned |
| One row per marketplace per product (you do the join) | One record per logical product — already grouped |
| Raw price strings, mixed currencies | Normalized {price, currency, is_on_sale} from every site |
| No change detection — you store history yourself | Built-in baseline diffing with significant flag |
| No analysis layer — just data | Opportunity insights: positioning, undercut alerts, spread % |
| Per-row billing scales with marketplace count | One snapshot = one charge, regardless of marketplaces |
| Fabricates fields when scrape fails | Never guesses — surfaces risk_flags, returns candidates_only |
| LLM-locked or LLM-required | Rule-based by default, optional Claude Haiku enrichment |
| Awkward for MCP / AI agents | MCP-first schema: short enums, fixed output, stamped event name |
When to use
- Cross-marketplace tracking. You sell on Amazon and Walmart and want one record per SKU per day showing where you stand on each.
- Competitor benchmarking. You name 3–5 competitor ASINs and want a daily who-is-undercutting-whom report.
- MAP enforcement. You set a price floor; you want flagged any site that drops below it.
- Brand portfolio health. You ship 25 SKUs; you want a single dataset row per SKU per day across all marketplaces.
- Sale event monitoring. Black Friday, Prime Day, Cyber Monday: hourly cadence with `change_detection_baseline_key` for live deltas.
- MCP / AI agent workflows. Claude or GPT calls this every N minutes and gets back a structured `product_intel_snapshot` ready to reason over.
When NOT to use
- Full review-text extraction (per-review author + body + helpful votes) → not in v1. We compute themes + rating distribution; full reviews need a dedicated reviews scraper.
- JavaScript-rendered storefronts only (e.g., some niche Shopify themes) → v1 is HTTP-only. Pages exposing data via JSON-LD or server-rendered HTML work; pure SPAs do not.
- Locale-specific Amazon parsing (UK, DE, JP) → v1 targets `amazon.com` selectors. Other locales may parse partially.
- Single-marketplace, max-volume scraping at $0.005/row → use a commodity Amazon-only scraper. We're priced for the joined, enriched, change-detected use case.
Quick start — 6 recipes that cover 90% of real use
1. Track your main product on Amazon (cheapest, fastest)
```json
{
  "mode": "product_list",
  "products": [{"label": "my_main_chair", "asin": "B08N5WRWNW"}],
  "marketplaces": ["amazon"]
}
```
→ 1 snapshot, $0.04.
2. Cross-marketplace snapshot — your product on all 3 sites
```json
{
  "mode": "product_list",
  "products": [{"label": "my_main_chair", "asin": "B08N5WRWNW"}],
  "marketplaces": ["amazon", "ebay", "walmart"],
  "include_reviews_summary": true
}
```
→ Still 1 snapshot, $0.04. (Marketplaces are bundled, not billed separately.)
3. Daily competitor digest with change detection
```json
{
  "mode": "product_list",
  "products": [
    {"label": "my_chair", "asin": "B08N5WRWNW"},
    {"label": "competitor_A", "asin": "B09ZZZ9999"},
    {"label": "competitor_B", "asin": "B07YYY0000"}
  ],
  "marketplaces": ["amazon", "walmart"],
  "change_detection_baseline_key": "daily_chairs"
}
```
→ 3 snapshots = $0.12/day. Each snapshot carries changes.price_change_flags vs. yesterday's run.
4. Brand portfolio (full SKU sweep)
See examples/example6_brand_monitoring.json — 5 SKUs × 3 marketplaces = 5 snapshots = $0.20/run.
5. MAP enforcement (price-floor watch)
See examples/example7_map_enforcement.json — every sites[].price below your floor is a violation. Pair with Apify's webhook integration to alert legal.
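The floor check described above can be sketched in a few lines of Python. This is a minimal illustration over the documented `sites[]` shape, not the contents of `example7_map_enforcement.json`; the function name `find_map_violations` and the `shortfall` field are hypothetical.

```python
def find_map_violations(snapshot: dict, floor: float) -> list[dict]:
    """Return one entry per site whose parsed price is below the floor."""
    violations = []
    for site in snapshot.get("sites", []):
        price = site.get("price")
        if price is None:
            # Unparsed prices are null in the output; skip them rather than guess.
            continue
        if price < floor:
            violations.append({
                "label": snapshot.get("label"),
                "marketplace": site.get("marketplace"),
                "price": price,
                "shortfall": round(floor - price, 2),  # hypothetical helper field
                "url": site.get("url"),
            })
    return violations


snapshot = {
    "label": "my_main_chair",
    "sites": [
        {"marketplace": "amazon", "price": 199.99, "url": "https://www.amazon.com/dp/B08N5WRWNW"},
        {"marketplace": "walmart", "price": 189.00, "url": "https://www.walmart.com/ip/123456789"},
        {"marketplace": "ebay", "price": None, "url": None},
    ],
}
violations = find_map_violations(snapshot, floor=195.00)
```

With a floor of 195.00, only the Walmart listing is reported; the eBay entry is skipped because its price did not parse.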
6. AI agent quick comparison via MCP
```json
{
  "mode": "product_list",
  "products": [{"label": "user_question_target", "asin": "B08N5WRWNW"}],
  "marketplaces": ["amazon", "ebay", "walmart"],
  "include_reviews_summary": true
}
```
→ The agent uses markdown_report directly as a reply or pulls opportunity_insights.notes for a TL;DR.
More: see the 10 ready-to-run input JSONs in examples/.
Built for
| Role | What you do with it |
|---|---|
| Amazon / FBA seller | Daily check on your top 5 SKUs across Amazon + Walmart. priceDelta against competitors, plus opportunity notes you can paste into a Slack channel. |
| Brand manager | MAP enforcement: detect resellers selling below your minimum price. Cross-marketplace coverage means you catch eBay-only violators that Amazon-only tools miss. |
| E-commerce ops lead | Brand portfolio health — 25-SKU sweep once a day. Single dataset row per SKU per day for direct BI ingestion. |
| Pricing analyst | Price-spread analysis: which marketplace is the cheapest right now, by how much, and is the spread widening or closing? |
| Dropshipper / arbitrage | Identify cross-marketplace pricing arbitrage. Buy on the marketplace with the lowest sites[].price, list on the one with the highest. |
| Agency | Run on behalf of a client portfolio. One snapshot per client SKU; reports auto-derive from markdown_report. |
| AI agent / MCP tool builder | Plug into Claude Desktop / Cursor / any MCP client. The fixed output shape + stamped event name mean agents discover and call it without prompting acrobatics. |
| BI / data team | Pipe snapshots into Snowflake/BigQuery. Each record has stable group_key, request_context.finished_at, and JSON-friendly nested shape. |
Typical costs (worked examples)
| Scenario | Snapshots | Cost per run | Cadence | Monthly cost |
|---|---|---|---|---|
| 1 product, 1 marketplace | 1 | $0.04 | Daily | ~$1.20/mo |
| 5 products, 3 marketplaces (brand portfolio) | 5 | $0.20 | Daily | ~$6/mo |
| 25 products (max), 3 marketplaces | 25 | $1.00 | Daily | ~$30/mo |
| 10 competitor ASINs hourly during Black Friday week | 10 | $0.40 | Hourly × 7 days | ~$67/week |
| Failed validation / Amazon block (candidates_only) | 0 | $0.00 | n/a | $0.00 |
Compare: dedicated price-tracking SaaS (Keepa, Prisync, etc.) typically charge $19–$99/month per tracked product. At 25 products that's $475–$2,475/month. PricePulse AI gives you a raw cross-marketplace data feed for $30/month at the same volume — you keep your data, you control the cadence, you own the analysis layer.
How it works
```mermaid
flowchart LR
    A[products and/or URLs<br/>+ marketplaces] --> B[Pydantic input validation<br/>bound lists, dedupe, mode rules]
    B --> C[Per-host robots.txt + rate limit<br/>identifying User-Agent]
    C --> D[Parallel marketplace fetch<br/>httpx + BeautifulSoup]
    D --> E[Normalize each site<br/>5-anchor extraction confidence]
    E --> F[Aggregate metrics<br/>min/avg/max, spread, blended rating]
    F --> G{baseline key?}
    G -->|yes| H[Diff vs. stored baseline<br/>price/availability/rank flags]
    G -->|no| I[skip]
    H --> J[Opportunity insights<br/>rule-based notes]
    I --> J
    J --> K{LLM enrichment?}
    K -->|anthropic + key| L[Claude Haiku<br/>review summary]
    K -->|none| M[Rule-based summary]
    L --> N[Stamp billing fields<br/>chargeable_event_count, paid_event_name]
    M --> N
    N --> O[Push chargeable records to dataset<br/>Always set OUTPUT key]
    O --> P[(JSON + markdown + one-pager)]
```
Five-step short version:
- Validate input against a strict Pydantic schema: bail out cleanly on bad input, never charge for invalid runs.
- Fetch the relevant marketplace pages concurrently. `httpx` + `BeautifulSoup`, polite per-host rate limit (0.6s minimum interval), `robots.txt` honoured.
- Normalize each page into the canonical `SiteMetrics` shape. Per-site `extraction_confidence` = (anchors parsed) / 5.
- Aggregate + analyze: compute cross-site stats, diff against your stored baseline (if any), derive reviews summary + opportunity insights.
- Stamp + push: chargeable records (`output_status.code in {ok, partial}`) land in the dataset; the full payload always lands in `OUTPUT` for observability, even when nothing was chargeable.
Sample output (truncated for readability)
```json
{
  "label": "my_main_chair",
  "group_key": "my_main_chair",
  "core_profile": {
    "canonical_title": "ErgoPlus Ergonomic Office Chair",
    "canonical_brand": "ErgoPlus",
    "category": "Office Products > Chairs",
    "image_url": "https://m.media-amazon.com/images/I/example.jpg"
  },
  "sites": [
    {
      "marketplace": "amazon",
      "url": "https://www.amazon.com/dp/B08N5WRWNW",
      "currency": "USD", "price": 199.99, "list_price": 249.99, "is_on_sale": true,
      "availability": "in_stock", "buy_box_seller_type": "amazon",
      "rating": 4.5, "review_count": 1243, "bestseller_rank": 12,
      "extraction_confidence": 0.92
    },
    {
      "marketplace": "walmart",
      "url": "https://www.walmart.com/ip/123456789",
      "currency": "USD", "price": 189.00, "list_price": 219.00, "is_on_sale": true,
      "availability": "in_stock",
      "rating": 4.3, "review_count": 578, "extraction_confidence": 0.85
    }
  ],
  "aggregated_metrics": {
    "min_price": 189.0, "max_price": 199.99, "avg_price": 194.5,
    "price_spread_pct": 5.8, "avg_rating": 4.4, "total_review_count": 1821,
    "marketplace_count": 2
  },
  "changes": {
    "price_change_flags": [
      {"marketplace": "amazon", "direction": "decrease",
       "absolute": -20.0, "relative_pct": -9.1, "significant": true}
    ],
    "baseline_age_days": 2
  },
  "reviews_summary": {
    "summary_text": "Average rating across 2 marketplaces is 4.4 over 1,821 reviews. Recurring positives: comfort, support. Recurring concerns: armrests.",
    "top_positive_themes": ["comfort", "support"],
    "top_negative_themes": ["armrests"],
    "enrichment_source": "rule_based"
  },
  "opportunity_insights": {
    "positioning": "mid_price_premium_quality",
    "notes": [
      "Walmart currently has the lowest listed price (USD 189.00).",
      "On sale on amazon, walmart — check timing.",
      "Cross-marketplace price spread is 5.8% — possible arbitrage signal."
    ]
  },
  "output_status": {"code": "ok"},
  "confidence": {"overall": 0.88, "discovery": 1.0, "extraction": 0.88, "reviews": 0.7, "insights": 0.85},
  "chargeable_event_count": 1,
  "paid_event_name": "product_intel_snapshot",
  "markdown_report": "> **Disclaimer.** ...\n\n# ErgoPlus Ergonomic Office Chair\n...",
  "one_pager_text": "PRODUCT INTEL — ErgoPlus Ergonomic Office Chair\n..."
}
```
The markdown_report and one_pager_text are derived from the JSON; embed them straight into Slack, Notion, or an LLM prompt.
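To sanity-check a snapshot, the `aggregated_metrics` price fields can be recomputed from `sites[]`. The sketch below is an illustration rather than the actor's code; in particular, measuring the spread relative to the lowest price is an assumption that happens to reproduce the 5.8% in the sample, and `aggregate_prices` is a hypothetical name.

```python
def aggregate_prices(sites: list[dict]) -> dict:
    """Recompute min/max/avg price and spread from per-site prices."""
    prices = [s["price"] for s in sites if s.get("price") is not None]
    if not prices:
        return {"min_price": None, "max_price": None,
                "avg_price": None, "price_spread_pct": None}
    lo, hi = min(prices), max(prices)
    return {
        "min_price": lo,
        "max_price": hi,
        "avg_price": round(sum(prices) / len(prices), 2),
        # Assumption: spread is (max - min) relative to the lowest price.
        "price_spread_pct": round((hi - lo) / lo * 100, 1),
    }


metrics = aggregate_prices([{"price": 199.99}, {"price": 189.00}])
```

Sites whose price is `null` are left out of the statistics instead of being counted as zero.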
How to call it
From the Apify console (no code)
- Open the actor page → Try for free.
- Paste ASINs / URLs into `products` in the input form.
- Pick `marketplaces` (defaults to `["amazon"]`).
- Save & Start. Each input product becomes one dataset record.
From an MCP-enabled LLM (Claude Desktop, Cursor, etc.)
The actor's input schema, output shape, and tool description are written for MCP auto-discovery. Sample agent prompt:
"Use PricePulse AI to compare my product
`B08N5WRWNW` against competitors `B09ZZZ9999` and `B07YYY0000` on Amazon and Walmart. Tell me where I'm being undercut and by how much."
Programmatically — Python
```python
from apify_client import ApifyClient
import os

client = ApifyClient(os.environ["APIFY_TOKEN"])
run = client.actor("entranced_gelato/price-pulse-ai").call(run_input={
    "mode": "product_list",
    "products": [
        {"label": "my_chair", "asin": "B08N5WRWNW"},
        {"label": "competitor_A", "asin": "B09ZZZ9999"},
    ],
    "marketplaces": ["amazon", "walmart"],
    "include_reviews_summary": True,
})

# One dataset item per product group; print the headline numbers and notes.
for snap in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{snap['label']}: min ${snap['aggregated_metrics']['min_price']:.2f} "
          f"across {snap['aggregated_metrics']['marketplace_count']} sites")
    for note in snap["opportunity_insights"]["notes"]:
        print(f"  - {note}")
```
Programmatically — Node.js / TypeScript
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const run = await client.actor('entranced_gelato/price-pulse-ai').call({
  mode: 'product_list',
  products: [
    { label: 'my_chair', asin: 'B08N5WRWNW' },
    { label: 'competitor_A', asin: 'B09ZZZ9999' },
  ],
  marketplaces: ['amazon', 'walmart'],
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
// Keep snapshots that carry at least one significant price decrease.
const undercut = items.filter(s =>
  s.changes?.price_change_flags?.some(f => f.direction === 'decrease' && f.significant)
);
console.log(`${undercut.length} product(s) saw a significant price decrease since baseline.`);
```
Programmatically — curl
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"mode":"product_list","products":[{"label":"my_chair","asin":"B08N5WRWNW"}],"marketplaces":["amazon","walmart"]}' \
  "https://api.apify.com/v2/acts/entranced_gelato~price-pulse-ai/run-sync-get-dataset-items?token=$APIFY_TOKEN"
```
Scheduling for time-series
Pair with Apify Scheduler to run hourly or daily. Each snapshot stamps request_context.finished_at, so the dataset becomes a price-history database queryable via the Apify dataset API:
```bash
# Pull the 24 most recent items (the last 24h when one product runs hourly)
curl "https://api.apify.com/v2/datasets/<DATASET_ID>/items?clean=true&desc=true&limit=24"
```
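Once the items are downloaded, turning them into a per-product time series is a small grouping step. The sketch below assumes each item carries `label`, `request_context.finished_at`, and `aggregated_metrics.min_price` as documented, and that `finished_at` is an ISO 8601 string in a single consistent format so that string order equals time order; `price_history` is a hypothetical helper name.

```python
from collections import defaultdict


def price_history(items: list[dict]) -> dict[str, list[tuple[str, float]]]:
    """Group dataset items by label into (finished_at, min_price) series."""
    series = defaultdict(list)
    for item in items:
        ts = item.get("request_context", {}).get("finished_at")
        price = item.get("aggregated_metrics", {}).get("min_price")
        if ts is None or price is None:
            continue  # skip items without a timestamp or a parsed price
        series[item["label"]].append((ts, price))
    # Assumption: timestamps share one ISO 8601 format, so sorting strings works.
    return {label: sorted(points) for label, points in series.items()}


items = [
    {"label": "my_chair", "request_context": {"finished_at": "2026-05-23T10:00:00Z"},
     "aggregated_metrics": {"min_price": 189.0}},
    {"label": "my_chair", "request_context": {"finished_at": "2026-05-22T10:00:00Z"},
     "aggregated_metrics": {"min_price": 199.99}},
    {"label": "competitor_A", "request_context": {"finished_at": "2026-05-23T10:00:00Z"},
     "aggregated_metrics": {"min_price": None}},
]
history = price_history(items)
```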
Input reference
| Field | Type | Default | What it does |
|---|---|---|---|
| `mode` | enum | "product_list" | product_list for explicit ASIN/URL lists, keyword_research to auto-discover top products. |
| `products` | array of {label, asin, urls} | [] | One entry per group to track. asin OR at least one urls[] required per entry. label is a stable key for baseline diffing. |
| `marketplaces` | array | ["amazon"] | Subset of ["amazon", "ebay", "walmart"]. Bundled into the same snapshot; adding sites does NOT add cost. URLs on other hosts (Target, Best Buy, Etsy, Shopify, …) route through the JSON-LD universal parser automatically. |
| `reference_label` | string | null | v0.2. Mark one product as the reference; every other snapshot gets vs_reference: {price_delta, is_undercut, is_lowest_in_run}. |
| `alert_webhook_url` | string | null | v0.2. HTTPS endpoint; POSTs on significant=true price changes. Each dispatch is one webhook_alert_dispatched event at $0.005. |
| `alert_undercut_pct` | int | 5 | v0.2. Min relative price change (%) that triggers a webhook alert. |
| `use_apify_proxy` | bool | false | v0.2. Route fetches through Apify Residential proxy (reduces Amazon CAPTCHA rate). Proxy bandwidth billed to your account by Apify. |
| `dry_run` | bool | false | v0.2. Build snapshots, set OUTPUT, do NOT push to dataset, do NOT charge. Preview the shape before committing. |
| `amazon_locale` | enum | "us" | v0.2. Locale for ASIN-derived URLs (us / uk / de / fr / jp / in / ca / au). URLs always override. |
| `change_detection_baseline_key` | string | null | KV key for diffing: first run writes the baseline; later runs diff against it and overwrite. |
| `include_reviews_summary` | bool | true | Attach a reviews_summary block (themes + rating distribution). |
| `include_history` | bool | false | Attach a recent 30-day history block if prior runs stored in KV. |
| `enrichment_model` | enum | "none" | "none" (rule-based) or "anthropic" (Claude Haiku). Falls back to rule-based on any LLM failure. |
| `anthropic_api_key` | string (secret) | (none) | Required when enrichment_model = "anthropic". Stored as Apify secret input, never echoed. |
| `max_products_per_marketplace` | int | 10 | v0.2. Cap for keyword_research mode (1–50). Total snapshots also bounded at 25. |
| `keyword` | string | (none) | Search keyword for keyword_research mode. |
| `user_notes` | string | null | Free-form context, echoed back in request_context. Useful for tagging runs in BI. |
Output reference
| Field | Description |
|---|---|
| `request_context` | Echo of input + started_at / finished_at. |
| `label`, `group_key` | Human label + stable key for baseline lookups. |
| `query_inputs` | {asin, input_urls, keyword, resolved_marketplaces}. |
| `core_profile` | {canonical_title, canonical_brand, category, image_url}, unified across sites. |
| `sites[]` | Per-marketplace normalized metrics + per-site extraction_confidence. |
| `aggregated_metrics` | {min_price, max_price, avg_price, price_spread_pct, avg_rating, total_review_count, marketplace_count}. |
| `changes` | {price_change_flags[], availability_changes[], rank_changes[], baseline_age_days} vs. baseline. |
| `reviews_summary` | {summary_text, top_positive_themes, top_negative_themes, rating_distribution, enrichment_source}. |
| `opportunity_insights` | {positioning, notes[]}; rule-based, 1–5 short notes. |
| `confidence` | {overall, discovery, extraction, reviews, insights} in [0, 1]. |
| `sources[]` | Every URL fetched + status. Citation trail. |
| `risk_flags[]` | Data-quality warnings (info / warn / error). |
| `output_status` | {code, reason}; drives billing (ok/partial charged; candidates_only/failed not). |
| `disclaimer` | Legal disclaimer; rendered atop every derived format. |
| `chargeable_event_count`, `paid_event_name` | Billing transparency. |
| `markdown_report`, `one_pager_text` | Embed-ready derived formats. |
Trust contract — what this actor will / won't do
- Never fabricates prices, ratings, review counts, or stock state. If a field can't be parsed, it is `null`.
- Always cites every URL it fetched in `sources[]`.
- Distinguishes rule-based vs. LLM-derived enrichment via `reviews_summary.enrichment_source`.
- Honors `robots.txt` for every host; surfaces blocks as `risk_flag` entries.
- Identifies itself with a clear `User-Agent`; no spoofing; per-host minimum interval of 0.6s.
- Returns `candidates_only` (non-chargeable) rather than guessing when nothing resolves.
Pricing
$0.04 per product_intel_snapshot plus $0.005 per webhook_alert_dispatched when alerts are configured.
- One snapshot = one fully processed product group, regardless of how many marketplaces you queried
- Non-chargeable runs (`candidates_only`, `failed`, validation errors, `dry_run: true`) push nothing to the dataset
- 50-result free trial for new users
- Plus standard Apify platform compute & proxy (a fraction of a cent per run; paid by you to Apify directly)
"What will it cost me?" — quick calculator
| Your use case | Snapshots per day | Webhook alerts/day | Cost/day | Cost/month |
|---|---|---|---|---|
| Track 1 own product (Amazon-only) | 1 | 0 | $0.04 | ~$1.20 |
| Track 5 SKUs across Amazon + Walmart, no alerts | 5 | 0 | $0.20 | ~$6 |
| Brand portfolio: 25 SKUs × 3 marketplaces, daily | 25 | 1–3 | $1.00–$1.02 | ~$30 |
| Reseller arbitrage: 10 products, hourly during sale week | 240 | 5–10 | $9.65 | ~$67/week |
| Heavy: 25 SKUs hourly + 5 alerts/day | 600 | 5 | $24.03 | ~$721 |
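The table rows follow directly from the two event prices stated above ($0.04 per snapshot, $0.005 per dispatched alert). A minimal sketch for running your own numbers, using `Decimal` to avoid float rounding on money; `daily_cost` is a hypothetical helper, and Apify platform compute and proxy charges are not included.

```python
from decimal import Decimal

SNAPSHOT_PRICE = Decimal("0.04")   # per product_intel_snapshot
ALERT_PRICE = Decimal("0.005")     # per webhook_alert_dispatched


def daily_cost(snapshots_per_day: int, alerts_per_day: int = 0) -> Decimal:
    """Actor event charges for one day, excluding platform compute and proxy."""
    return snapshots_per_day * SNAPSHOT_PRICE + alerts_per_day * ALERT_PRICE


heavy = daily_cost(600, 5)        # 25 SKUs hourly + 5 alerts/day
sale_week = daily_cost(240, 10)   # 10 products hourly, upper alert bound
```

Multiply the daily figure by 30 for the monthly column, or by 7 for the sale-week row.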
Versus the SaaS alternative. Keepa Premium charges ~$19/month per tracked SKU on its mid-tier plan (Amazon-only, no eBay/Walmart). At 25 SKUs = $475/month vs. PricePulse AI's $30/month for the same coverage and the additional eBay/Walmart layer. Even at the heavy-use tier ($721/month for hourly tracking), you're below Keepa's $1,200/mo equivalent and you own the raw data feed.
Plug it into your stack — 5-minute integrations
Slack alerts on significant price drops
Set alert_webhook_url to your Slack incoming-webhook URL. Done. Each significant=true price change POSTs to Slack with the product label, marketplace, and delta:
```json
{
  "alert_webhook_url": "https://hooks.slack.com/services/T00000000/B00000000/your-token",
  "alert_undercut_pct": 5
}
```
Webhook payload Slack receives:
```json
{
  "type": "price-significant-change",
  "label": "my_main_chair",
  "marketplace": "amazon",
  "direction": "decrease",
  "relative_pct": -9.1,
  "current_price": 199.99,
  "currency": "USD",
  "baseline_age_days": 1,
  "product_title": "ErgoPlus Ergonomic Office Chair",
  "snapshot_url": "https://www.amazon.com/dp/B08N5WRWNW"
}
```
For Slack's pretty-formatted alerts, route through a workflow that re-templates the payload (Apify also supports a native Slack integration if you want to fire on every successful run instead).
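A re-templating step can be as small as one function that maps the alert payload onto the `text` field Slack incoming webhooks accept. This is a sketch that assumes the payload fields shown above; `to_slack_message` is a hypothetical name, and where you host the function (a small relay service, an n8n code node) is up to you.

```python
def to_slack_message(alert: dict) -> dict:
    """Turn a price-change alert payload into a Slack incoming-webhook body."""
    arrow = "down" if alert["direction"] == "decrease" else "up"
    text = (
        f"{alert['product_title']} ({alert['label']}) on {alert['marketplace']}: "
        f"price {arrow} {abs(alert['relative_pct']):.1f}% to "
        f"{alert['currency']} {alert['current_price']:.2f}\n"
        f"{alert['snapshot_url']}"
    )
    return {"text": text}


alert = {
    "type": "price-significant-change",
    "label": "my_main_chair",
    "marketplace": "amazon",
    "direction": "decrease",
    "relative_pct": -9.1,
    "current_price": 199.99,
    "currency": "USD",
    "baseline_age_days": 1,
    "product_title": "ErgoPlus Ergonomic Office Chair",
    "snapshot_url": "https://www.amazon.com/dp/B08N5WRWNW",
}
message = to_slack_message(alert)
```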
Google Sheets price tracker
Pair PricePulse AI with the Apify Google Sheets integration. One row per snapshot, appended on every run. Recommended sheet columns (use the pricing_detail dataset view): label, sites[0].marketplace, sites[0].price, aggregated_metrics.min_price, aggregated_metrics.avg_price, vs_reference.price_delta_pct, output_status.code, request_context.finished_at. Schedule the actor daily; the sheet becomes a free price-history database.
n8n / Make / Zapier
Apify exposes runs via webhooks and the REST API; both n8n and Make have native Apify nodes. Recommended flow:
- Trigger: Apify run-finished webhook (Apify pushes when an actor run finishes)
- Filter: keep only items where `vs_reference.is_undercut == true` AND `vs_reference.price_delta_pct < -5`
- Action: post to your CRM / send an email / update a Notion database
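If you prefer to run the filter step in code rather than in a workflow node, the same condition looks like this in Python. The sketch assumes each dataset item may carry a `vs_reference` block with `is_undercut` and `price_delta_pct`, as the filter above does; items without the block (for example the reference product itself) are skipped. `undercut_items` is a hypothetical name.

```python
def undercut_items(items: list[dict], threshold_pct: float = -5.0) -> list[dict]:
    """Keep items that undercut the reference by more than the threshold."""
    hits = []
    for item in items:
        ref = item.get("vs_reference") or {}  # block may be absent or null
        delta = ref.get("price_delta_pct")
        if ref.get("is_undercut") is True and delta is not None and delta < threshold_pct:
            hits.append(item)
    return hits


items = [
    {"label": "my_chair"},  # reference product, no vs_reference block
    {"label": "competitor_A", "vs_reference": {"is_undercut": True, "price_delta_pct": -8.2}},
    {"label": "competitor_B", "vs_reference": {"is_undercut": True, "price_delta_pct": -2.0}},
    {"label": "competitor_C", "vs_reference": {"is_undercut": False, "price_delta_pct": 4.0}},
]
hits = undercut_items(items)
```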
Direct API call (curl)
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"mode":"product_list","products":[{"label":"my_chair","asin":"B08N5WRWNW"}],"marketplaces":["amazon","walmart"]}' \
  "https://api.apify.com/v2/acts/<your-username>~price-pulse-ai/run-sync-get-dataset-items?token=$APIFY_TOKEN"
```
Roadmap
We treat the roadmap as a contract: anything below 'v1.0' may be revised; v1.0 fields are frozen.
| Version | Status | Headlines |
|---|---|---|
| v0.1.0 | shipped 2026-05-22 | Initial release — cross-marketplace bundle ($0.04), Amazon/eBay/Walmart parsers, change detection, opportunity insights, rule-based reviews. 94 tests. |
| v0.2.0 | shipped 2026-05-22 | JSON-LD universal fallback, webhook alerts ($0.005/dispatch), vs_reference compareTo, dry_run, Apify Residential proxy, Amazon 8-locale + retry/backoff, variants + image gallery, keyword_research un-gated. 165+ tests. |
| v0.3.0 | Q3 2026 (planned) | Dedicated Target + Best Buy parsers (richer than JSON-LD); BSR history aggregation across marketplaces; per-variant price tracking; native MCP server packaging; webhook retry-with-backoff. |
| v0.4.0 | Q4 2026 (planned) | Full review-text extraction (with privacy-safe author redaction); image-difference detection between snapshots; Mercado Libre + Rakuten dedicated parsers. |
| v1.0.0 | TBD | Field-stable spec freeze. Breaking changes only behind a schema_version input. Backed by 12 months of v0.x telemetry. |
Semver policy: breaking changes to the output JSON shape only happen on a major (v1.0 → v2.0). Minor versions add fields; patch versions fix bugs without changing the shape. Watch CHANGELOG.md for every release.
FAQ
Q: I asked for 3 marketplaces. Why is it only 1 snapshot?
A: A snapshot is one record per logical product group, not per marketplace. We bundle all marketplaces for a given product into one snapshot. Charge is $0.04 regardless of marketplace count.
Q: My run returned candidates_only for an Amazon ASIN. Am I billed?
A: No. Amazon serves bot-detection pages to non-residential traffic and that fraction of runs can't be parsed. We mark them candidates_only and don't push to the dataset, so you're not charged.
Q: How do I track changes over time?
A: Pass any string as change_detection_baseline_key. First run writes the baseline; subsequent runs read it, compute price/availability/rank deltas, then overwrite with the new state.
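To make the read, diff, overwrite cycle concrete, here is a rough sketch of the diff step for prices only. It is not the actor's `changes.py`; the 5% significance threshold is an assumption borrowed from the `alert_undercut_pct` default, and `price_change_flags` here is a hypothetical function that mirrors the output field of the same name.

```python
def price_change_flags(baseline: dict, current: dict, significant_pct: float = 5.0) -> list[dict]:
    """Compare per-marketplace prices against a stored baseline."""
    flags = []
    for marketplace, new_price in current.items():
        old_price = baseline.get(marketplace)
        if old_price is None or new_price is None or old_price == 0:
            continue  # nothing to compare against
        if new_price == old_price:
            continue  # unchanged prices produce no flag
        absolute = round(new_price - old_price, 2)
        relative = round(absolute / old_price * 100, 1)
        flags.append({
            "marketplace": marketplace,
            "direction": "decrease" if absolute < 0 else "increase",
            "absolute": absolute,
            "relative_pct": relative,
            # Assumption: a change counts as significant at or above the threshold.
            "significant": abs(relative) >= significant_pct,
        })
    return flags


baseline = {"amazon": 219.99, "walmart": 189.00}
current = {"amazon": 199.99, "walmart": 189.00}
flags = price_change_flags(baseline, current)
```

After diffing, the current prices would replace the baseline so the next run compares against them.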
Q: Can I use Claude Haiku for richer review summaries?
A: Yes — set enrichment_model = "anthropic" and supply anthropic_api_key (it's stored as an Apify secret, never echoed in output). Falls back to rule-based on any LLM failure.
Q: Does this work for Amazon UK / .de / .co.jp?
A: v0.2 supports 8 Amazon locales via the amazon_locale input (us / uk / de / fr / jp / in / ca / au). Pass an explicit URL to override. JS-only sub-domains may still parse partially.
Q: Does this scrape full customer reviews?
A: Not in v0.2 — we compute themes from rating distributions and (with enrichment_model: "anthropic") a small sample of public review snippets. No reviewer names or PII retained. Full review-text extraction is on the v0.4 roadmap.
Q: Is keyword research available?
A: Yes in v0.2 — set mode: "keyword_research" + keyword: "<search term>". Capped at 10 products per marketplace (configurable up to 50) AND 25 total snapshots per run for billing predictability.
Q: Can I run 1,000 products in one run?
A: No. Runs are capped at 25 product groups for reliability and to keep your bill predictable. For larger universes, run multiple actor calls (each $0.04 × N).
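Splitting a larger catalogue into runs of at most 25 product groups is a simple chunking step. A minimal sketch; `batch_products` is a hypothetical helper, and each returned batch would become the `products` array of one actor call.

```python
def batch_products(products: list[dict], batch_size: int = 25) -> list[list[dict]]:
    """Split a product list into chunks that respect the per-run cap."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [products[i:i + batch_size] for i in range(0, len(products), batch_size)]


catalogue = [{"label": f"sku_{n}", "asin": f"B0{n:08d}"} for n in range(60)]
batches = batch_products(catalogue)
```

Sixty products yield three runs of 25, 25, and 10 groups; the total charge is the same as it would be for one large run, since billing is per snapshot.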
Q: How accurate is the price field?
A: Prices come from server-rendered HTML selectors and JSON-LD blocks. Currency derives from the listing or symbol. Sale price → price, list price → list_price when both present. Per-site extraction_confidence shows how many of the 5 anchor fields parsed; below 0.5 you should treat the row as unreliable.
Q: Does it support webhook alerts?
A: Yes in v0.2 — set alert_webhook_url and we POST when significant price changes fire. $0.005 per dispatched alert. URL query strings are redacted before being persisted in the snapshot record so any tokens in the URL never end up in your dataset.
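As an illustration of what query-string redaction can look like, the sketch below uses only the standard library. It is a guess at the general technique, not the actor's code; the `REDACTED` placeholder and the `redact_query` name are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_query(url: str) -> str:
    """Replace any query string with a placeholder and drop the fragment."""
    parts = urlsplit(url)
    query = "REDACTED" if parts.query else ""
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


redacted = redact_query("https://example.com/hook?token=abc123&env=prod")
unchanged = redact_query("https://example.com/hook")
```

Note that some services, Slack incoming webhooks among them, carry their secret in the URL path rather than the query string, so treat the stored URL accordingly.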
Q: What's the difference between v0.1 and v0.2?
A: v0.2 adds: (1) webhooks, (2) vs_reference compareTo, (3) JSON-LD universal fallback for Target/Best Buy/Etsy/Shopify/any Schema.org storefront, (4) dry_run preview mode, (5) Apify Residential proxy opt-in, (6) Amazon locale + 2-retry backoff, (7) variants + image gallery, (8) keyword_research un-gated. See CHANGELOG.md.
Q: Can I try it before I pay?
A: Yes — first 50 results are free. Or set dry_run: true to build full snapshots without any charge.
Q: Is this legal?
A: This actor reads HTML that any anonymous web visitor can see; it honours robots.txt and uses an identifying User-Agent. You are responsible for ensuring your use complies with each marketplace's Terms of Service and applicable law. Don't redistribute the marketplaces' product images, descriptions, or reviews beyond what your usage rights allow. See TERMS.md for the full Terms of Use.
Limitations
- JavaScript-only pages (rare on Amazon/eBay/Walmart product pages; sometimes seen on niche storefronts that don't emit Schema.org JSON-LD) → not supported in v0.2.
- Amazon's anti-bot serves CAPTCHAs to a portion of anonymous requests → mitigated by `risk_flags` + non-chargeable `candidates_only`, but the failure rate is non-zero. Enable `use_apify_proxy: true` to dramatically reduce this.
- 25 product groups max per run (keeps the bill predictable; for larger universes, run multiple actor calls).
- Full review-text extraction (per-review author/body/helpful votes) is not in v0.2 — only themes + rating distribution. Planned for v0.4.
Changelog
- 0.2.0 (2026-05-22): Webhook alerts, JSON-LD universal fallback, `vs_reference`, `dry_run`, Apify Residential proxy, Amazon retry/locale, variants + images, `keyword_research` un-gated. 165 tests.
- 0.1.0 (2026-05-22): Initial release with cross-marketplace bundle, change detection, opportunity insights. 94 tests.
Related work
Looking for something more specific? These other actors in the same broader space may be a better fit:
- Single-marketplace, max-volume scraping at $0.005/row — search "Amazon product scraper" / "Walmart product scraper" on the Apify Store.
- Full review-text extraction with author + helpful votes — search "Amazon reviews scraper".
- Shopify / WooCommerce / Etsy / niche storefront scraping — search "Shopify scraper".
- Generic JSON-LD scraping of any storefront — search "ecommerce JSON-LD scraper".
PricePulse AI complements these — it's the opinionated aggregation + intelligence layer on top of single-source row dumps.
For developers
Architecture
```
.actor/            # Apify metadata + input schema
src/               # Production code
  marketplaces/    # One module per marketplace (amazon, ebay, walmart)
  models.py        # Input + output Pydantic models (the spec)
  utils.py         # HTTP, parsing, rate limit helpers
  robots.py        # robots.txt cache
  normalizer.py    # cross-marketplace aggregates + profile merge
  reviews.py       # rule-based summary + optional LLM enrichment
  insights.py      # rule-based opportunity insights
  changes.py       # baseline diff computation
  llm.py           # Anthropic enrichment helper
  assembler.py     # canonical FinalOutput builder
main.py            # Actor entrypoint with top-level exception handler
```
Confidence math
overall = 0.3·discovery + 0.4·extraction + 0.1·reviews + 0.2·insights
- `discovery` = resolved_marketplaces / requested_marketplaces
- `extraction` = mean of per-site extraction_confidence (anchors_parsed / 5)
- `reviews` = 0 / 0.7 (rule-based) / 0.9 (Anthropic)
- `insights` = 0 / 0.5 (1 site) / 0.85 (≥2 sites)
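The weighted sum can be checked by hand or with a few lines of Python. The sketch below applies the weights exactly as stated; rounding the result to two decimals is an assumption, and `overall_confidence` is a hypothetical name.

```python
WEIGHTS = {"discovery": 0.3, "extraction": 0.4, "reviews": 0.1, "insights": 0.2}


def overall_confidence(discovery: float, extraction: float,
                       reviews: float, insights: float) -> float:
    """Weighted blend of the four component scores, each in [0, 1]."""
    parts = {"discovery": discovery, "extraction": extraction,
             "reviews": reviews, "insights": insights}
    # Assumption: the result is rounded to two decimals.
    return round(sum(WEIGHTS[name] * parts[name] for name in WEIGHTS), 2)


# 2 of 3 requested marketplaces resolved; sites parsed 4/5 and 5/5 anchors;
# rule-based reviews; insights derived from 2 sites.
score = overall_confidence(
    discovery=2 / 3,
    extraction=(0.8 + 1.0) / 2,
    reviews=0.7,
    insights=0.85,
)
```

Here the components contribute 0.20 + 0.36 + 0.07 + 0.17, giving an overall score of 0.80.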
Run locally
```bash
pip install -r requirements-dev.txt
APIFY_LOCAL_STORAGE_DIR=./storage python main.py < sample_input.json
```
Tests
```bash
pytest -q   # 94 tests, ~0.3s
```
Extending — add a marketplace
Implement src/marketplaces/<name>.py with the same fetch_<name>(...) -> MarketplaceResult signature, register it in FETCHERS (in src/extraction.py), add a tests/test_<name>.py. The assembler picks up new marketplaces automatically.
Terms of Use (summary)
By using this actor you agree to (a) verify all output against the cited source URLs, (b) comply with the Terms of Service of each marketplace you query, and (c) accept that output is provided AS IS, with the Operator's aggregate liability capped at the greater of 12 months of fees paid or USD 100. Marketplace and brand names are used nominatively; no affiliation is implied. The actor does not retain your inputs or outputs outside the run. See TERMS.md for the complete text.