MarketVantage: Global E-commerce & Variant Scraper

Pricing

$29.99/month + usage

Deep Variant Scraper for marketplaces like Temu, AliExpress, and Shein. Standard tools fail on dynamic pricing and variant mapping (size/color SKU data). This agent maps every combination, bypasses anti-bot blocks, and delivers structured, AI-ready product data for price monitoring & research.

Developer

Filip Ebert, BSc.

Maintained by Community

Global Marketplace & Variant Specialist

Extract every product variant — size, color, material — from the world's most anti-bot-protected marketplaces.

Built for dropshippers, competitor analysts, and feed managers who need clean, structured SKU data from fast-fashion and global marketplace sites — not raw HTML to parse yourself.


Supported sites

Site | Extraction method | Anti-bot level
AliExpress | window.runParams (full SKU matrix) | ★★★★
Temu | __NEXT_DATA__ hydration JSON | ★★★★★
SHEIN | window.gbProductDetailInfo / __NEXT_DATA__ | ★★★★
Etsy | JSON-LD + variation JSON blocks | ★★★
DHGate | window.__initial_state__ | ★★★
Wish | window.__data__ / __NEXT_DATA__ | ★★★★
Joom | window.__JOOM_DATA__ | ★★★
Banggood | window.productData | ★★★
Gearbest | window.gbData | ★★★
Any other site | JSON-LD + Open Graph meta (universal fallback) | -

Try it now

Paste this into the Apify Actor input form and click Run:

{
  "sources": [
    "https://www.aliexpress.com/item/1005006500000001.html",
    "https://www.etsy.com/listing/123456789/handmade-ceramic-mug",
    "https://www.temu.com/goods.html?goods_id=601099512345678"
  ],
  "targetCountry": "US",
  "targetCurrency": "USD",
  "maxConcurrency": 3
}

What makes this different

Variant-First extraction

Most scrapers only grab the displayed price. This actor reads the full SKU matrix embedded in each site's hydration JSON before JavaScript renders anything — so if a blue shirt has 5 sizes, you get 5 SKU entries, each with its own price, stock count, and variant-specific image.

"variants": [
  { "sku_id": "1,10", "color": "Blue", "size": "S", "sku_price": 12.99, "stock_status": "in_stock", "stock_quantity": 47, "variant_image": "https://..." },
  { "sku_id": "1,11", "color": "Blue", "size": "M", "sku_price": 12.99, "stock_status": "in_stock", "stock_quantity": 23, "variant_image": "https://..." },
  { "sku_id": "1,12", "color": "Blue", "size": "L", "sku_price": 13.99, "stock_status": "out_of_stock", "stock_quantity": 0, "variant_image": "https://..." }
]
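
As a rough illustration of reading variants from hydration JSON rather than the rendered DOM, the sketch below pulls a __NEXT_DATA__ script payload out of raw HTML and maps it to the variant shape shown above. The helper names (extractNextData, toVariants) and the payload path props.pageProps.skus are assumptions made for this example; each real site uses its own structure, and this is not the actor's actual extractor.

```javascript
// Hypothetical helper: return the parsed JSON inside <script id="__NEXT_DATA__">, or null.
function extractNextData(html) {
  const match = html.match(/<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/i);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch (err) {
    return null; // malformed JSON: let the caller fall back to another strategy
  }
}

// Hypothetical mapper: the path props.pageProps.skus is an assumed payload shape.
function toVariants(nextData) {
  const skus = nextData?.props?.pageProps?.skus ?? [];
  return skus.map((sku) => ({
    sku_id: sku.id,
    color: sku.color,
    size: sku.size,
    sku_price: sku.price,
    stock_status: sku.stock > 0 ? 'in_stock' : 'out_of_stock',
    stock_quantity: sku.stock,
  }));
}

// Illustrative page content, not captured from any real site.
const sampleHtml = `
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props":{"pageProps":{"skus":[
  {"id":"1,10","color":"Blue","size":"S","price":12.99,"stock":47},
  {"id":"1,11","color":"Blue","size":"M","price":12.99,"stock":23},
  {"id":"1,12","color":"Blue","size":"L","price":13.99,"stock":0}
]}}}
</script>
</body></html>`;

const variants = toVariants(extractNextData(sampleHtml));
console.log(variants);
```

Returning null on a missing or malformed payload lets a caller move on to the next extraction strategy instead of failing the whole page.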

Shadow Pricing captured

Temu and AliExpress show different prices depending on the selected variant. Because this actor reads the full price list from the hydration JSON rather than the rendered DOM, every variant's exact price is captured, including discounts that only appear after selection.

Stealth browser with fingerprint injection

Anti-bot systems like DataDome and Cloudflare Bot Management are defeated through:

  • Fingerprint injection: randomised navigator.webdriver, canvas hash, audio context, plugins list per request
  • CDP timezone spoofing: browser timezone matches targetCountry
  • Human-like behaviour: random mouse movement and delays before extraction
  • Geo-targeted headers: Accept-Language and locale match the target country

AI-ready output

Every product includes a clean_text field: the description with all HTML tags stripped and entities decoded. Feed it directly into GPT-4 or Claude for competitor analysis, SEO copy generation, or product categorisation.
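
To make the clean_text transformation concrete, here is a minimal sketch of tag stripping plus entity decoding. The architecture lists a cleanText() helper in utils.js, but the implementation below is an assumption written for illustration, and the entity table covers only a handful of common names.

```javascript
// Small, deliberately incomplete table of named entities (assumption for this sketch).
const NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: '\u00a0' };

// Decode numeric (&#39; / &#x41;) and a few named entities in a single pass.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (whole, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : whole;
    }
    const named = NAMED_ENTITIES[body.toLowerCase()];
    return named === undefined ? whole : named;
  });
}

// Illustrative stand-in for cleanText(): drop script/style blocks, strip tags,
// decode entities, then collapse all whitespace runs to single spaces.
function cleanText(html) {
  const withoutTags = html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ');
  return decodeEntities(withoutTags).replace(/\s+/g, ' ').trim();
}

console.log(cleanText('<p>Material: 95% Cotton &amp; 5% Spandex.</p><br><p>Machine&nbsp;wash</p>'));
```

Tags are removed before entities are decoded, so an escaped sequence such as &lt;b&gt; survives as literal text instead of being mistaken for markup.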

Geo-targeting & Currency

Set targetCountry: "DE" and the browser will appear to be a German user (language headers, timezone). Set targetCurrency: "EUR" and all prices are converted. Add a residential proxy from Germany for true local pricing.
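
The conversion step can be pictured as a lookup against a rate table keyed by currency code. The sketch below is an assumption about how such a step might look: the function name convertPrice, the USD-based table layout, and the rate values are placeholders invented for the example, not real exchange rates or the actor's code.

```javascript
// Placeholder fallback table: units of each currency per 1 USD (illustrative values only).
const FALLBACK_RATES = { USD: 1, EUR: 0.9, GBP: 0.8 };

// Convert an amount between two currencies via USD, rounded to 2 decimals.
function convertPrice(amount, from, to, rates = FALLBACK_RATES) {
  if (!(from in rates) || !(to in rates)) {
    throw new Error(`No exchange rate for ${from} -> ${to}`);
  }
  const inUsd = amount / rates[from];
  return Math.round(inUsd * rates[to] * 100) / 100;
}

console.log(convertPrice(10, 'USD', 'EUR'));
console.log(convertPrice(9, 'EUR', 'USD'));
```

Passing a different rates object as the fourth argument is how live rates could be swapped in without touching the conversion logic.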


Output format

One JSON object per product URL:

{
  "productId": "1005006500000001",
  "title": "2024 New Summer Women Dress Floral Print",
  "description": "<p>Material: 95% Cotton...</p>",
  "clean_text": "Material: 95% Cotton 5% Spandex. Available in 5 colors and 4 sizes.",
  "price": 14.99,
  "original_price": 29.99,
  "currency": "USD",
  "availability": true,
  "quantity": 312,
  "sku_count": 20,
  "variants": [
    {
      "sku_id": "Color:Red|Size:S",
      "attributes": { "Color": "Red", "Size": "S" },
      "color": "Red",
      "size": "S",
      "sku_price": 14.99,
      "original_price": 29.99,
      "stock_status": "in_stock",
      "stock_quantity": 48,
      "variant_image": "https://ae01.alicdn.com/kf/red-s.jpg"
    }
  ],
  "images": [
    "https://ae01.alicdn.com/kf/main.jpg",
    "https://ae01.alicdn.com/kf/detail1.jpg"
  ],
  "reviews_count": 1847,
  "rating": 4.8,
  "seller_id": "123456789",
  "seller_name": "Fashion Boutique Store",
  "shipping_cost": 0,
  "source_url": "https://www.aliexpress.com/item/1005006500000001.html",
  "site_label": "aliexpress",
  "country": "US",
  "scraped_at": "2026-03-04T12:00:00.000Z"
}

Input parameters

Parameter | Type | Required | Description
sources | string[] | yes | Product page URLs to scrape
targetCountry | string | no | ISO 3166-1 alpha-2 code (e.g. US, DE, GB). Default: US
targetCurrency | string | no | Output currency (e.g. USD, EUR). Default: USD
proxyUrl | string | no | Format: http://user:pass@host:port. Use a residential proxy for geo-accurate pricing
liveRatesUrl | string | no | Exchange rate API URL (e.g. open.er-api.com). Hardcoded rates are used if omitted
maxConcurrency | integer | no | Parallel requests, 1–20. Use 1–3 for Temu/AliExpress. Default: 3
priceSelector | string | no | CSS selector override for price (unsupported sites)
titleSelector | string | no | CSS selector override for title (unsupported sites)
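
One way to read the parameter list is as a set of defaults plus a range check. The sketch below applies the documented defaults and clamps maxConcurrency to the documented 1–20 range. The function name normalizeInput and its error message are hypothetical, and treating sources as the only mandatory field is an assumption of this example.

```javascript
// Defaults taken from the parameter descriptions.
const DEFAULTS = { targetCountry: 'US', targetCurrency: 'USD', maxConcurrency: 3 };

// Hypothetical pre-flight check: fill defaults and clamp concurrency to 1..20.
function normalizeInput(input) {
  if (!Array.isArray(input.sources) || input.sources.length === 0) {
    throw new Error('"sources" must be a non-empty array of product URLs');
  }
  const merged = { ...DEFAULTS, ...input };
  const requested = Number.isInteger(merged.maxConcurrency)
    ? merged.maxConcurrency
    : DEFAULTS.maxConcurrency;
  merged.maxConcurrency = Math.min(20, Math.max(1, requested));
  merged.targetCountry = String(merged.targetCountry).toUpperCase();
  merged.targetCurrency = String(merged.targetCurrency).toUpperCase();
  return merged;
}

console.log(normalizeInput({
  sources: ['https://www.etsy.com/listing/123456789/handmade-ceramic-mug'],
  targetCountry: 'de',
  maxConcurrency: 50,
}));
```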

Business intelligence use cases

Dropshipping & product sourcing

Find which Temu/AliExpress products have inventory in every size. Filter out variants where stock_status === "out_of_stock" to avoid listing dead SKUs.
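
A possible post-processing step over the dataset, using only the stock_status field from the output format. The function names and sample records are made up for illustration.

```javascript
// True when a product has at least one variant and none are out of stock.
function isFullyStocked(product) {
  const variants = product.variants ?? [];
  return variants.length > 0 && variants.every((v) => v.stock_status === 'in_stock');
}

// Keep only the variants that can actually be listed.
function sellableVariants(product) {
  return (product.variants ?? []).filter((v) => v.stock_status === 'in_stock');
}

// Illustrative records in the actor's output shape (trimmed to the relevant fields).
const products = [
  {
    title: 'Blue shirt',
    variants: [
      { sku_id: '1,10', size: 'S', stock_status: 'in_stock' },
      { sku_id: '1,11', size: 'M', stock_status: 'in_stock' },
      { sku_id: '1,12', size: 'L', stock_status: 'out_of_stock' },
    ],
  },
  {
    title: 'Red dress',
    variants: [{ sku_id: 'Color:Red|Size:S', size: 'S', stock_status: 'in_stock' }],
  },
];

const fullyStocked = products.filter(isFullyStocked);
console.log(fullyStocked.map((p) => p.title));
console.log(sellableVariants(products[0]).map((v) => v.sku_id));
```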

Competitor price tracking

Run daily on competitor Etsy listings. Compare sku_price per variant combination to spot when they run size-specific discounts.
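
A sketch of the comparison step, assuming you keep the previous run's record for each listing. It matches variants by sku_id and reports only the SKUs whose sku_price changed. The function name diffVariantPrices and the sample data are illustrative.

```javascript
// Compare two scrapes of the same listing and list per-variant price changes.
function diffVariantPrices(previous, current) {
  const before = new Map(previous.variants.map((v) => [v.sku_id, v.sku_price]));
  const changes = [];
  for (const v of current.variants) {
    const oldPrice = before.get(v.sku_id);
    // Skip new SKUs, unchanged prices, and zero baselines (no meaningful percentage).
    if (oldPrice === undefined || oldPrice === 0 || oldPrice === v.sku_price) continue;
    changes.push({
      sku_id: v.sku_id,
      from: oldPrice,
      to: v.sku_price,
      change_pct: Math.round(((v.sku_price - oldPrice) / oldPrice) * 1000) / 10,
    });
  }
  return changes;
}

// Illustrative data: only size L was discounted between the two runs.
const yesterday = { variants: [
  { sku_id: '1,10', sku_price: 12.99 },
  { sku_id: '1,12', sku_price: 13.99 },
] };
const today = { variants: [
  { sku_id: '1,10', sku_price: 12.99 },
  { sku_id: '1,12', sku_price: 11.99 },
] };

const priceChanges = diffVariantPrices(yesterday, today);
console.log(priceChanges);
```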

LLM-powered product feed enrichment

Feed clean_text into GPT-4 to auto-generate localised product descriptions, SEO meta titles, or translate listings. No preprocessing needed.

Buy-Box analysis (Etsy / DHGate)

Multiple sellers list identical products. seller_id and shipping_cost let you identify the cheapest total-cost option and track which seller holds the Buy Box.
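
The total-cost comparison can be sketched as price plus shipping_cost per seller, sorted ascending. The function name cheapestOffer and the sample offers are invented for this example; it assumes all records are already in the same currency.

```javascript
// Rank offers for the same product by price + shipping and return the cheapest.
function cheapestOffer(products) {
  const ranked = products
    .map((p) => ({
      seller_id: p.seller_id,
      seller_name: p.seller_name,
      total_cost: Math.round((p.price + (p.shipping_cost ?? 0)) * 100) / 100,
    }))
    .sort((a, b) => a.total_cost - b.total_cost);
  return ranked[0] ?? null;
}

// Illustrative offers; seller names and prices are placeholders.
const offers = [
  { seller_id: 'A', seller_name: 'Seller A', price: 14.99, shipping_cost: 0 },
  { seller_id: 'B', seller_name: 'Seller B', price: 12.49, shipping_cost: 3.5 },
  { seller_id: 'C', seller_name: 'Seller C', price: 13.99, shipping_cost: 0.5 },
];

const best = cheapestOffer(offers);
console.log(best);
```

Seller B has the lowest list price here but not the lowest total, which is why shipping_cost is folded in before sorting.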

Pricing parity monitoring

Same product, different countries. Run with targetCountry: "US" and targetCountry: "DE" (plus matching residential proxies) to detect price discrimination across markets.


Adding a new site

The architecture is modular. To add support for a new store:

  1. Create src/sites/mystore.js following this interface:

     const MATCH_RE = /mystore\.com/i;
     function matches(url) { return MATCH_RE.test(url); }
     const LABEL = 'mystore';

     function extractFromWindowVars(html) {
       // 1. Try to find window.__STORE_DATA__ or similar
       // 2. Return canonical product object, or null to fall back
       return null;
     }

     const CSS_SELECTORS = {
       title      : 'h1.product-title',
       price      : '.price-current',
       description: '#product-description',
       images     : '.product-gallery img',
     };

     module.exports = { matches, label: LABEL, extractFromWindowVars, CSS_SELECTORS };

  2. Register it in src/sites/index.js, placing it before the generic entry.

That's it. The orchestration, variant normalisation, stealth browser, and currency conversion all work automatically.
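
Why the new module has to sit before the generic entry can be seen in a first-match registry. The sketch below is an assumption about how src/sites/index.js might resolve a URL; the SITES array, the resolveSite name, and the inline module stubs are hypothetical.

```javascript
// Inline stand-ins for site modules; real ones would be loaded with require().
const mystore = { label: 'mystore', matches: (url) => /mystore\.com/i.test(url) };
const generic = { label: 'generic', matches: () => true }; // catch-all fallback

// Order matters: the first module whose matches() returns true wins,
// so specific sites must come before the generic fallback.
const SITES = [mystore, generic];

function resolveSite(url) {
  return SITES.find((site) => site.matches(url));
}

console.log(resolveSite('https://www.mystore.com/product/42').label);
console.log(resolveSite('https://shop.example.org/item/7').label);
```

If generic were listed first, its catch-all matches() would claim every URL and the site-specific extractor would never run.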


Architecture overview

src/
├── main.js               Apify Actor entry point: concurrency + currency
├── fetcher.js            Stealth Playwright browser + HTTP fallback
├── parser.js             Multi-strategy extraction pipeline
├── variant_extractor.js  Cartesian product SKU builder
├── stealth_browser.js    Fingerprint injection, anti-challenge helpers
├── geo_config.js         Country → locale/timezone/currency profiles
├── normalize.js          Price parsing + currency conversion
├── utils.js              cleanText(), extractASIN(), safeJsonParse()
└── sites/
    ├── index.js          Site registry + URL resolver
    ├── aliexpress.js     window.runParams extractor
    ├── temu.js           __NEXT_DATA__ extractor
    ├── shein.js          gbProductDetailInfo extractor
    ├── etsy.js           JSON-LD + variation blocks extractor
    ├── dhgate.js         __initial_state__ extractor
    ├── wish.js           __data__ extractor
    ├── joom.js           __JOOM_DATA__ extractor
    ├── banggood.js       productData extractor
    ├── gearbest.js       gbData extractor
    └── generic.js        JSON-LD + Open Graph fallback
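
The "Cartesian product SKU builder" role of variant_extractor.js can be illustrated with a small sketch that expands attribute lists into every combination. The function name buildSkuMatrix is hypothetical, and the sku_id format simply mirrors the "Color:Red|Size:S" style from the output example; the real module may work differently.

```javascript
// Expand { Color: [...], Size: [...] } into one entry per attribute combination.
function buildSkuMatrix(attributes) {
  const names = Object.keys(attributes);
  if (names.length === 0) return [];
  let combos = [{}];
  for (const name of names) {
    // Extend every partial combination with each value of the next attribute.
    combos = combos.flatMap((partial) =>
      attributes[name].map((value) => ({ ...partial, [name]: value }))
    );
  }
  return combos.map((attrs) => ({
    sku_id: names.map((n) => `${n}:${attrs[n]}`).join('|'),
    attributes: attrs,
  }));
}

const matrix = buildSkuMatrix({ Color: ['Red', 'Blue'], Size: ['S', 'M', 'L'] });
console.log(matrix.map((sku) => sku.sku_id));
```

Two colors and three sizes yield six entries; per-SKU price and stock would then be attached from whatever the site's payload reports for each combination.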

This actor reads only publicly visible product information.

  • Do not use authenticated sessions or bypass paywalls
  • You are responsible for compliance with each website's Terms of Service
  • Use residential proxies only through legitimate proxy providers
  • Respect robots.txt and crawl rate limits (maxConcurrency: 1–3 for protected sites)

Q: How often should I run it?
A: Most teams run daily or weekly; real-time monitoring is rarely necessary for price tracking.

Q: Does it work with market indexes like DRAMeXchange?
A: Not yet. Market/index sites use different data formats. Planned for v1.1.

Q: Can I export the data?
A: Yes. All results are stored in JSON format and can be exported to CSV, Google Sheets, or via API.


Get started

  1. Click Try for free in the Apify Console
  2. Paste the demo input above
  3. Click Run
  4. See structured price and variant data for the product URLs in the demo input

Need help? Check the examples/ directory for more input samples.