E-Commerce Product Scraper

Pricing

Pay per usage


Developer: Max Gor (maintained by Community)


Extract structured product data from any e-commerce website — title, price, currency, availability, images, specs, and more.

Works with 50+ online stores out of the box. Uses smart HTTP-first fetching with automatic Playwright fallback for JavaScript-heavy sites.

Features

  • Universal extraction — works with any e-commerce site, not just specific stores
  • 4-layer parsing — JSON-LD → Open Graph → Microdata → CSS heuristics for maximum coverage
  • Smart rendering — tries fast HTTP first; switches to headless browser only when needed
  • Structured output — clean JSON with title, price, currency, stock status, images, brand, SKU, specs
  • Multi-currency — auto-detects UAH, USD, EUR, GBP, PLN, CZK, RON
  • Breadcrumbs — extracts product category path when available
  • Proxy support — works with Apify proxy for anti-bot bypass

Supported Stores (tested)

| Region | Stores |
| --- | --- |
| 🇺🇦 Ukraine | Rozetka, Foxtrot, Epicentr, Comfy, Allo, Citrus, Moyo, Prom.ua |
| 🇪🇺 Europe | Amazon.de, MediaMarkt, Notino, Zara, H&M, IKEA |
| 🌍 Global | Amazon.com, eBay, AliExpress*, Best Buy, Walmart |

*AliExpress requires Playwright mode (set forcePlaywright: true)

The scraper also works with any other e-commerce site that uses standard product markup (JSON-LD, Open Graph, or Microdata) — which is the vast majority of online stores.

Input

{
  "urls": [
    "https://rozetka.com.ua/ua/some-product/p123456/",
    "https://www.amazon.com/dp/B0EXAMPLE/"
  ],
  "forcePlaywright": false,
  "maxConcurrency": 5
}

| Field | Type | Description |
| --- | --- | --- |
| urls | string[] | Required. Product page URLs to scrape |
| forcePlaywright | boolean | Force headless browser for all URLs (default: false) |
| maxConcurrency | integer | Max parallel pages (default: 5, max: 20) |
| proxyConfiguration | object | Proxy settings (Apify proxy recommended for protected sites) |
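
The input above can also be assembled and checked in code before starting a run. The following is a minimal sketch, not the Actor's own tooling: `build_run_input` and `run_actor` are hypothetical helper names, the lower bound of 1 for `maxConcurrency` is an assumption (the table only states the maximum), and the Actor ID passed to `run_actor` has to be taken from the Actor's page in Apify Store. The `run_actor` helper relies on the `apify-client` package.

```python
from typing import Any, Optional

MAX_CONCURRENCY_LIMIT = 20  # upper bound stated in the input table


def build_run_input(
    urls: list[str],
    force_playwright: bool = False,
    max_concurrency: int = 5,
    proxy_configuration: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build the Actor input and check the constraints from the input table."""
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute HTTP(S) URL: {url!r}")
    # Assumption: 1 is the smallest useful value; only the maximum is documented.
    if not 1 <= max_concurrency <= MAX_CONCURRENCY_LIMIT:
        raise ValueError("maxConcurrency must be between 1 and 20")

    run_input: dict[str, Any] = {
        "urls": list(urls),
        "forcePlaywright": force_playwright,
        "maxConcurrency": max_concurrency,
    }
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input


def run_actor(token: str, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start a run and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # lazy import: the builder works without it

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Many Apify Actors accept `{"useApifyProxy": True}` as the proxy configuration; whether this Actor expects exactly that shape is not stated here, so check its input schema first.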

Output

Each product is saved to the dataset as a JSON object:

{
  "url": "https://rozetka.com.ua/ua/samsung-galaxy-s24/p395058825/",
  "store": "rozetka.com.ua",
  "title": "Samsung Galaxy S24 Ultra 12/256GB Titanium Black",
  "price": 51999.0,
  "currency": "UAH",
  "in_stock": true,
  "image": "https://content.rozetka.com.ua/...",
  "brand": "Samsung",
  "sku": "SM-S928BZKDSEK",
  "description": "Смартфон Samsung Galaxy S24 Ultra...",
  "rating": 4.8,
  "review_count": 342,
  "breadcrumbs": ["Смартфони", "Samsung", "Galaxy S24"],
  "extraction_method": "json-ld"
}

Output fields

| Field | Type | Description |
| --- | --- | --- |
| url | string | Original URL |
| store | string | Store domain |
| title | string | Product name |
| price | float | Price as a number |
| currency | string | ISO currency code (UAH, USD, EUR, etc.) |
| in_stock | boolean | Availability status |
| image | string | Main product image URL |
| brand | string | Brand name |
| sku | string | Product SKU or MPN |
| description | string | Short description (max 500 chars) |
| rating | float | Average rating (if available) |
| review_count | integer | Number of reviews (if available) |
| breadcrumbs | string[] | Category path |
| specs | object | Technical specifications (if available) |
| extraction_method | string | Which extraction layer succeeded |
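
When consuming the dataset from Python, the fields above map naturally onto a typed record. The sketch below is one possible shape, not something shipped with the Actor: the `Product` class and its `from_item` helper are hypothetical, and treating every field except `url` and `store` as optional is an assumption based on the "if available" notes in the table.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Product:
    """Typed view of one dataset item, following the output fields table."""

    url: str
    store: str
    title: Optional[str] = None
    price: Optional[float] = None
    currency: Optional[str] = None
    in_stock: Optional[bool] = None
    image: Optional[str] = None
    brand: Optional[str] = None
    sku: Optional[str] = None
    description: Optional[str] = None
    rating: Optional[float] = None
    review_count: Optional[int] = None
    breadcrumbs: list[str] = field(default_factory=list)
    specs: dict[str, Any] = field(default_factory=dict)
    extraction_method: Optional[str] = None

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "Product":
        """Keep only documented fields that are present and not null."""
        known = {
            name: item[name]
            for name in cls.__dataclass_fields__
            if item.get(name) is not None
        }
        return cls(**known)
```

Unknown keys are ignored rather than rejected, so the record keeps working if the Actor adds fields later.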

How it works

The scraper uses a 4-layer extraction strategy. Layers run in order, and each layer fills in fields that the previous layers left empty:

  1. JSON-LD (highest confidence): parses <script type="application/ld+json"> blocks with @type: Product
  2. Open Graph: reads <meta property="og:*"> and <meta property="product:*"> tags
  3. Microdata: finds elements with itemscope and an itemtype ending in schema.org/Product (http or https)
  4. CSS heuristics: falls back to common CSS selector patterns for price, title, etc.
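
To make the layering concrete, here is a minimal standard-library sketch of the first two layers and the fill-in-missing merge. It is an illustration under assumptions, not the Actor's source: the function names are hypothetical, only `title`, `price`, and `currency` are handled, the `"open-graph"` label is a guess (only `"json-ld"` appears in the sample output), and recording the first contributing layer as `extraction_method` is one possible reading of that field.

```python
import json
from html.parser import HTMLParser
from typing import Any


class MarkupCollector(HTMLParser):
    """Collects JSON-LD script bodies and <meta property=... content=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.json_ld_blocks: list[str] = []
        self.meta: dict[str, str] = {}
        self._in_json_ld = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("type") == "application/ld+json":
            self._in_json_ld = True
            self.json_ld_blocks.append("")
        elif tag == "meta" and attr.get("property") and attr.get("content"):
            self.meta[attr["property"]] = attr["content"]

    def handle_data(self, data):
        if self._in_json_ld:
            self.json_ld_blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False


def from_json_ld(blocks: list[str]) -> dict[str, Any]:
    """Layer 1: first JSON-LD node whose @type is Product."""
    for raw in blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        for node in data if isinstance(data, list) else [data]:
            if isinstance(node, dict) and node.get("@type") == "Product":
                offers = node.get("offers") or {}
                if isinstance(offers, list):
                    offers = offers[0] if offers else {}
                if not isinstance(offers, dict):
                    offers = {}
                price = offers.get("price")
                return {
                    "title": node.get("name"),
                    "price": float(price) if price is not None else None,
                    "currency": offers.get("priceCurrency"),
                }
    return {}


def from_open_graph(meta: dict[str, str]) -> dict[str, Any]:
    """Layer 2: og:* and product:* meta tags."""
    price = meta.get("product:price:amount")
    return {
        "title": meta.get("og:title"),
        "price": float(price) if price else None,
        "currency": meta.get("product:price:currency"),
    }


def extract(html: str) -> dict[str, Any]:
    """Run the layers in order; later layers only fill fields still missing."""
    collector = MarkupCollector()
    collector.feed(html)
    layers = [
        ("json-ld", from_json_ld(collector.json_ld_blocks)),
        ("open-graph", from_open_graph(collector.meta)),  # label is an assumption
    ]
    result: dict[str, Any] = {}
    for layer_name, fields in layers:
        for key, value in fields.items():
            if value is not None and result.get(key) is None:
                result[key] = value
                result.setdefault("extraction_method", layer_name)
    return result
```

Because earlier layers win, a title found in JSON-LD is never overwritten by a noisier `og:title`, while a price missing from JSON-LD can still be recovered from the `product:price:amount` tag.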

If the HTTP fetch returns weak data (no title or no price), the scraper automatically retries with a headless Chromium browser (Playwright) to handle JavaScript-rendered pages.
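
The fallback rule can be sketched as a small decision function. This is a simplified illustration with assumed names: `http_scrape` and `browser_scrape` stand in for the real fetch-and-extract steps and are passed in as plain callables, and the `rendered_with` key is hypothetical and exists only to make the chosen path visible.

```python
from typing import Any, Callable

Scraper = Callable[[str], dict[str, Any]]


def is_weak(product: dict[str, Any]) -> bool:
    """Weak data means the title or the price is missing."""
    return not product.get("title") or product.get("price") is None


def scrape(
    url: str,
    http_scrape: Scraper,
    browser_scrape: Scraper,
    force_playwright: bool = False,
) -> dict[str, Any]:
    """Try the cheap HTTP path first; use the browser only when needed."""
    if not force_playwright:
        product = http_scrape(url)
        if not is_weak(product):
            return {**product, "rendered_with": "http"}
    # Either the browser was forced or the HTTP result was weak.
    return {**browser_scrape(url), "rendered_with": "playwright"}
```

A price of 0 still counts as present in this sketch, since only a missing price signals that the page probably needs JavaScript rendering.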

Use Cases

  • Price monitoring — track competitor prices across multiple stores
  • Market research — collect pricing data for analysis
  • Product catalog — build product databases from multiple sources
  • Dropshipping — check prices and availability across suppliers
  • Price comparison — aggregate offers for the same product

Tips

  • For best results with protected sites (Cloudflare, AWS WAF), enable Apify Proxy
  • Set forcePlaywright: true for sites known to require JavaScript (AliExpress, some fashion stores)
  • Keep maxConcurrency at 3-5 for sites with aggressive rate limiting
  • The scraper respects robots.txt — use responsibly
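
How the Actor performs its robots.txt check is not documented here. For readers who want to pre-filter their own URL lists, a minimal sketch of such a check with Python's standard `urllib.robotparser` could look like this; `is_allowed` is a hypothetical helper, and the robots.txt content is passed in as text so the example needs no network access.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

In practice the rules would first be downloaded from the store's `/robots.txt` path and then reused for every URL on that domain.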

Cost estimate

| Mode | Compute units per URL | Cost* |
| --- | --- | --- |
| HTTP only | ~0.005 | ~$0.0005 |
| Playwright | ~0.05-0.1 | ~$0.005-0.01 |
| Mixed (auto) | ~0.01-0.03 avg | ~$0.001-0.003 |

*Based on Apify platform pricing. Actual costs depend on page complexity and proxy usage.
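
For budgeting a batch, the per-URL figures above scale linearly with the number of URLs. A small sketch of that arithmetic follows; the `estimate_cost` helper and the mode keys are hypothetical, and the numbers are copied from the table, so they are rough estimates rather than billing guarantees.

```python
# Approximate cost per URL in USD as (low, high), taken from the table above.
COST_PER_URL_USD = {
    "http": (0.0005, 0.0005),
    "playwright": (0.005, 0.01),
    "mixed": (0.001, 0.003),
}


def estimate_cost(url_count: int, mode: str = "mixed") -> tuple[float, float]:
    """Return a (low, high) cost estimate in USD for scraping url_count URLs."""
    if url_count < 0:
        raise ValueError("url_count must not be negative")
    low, high = COST_PER_URL_USD[mode]
    return url_count * low, url_count * high
```

By these figures, 10,000 URLs in mixed mode would cost roughly $10 to $30, before any extra proxy charges.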