E-Commerce Product Scraper
Pricing
Pay per usage
Extract structured product data (title, price, currency, availability, images, specs) from any e-commerce website. Supports 50+ stores. HTTP-first with automatic Playwright fallback for JS-heavy sites.
Developer
Max Gor
Extract structured product data from any e-commerce website — title, price, currency, availability, images, specs, and more.
Works with 50+ online stores out of the box. Uses smart HTTP-first fetching with automatic Playwright fallback for JavaScript-heavy sites.
Features
- Universal extraction — works with any e-commerce site, not just specific stores
- 4-layer parsing — JSON-LD → Open Graph → Microdata → CSS heuristics for maximum coverage
- Smart rendering — tries fast HTTP first; switches to headless browser only when needed
- Structured output — clean JSON with title, price, currency, stock status, images, brand, SKU, specs
- Multi-currency — auto-detects UAH, USD, EUR, GBP, PLN, CZK, RON
- Breadcrumbs — extracts product category path when available
- Proxy support — works with Apify proxy for anti-bot bypass
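The multi-currency feature can be pictured as a lookup from currency markers to ISO codes plus a tolerant number parser. The sketch below is illustrative only: the marker table, `detect_currency`, and `parse_price` are hypothetical names, and the scraper's real detection rules are not documented here.

```python
import re
from typing import Optional

# Hypothetical marker table for the seven currencies listed above.
# "$" is treated as USD here, although other dollar currencies share the symbol.
CURRENCY_MARKERS = {
    "₴": "UAH", "грн": "UAH", "uah": "UAH",
    "$": "USD", "usd": "USD",
    "€": "EUR", "eur": "EUR",
    "£": "GBP", "gbp": "GBP",
    "zł": "PLN", "pln": "PLN",
    "kč": "CZK", "czk": "CZK",
    "lei": "RON", "ron": "RON",
}


def detect_currency(price_text: str) -> Optional[str]:
    """Return the ISO code of the first known marker found in a price string."""
    lowered = price_text.lower()
    for marker, code in CURRENCY_MARKERS.items():
        if marker in lowered:
            return code
    return None


def parse_price(price_text: str) -> Optional[float]:
    """Parse strings such as '51 999 ₴' or '€1.299,00' into a float.

    Heuristic: the last '.' or ',' is a decimal separator only when it is
    followed by one or two digits; every other separator is a thousands mark.
    """
    digits = re.sub(r"[^\d.,]", "", price_text)
    last = max(digits.rfind("."), digits.rfind(","))
    if last != -1 and len(digits) - last - 1 in (1, 2):
        integer, fraction = digits[:last], digits[last + 1:]
    else:
        integer, fraction = digits, ""
    integer = re.sub(r"[.,]", "", integer)
    if not integer and not fraction:
        return None
    return float(f"{integer or '0'}.{fraction or '0'}")
```

The separator heuristic is a design choice for this sketch: a value like `1.299` is read as one thousand two hundred ninety-nine, which suits European price formatting but would misread a three-decimal price.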
Supported Stores (tested)
| Region | Stores |
|---|---|
| 🇺🇦 Ukraine | Rozetka, Foxtrot, Epicentr, Comfy, Allo, Citrus, Moyo, Prom.ua |
| 🇪🇺 Europe | Amazon.de, MediaMarkt, Notino, Zara, H&M, IKEA |
| 🌍 Global | Amazon.com, eBay, AliExpress*, Best Buy, Walmart |
\*AliExpress requires Playwright mode (set `forcePlaywright: true`).
The scraper also works with any other e-commerce site that uses standard product markup (JSON-LD, Open Graph, or Microdata) — which is the vast majority of online stores.
Input
    {
        "urls": [
            "https://rozetka.com.ua/ua/some-product/p123456/",
            "https://www.amazon.com/dp/B0EXAMPLE/"
        ],
        "forcePlaywright": false,
        "maxConcurrency": 5
    }
| Field | Type | Description |
|---|---|---|
| urls | string[] | Required. Product page URLs to scrape |
| forcePlaywright | boolean | Force headless browser for all URLs (default: false) |
| maxConcurrency | integer | Max parallel pages (default: 5, max: 20) |
| proxyConfiguration | object | Proxy settings (Apify proxy recommended for protected sites) |
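As a rough illustration of the constraints in the table, the input can be assembled and checked on the client side before a run is started. `build_input` is a hypothetical helper, not part of the actor. Whether the actor itself rejects or clamps out-of-range values is not stated here, and the `{"useApifyProxy": True}` shape follows the common Apify proxy input convention, which this actor is only assumed to accept.

```python
from typing import Any, Dict, List, Optional

DEFAULT_CONCURRENCY = 5   # documented default
MAX_CONCURRENCY = 20      # documented maximum


def build_input(
    urls: List[str],
    force_playwright: bool = False,
    max_concurrency: int = DEFAULT_CONCURRENCY,
    proxy_configuration: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the actor input dict, failing early on obviously invalid values."""
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute HTTP(S) URL: {url!r}")
    if not 1 <= max_concurrency <= MAX_CONCURRENCY:
        raise ValueError(f"maxConcurrency must be between 1 and {MAX_CONCURRENCY}")

    run_input: Dict[str, Any] = {
        "urls": list(urls),
        "forcePlaywright": force_playwright,
        "maxConcurrency": max_concurrency,
    }
    # proxyConfiguration is optional, so it is only sent when provided.
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input


example = build_input(
    ["https://www.amazon.com/dp/B0EXAMPLE/"],
    max_concurrency=3,
    proxy_configuration={"useApifyProxy": True},  # assumed shape, see note above
)
```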
Output
Each product is saved to the dataset as a JSON object:
    {
        "url": "https://rozetka.com.ua/ua/samsung-galaxy-s24/p395058825/",
        "store": "rozetka.com.ua",
        "title": "Samsung Galaxy S24 Ultra 12/256GB Titanium Black",
        "price": 51999.0,
        "currency": "UAH",
        "in_stock": true,
        "image": "https://content.rozetka.com.ua/...",
        "brand": "Samsung",
        "sku": "SM-S928BZKDSEK",
        "description": "Смартфон Samsung Galaxy S24 Ultra...",
        "rating": 4.8,
        "review_count": 342,
        "breadcrumbs": ["Смартфони", "Samsung", "Galaxy S24"],
        "extraction_method": "json-ld"
    }
Output fields
| Field | Type | Description |
|---|---|---|
| url | string | Original URL |
| store | string | Store domain |
| title | string | Product name |
| price | float | Price as a number |
| currency | string | ISO currency code (UAH, USD, EUR, etc.) |
| in_stock | boolean | Availability status |
| image | string | Main product image URL |
| brand | string | Brand name |
| sku | string | Product SKU or MPN |
| description | string | Short description (max 500 chars) |
| rating | float | Average rating (if available) |
| review_count | integer | Number of reviews (if available) |
| breadcrumbs | string[] | Category path |
| specs | object | Technical specifications (if available) |
| extraction_method | string | Which extraction layer succeeded |
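When consuming the dataset, it can help to load each item into a typed record. The following is a minimal sketch, assuming that only `url` and `store` are always present and that every other field may be missing. The `Product` class and `product_from_item` are hypothetical names for this example.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class Product:
    """Typed view of one dataset item; field names mirror the table above."""
    url: str
    store: str
    title: Optional[str] = None
    price: Optional[float] = None
    currency: Optional[str] = None
    in_stock: Optional[bool] = None
    image: Optional[str] = None
    brand: Optional[str] = None
    sku: Optional[str] = None
    description: Optional[str] = None
    rating: Optional[float] = None
    review_count: Optional[int] = None
    breadcrumbs: List[str] = field(default_factory=list)
    specs: Dict[str, Any] = field(default_factory=dict)
    extraction_method: Optional[str] = None


def product_from_item(item: Dict[str, Any]) -> Product:
    """Build a Product from a raw dataset item, ignoring unknown keys."""
    known = {f.name for f in fields(Product)}
    return Product(**{key: value for key, value in item.items() if key in known})


sample = product_from_item({
    "url": "https://rozetka.com.ua/ua/samsung-galaxy-s24/p395058825/",
    "store": "rozetka.com.ua",
    "title": "Samsung Galaxy S24 Ultra 12/256GB Titanium Black",
    "price": 51999.0,
    "currency": "UAH",
    "in_stock": True,
    "extraction_method": "json-ld",
})
```

Ignoring unknown keys keeps the loader working if the actor later adds fields to its output.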
How it works
The scraper uses a 4-layer extraction strategy, running each layer in order and filling in missing data:
- JSON-LD (highest confidence): parses `<script type="application/ld+json">` with `@type: Product`
- Open Graph: reads `<meta property="og:*">` and `<meta property="product:*">` tags
- Microdata: finds `itemscope itemtype="schema.org/Product"` attributes
- CSS Heuristics: falls back to common CSS selector patterns for price, title, etc.
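The layered strategy above can be sketched with the standard library alone. This example covers only the first two layers (JSON-LD and Open Graph) and merges them so that a later layer fills only fields an earlier one left empty. It is not the actor's actual code: the function names, the `"open-graph"` label, and the choice to record the first contributing layer in `extraction_method` are assumptions.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional


class MarkupCollector(HTMLParser):
    """Collects JSON-LD script bodies and <meta property=... content=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.json_ld_blocks: List[str] = []
        self.meta: Dict[str, str] = {}
        self._in_json_ld = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("type") == "application/ld+json":
            self._in_json_ld = True
            self.json_ld_blocks.append("")
        elif tag == "meta" and attr.get("property") and attr.get("content"):
            self.meta[attr["property"]] = attr["content"]

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False

    def handle_data(self, data):
        if self._in_json_ld:
            self.json_ld_blocks[-1] += data


def to_float(value: Any) -> Optional[float]:
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def from_json_ld(blocks: List[str]) -> Dict[str, Any]:
    """Layer 1: first JSON-LD node whose @type is Product."""
    for raw in blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        for node in data if isinstance(data, list) else [data]:
            if isinstance(node, dict) and node.get("@type") == "Product":
                offers = node.get("offers")
                if isinstance(offers, list):
                    offers = offers[0] if offers else None
                if not isinstance(offers, dict):
                    offers = {}
                return {
                    "title": node.get("name"),
                    "price": to_float(offers.get("price")),
                    "currency": offers.get("priceCurrency"),
                }
    return {}


def from_open_graph(meta: Dict[str, str]) -> Dict[str, Any]:
    """Layer 2: og:* and product:* meta tags."""
    return {
        "title": meta.get("og:title"),
        "price": to_float(meta.get("product:price:amount")),
        "currency": meta.get("product:price:currency"),
        "image": meta.get("og:image"),
    }


def extract(html: str) -> Dict[str, Any]:
    collector = MarkupCollector()
    collector.feed(html)
    layers = [
        ("json-ld", from_json_ld(collector.json_ld_blocks)),
        ("open-graph", from_open_graph(collector.meta)),
    ]
    result: Dict[str, Any] = {}
    for name, layer_fields in layers:
        for key, value in layer_fields.items():
            # A later layer only fills fields that are still empty.
            if value is not None and result.get(key) is None:
                result[key] = value
                result.setdefault("extraction_method", name)
    return result
```

Microdata and CSS heuristics would slot in as further `(name, fields)` entries in `layers`, in the documented order.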
If the HTTP fetch returns weak data (no title or no price), the scraper automatically retries with a headless Chromium browser (Playwright) to handle JavaScript-rendered pages.
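The fallback rule can be expressed as a small control-flow sketch. The fetchers and the extractor are passed in as plain callables so the logic stays independent of any HTTP or browser library. `scrape`, `is_weak`, and the `rendered_with` key are hypothetical names used only for this illustration.

```python
from typing import Any, Callable, Dict

Fetcher = Callable[[str], str]               # URL -> HTML
Extractor = Callable[[str], Dict[str, Any]]  # HTML -> product fields


def is_weak(product: Dict[str, Any]) -> bool:
    """Weak data, as described above: no title or no price."""
    return not product.get("title") or product.get("price") is None


def scrape(
    url: str,
    fetch_http: Fetcher,
    fetch_browser: Fetcher,
    extract: Extractor,
    force_playwright: bool = False,
) -> Dict[str, Any]:
    """Try the cheap HTTP path first; use the browser only when needed."""
    if not force_playwright:
        product = extract(fetch_http(url))
        if not is_weak(product):
            return {**product, "url": url, "rendered_with": "http"}
    # Either the browser was forced or the HTTP result was weak.
    product = extract(fetch_browser(url))
    return {**product, "url": url, "rendered_with": "playwright"}
```

In a real run `fetch_browser` would wrap a Playwright page load; here it is left abstract because the actor's internals are not documented.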
Use Cases
- Price monitoring — track competitor prices across multiple stores
- Market research — collect pricing data for analysis
- Product catalog — build product databases from multiple sources
- Dropshipping — check prices and availability across suppliers
- Price comparison — aggregate offers for the same product
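For the price comparison case, dataset items can be reduced to the cheapest in-stock offer per product. This sketch assumes that the same product carries the same `sku` across stores, which often does not hold in practice, and it compares prices only within one currency. `cheapest_offers` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, Tuple


def cheapest_offers(items: Iterable[Dict[str, Any]]) -> Dict[Tuple[str, str], Dict[str, Any]]:
    """Map (sku, currency) to the lowest-priced in-stock item."""
    best: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for item in items:
        sku, price, currency = item.get("sku"), item.get("price"), item.get("currency")
        # Skip items that cannot be compared or are not available.
        if not sku or price is None or not currency or not item.get("in_stock"):
            continue
        key = (sku, currency)
        if key not in best or price < best[key]["price"]:
            best[key] = item
    return best
```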
Tips
- For best results with protected sites (Cloudflare, AWS WAF), enable Apify Proxy
- Set `forcePlaywright: true` for sites known to require JavaScript (AliExpress, some fashion stores)
- Keep `maxConcurrency` at 3-5 for sites with aggressive rate limiting
- The scraper respects `robots.txt`; use it responsibly
Cost estimate
| Mode | Compute units per URL | Cost* |
|---|---|---|
| HTTP only | ~0.005 | ~$0.0005 |
| Playwright | ~0.05-0.1 | ~$0.005-0.01 |
| Mixed (auto) | ~0.01-0.03 avg | ~$0.001-0.003 |
*Based on Apify platform pricing. Actual costs depend on page complexity and proxy usage.
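To turn the per-URL figures into a budget, multiply them by the number of URLs. The helper below is a back-of-the-envelope sketch that hard-codes the approximate dollar ranges from the table above. `estimate_cost` and the mode keys are hypothetical, and real costs will vary as noted.

```python
from typing import Tuple

# Approximate (low, high) cost per URL in USD, copied from the table above.
COST_PER_URL_USD = {
    "http": (0.0005, 0.0005),
    "playwright": (0.005, 0.01),
    "mixed": (0.001, 0.003),
}


def estimate_cost(url_count: int, mode: str = "mixed") -> Tuple[float, float]:
    """Return a rough (low, high) cost range in USD for a batch of URLs."""
    if mode not in COST_PER_URL_USD:
        raise ValueError(f"unknown mode: {mode!r}")
    if url_count < 0:
        raise ValueError("url_count must not be negative")
    low, high = COST_PER_URL_USD[mode]
    return url_count * low, url_count * high
```

For example, `estimate_cost(10_000, "mixed")` gives a range of roughly $10 to $30 for ten thousand URLs in auto mode.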