# E-Commerce Product Scraper (`skipper_lume/ecommerce-product-scraper`) Actor

Extract structured product data (title, price, currency, availability, images, specs) from any e-commerce website. Supports 50+ stores. HTTP-first with automatic Playwright fallback for JS-heavy sites.

- **URL**: https://apify.com/skipper\_lume/ecommerce-product-scraper.md
- **Developed by:** [Max Gor](https://apify.com/skipper_lume) (community)
- **Categories:** E-commerce
- **Stats:** 1 total user, 0 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is billed by platform usage. The Actor itself is free to use; you pay only for the Apify platform resources it consumes, which get cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## E-Commerce Product Scraper

Extract structured product data from **any e-commerce website** — title, price, currency, availability, images, specs, and more.

Works with 50+ online stores out of the box. Uses smart **HTTP-first** fetching with automatic **Playwright fallback** for JavaScript-heavy sites.

### Features

- **Universal extraction** — works with any e-commerce site, not just specific stores
- **4-layer parsing** — JSON-LD → Open Graph → Microdata → CSS heuristics for maximum coverage
- **Smart rendering** — tries fast HTTP first; switches to headless browser only when needed
- **Structured output** — clean JSON with title, price, currency, stock status, images, brand, SKU, specs
- **Multi-currency** — auto-detects UAH, USD, EUR, GBP, PLN, CZK, RON
- **Breadcrumbs** — extracts product category path when available
- **Proxy support** — works with Apify proxy for anti-bot bypass

### Supported Stores (tested)

| Region | Stores |
|--------|--------|
| 🇺🇦 Ukraine | Rozetka, Foxtrot, Epicentr, Comfy, Allo, Citrus, Moyo, Prom.ua |
| 🇪🇺 Europe | Amazon.de, MediaMarkt, Notino, Zara, H&M, IKEA |
| 🌍 Global | Amazon.com, eBay, AliExpress*, Best Buy, Walmart |

*\*AliExpress requires Playwright mode (set `forcePlaywright: true`)*

The scraper also works with **any other e-commerce site** that uses standard product markup (JSON-LD, Open Graph, or Microdata) — which is the vast majority of online stores.

### Input

```json
{
    "urls": [
        "https://rozetka.com.ua/ua/some-product/p123456/",
        "https://www.amazon.com/dp/B0EXAMPLE/"
    ],
    "forcePlaywright": false,
    "maxConcurrency": 5
}
````

| Field | Type | Description |
|-------|------|-------------|
| `urls` | string\[] | **Required.** Product page URLs to scrape |
| `forcePlaywright` | boolean | Force headless browser for all URLs (default: `false`) |
| `maxConcurrency` | integer | Max parallel pages (default: `5`, max: `20`) |
| `proxyConfiguration` | object | Proxy settings (Apify proxy recommended for protected sites) |
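
The table above can be turned into a small client-side check before a run is started. The following is a minimal sketch, not part of the Actor: `build_run_input` is a hypothetical helper, the URL check is an assumption about what counts as a valid product URL, and the lower bound of 1 for `maxConcurrency` is taken from the OpenAPI specification further down this page.

```python
from urllib.parse import urlparse


def build_run_input(urls, force_playwright=False, max_concurrency=5, use_apify_proxy=True):
    """Validate arguments and return a dict shaped like the Actor input (hypothetical helper)."""
    if not urls:
        raise ValueError("'urls' is required and must contain at least one URL")
    for url in urls:
        parts = urlparse(url)
        # Assumption: only absolute http(s) URLs are useful as product pages.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if not 1 <= max_concurrency <= 20:
        raise ValueError("'maxConcurrency' must be between 1 and 20")
    return {
        "urls": list(urls),
        "forcePlaywright": bool(force_playwright),
        "maxConcurrency": int(max_concurrency),
        "proxyConfiguration": {"useApifyProxy": bool(use_apify_proxy)},
    }
```

The returned dict can be passed as the run input in the Python client example in the API section.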

### Output

Each product is saved to the dataset as a JSON object:

```json
{
    "url": "https://rozetka.com.ua/ua/samsung-galaxy-s24/p395058825/",
    "store": "rozetka.com.ua",
    "title": "Samsung Galaxy S24 Ultra 12/256GB Titanium Black",
    "price": 51999.0,
    "currency": "UAH",
    "in_stock": true,
    "image": "https://content.rozetka.com.ua/...",
    "brand": "Samsung",
    "sku": "SM-S928BZKDSEK",
    "description": "Смартфон Samsung Galaxy S24 Ultra...",
    "rating": 4.8,
    "review_count": 342,
    "breadcrumbs": ["Смартфони", "Samsung", "Galaxy S24"],
    "extraction_method": "json-ld"
}
```

#### Output fields

| Field | Type | Description |
|-------|------|-------------|
| `url` | string | Original URL |
| `store` | string | Store domain |
| `title` | string | Product name |
| `price` | float | Price as a number |
| `currency` | string | ISO currency code (UAH, USD, EUR, etc.) |
| `in_stock` | boolean | Availability status |
| `image` | string | Main product image URL |
| `brand` | string | Brand name |
| `sku` | string | Product SKU or MPN |
| `description` | string | Short description (max 500 chars) |
| `rating` | float | Average rating (if available) |
| `review_count` | integer | Number of reviews (if available) |
| `breadcrumbs` | string\[] | Category path |
| `specs` | object | Technical specifications (if available) |
| `extraction_method` | string | Which extraction layer succeeded |
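
When consuming dataset items in Python, it can help to map them onto a typed record. The sketch below mirrors the field table above; the `Product` class and `from_item` are hypothetical names, and it assumes that `url` and `store` are always present while every other field may be missing from an item.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Product:
    """Typed view of one dataset item; optional fields default to None or empty."""
    url: str
    store: str
    title: Optional[str] = None
    price: Optional[float] = None
    currency: Optional[str] = None
    in_stock: Optional[bool] = None
    image: Optional[str] = None
    brand: Optional[str] = None
    sku: Optional[str] = None
    description: Optional[str] = None
    rating: Optional[float] = None
    review_count: Optional[int] = None
    breadcrumbs: list = field(default_factory=list)
    specs: dict = field(default_factory=dict)
    extraction_method: Optional[str] = None

    @classmethod
    def from_item(cls, item):
        # Ignore keys this sketch does not know about instead of failing on them.
        known = set(cls.__dataclass_fields__)
        return cls(**{k: v for k, v in item.items() if k in known})
```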

### How it works

The scraper uses a 4-layer extraction strategy, running each layer in order and filling in missing data:

1. **JSON-LD** (highest confidence) — parses `<script type="application/ld+json">` with `@type: Product`
2. **Open Graph** — reads `<meta property="og:*">` and `<meta property="product:*">` tags
3. **Microdata** — finds `itemscope itemtype="schema.org/Product"` attributes
4. **CSS Heuristics** — falls back to common CSS selector patterns for price, title, etc.
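
The "run each layer in order and fill in missing data" idea can be sketched as a merge over per-layer results. This is an illustration under stated assumptions, not the Actor's actual code: `merge_layers` is a hypothetical function, the layer keys other than `json-ld` are made-up labels, and recording the first contributing layer as `extraction_method` is a guess at how that field is populated.

```python
# Layer labels in priority order. Only "json-ld" appears in the sample output;
# the other three names are assumptions for this sketch.
LAYER_ORDER = ["json-ld", "opengraph", "microdata", "css"]


def merge_layers(results):
    """Merge per-layer dicts: earlier layers win, later layers only fill gaps."""
    merged = {}
    for name in LAYER_ORDER:
        for key, value in results.get(name, {}).items():
            # Skip empty values so a later layer can still supply the field.
            if value not in (None, "", [], {}) and key not in merged:
                merged[key] = value
                # Remember the highest-priority layer that contributed anything.
                merged.setdefault("extraction_method", name)
    return merged
```

For example, if JSON-LD yields a title but no price and Open Graph yields both, the merged record keeps the JSON-LD title and takes the price from Open Graph.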

If the HTTP fetch returns weak data (no title or no price), the scraper automatically retries with a headless Chromium browser (Playwright) to handle JavaScript-rendered pages.
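
The fallback rule can be expressed as a small decision function. The sketch below assumes two hypothetical callables, `fetch_http` and `fetch_browser`, that each take a URL and return a product dict; the real Actor's internals are not shown in this README and may differ.

```python
def is_weak(product):
    """Weak data as described above: the title or the price is missing."""
    return not product.get("title") or product.get("price") is None


def scrape(url, fetch_http, fetch_browser, force_playwright=False):
    """Try the cheap HTTP fetcher first; use the browser fetcher only when needed."""
    if not force_playwright:
        product = fetch_http(url)
        if not is_weak(product):
            return product
    # Either the browser was forced or the HTTP result was weak.
    return fetch_browser(url)
```

Note that a price of `0.0` counts as present here, since only a missing price marks the result as weak.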

### Use Cases

- **Price monitoring** — track competitor prices across multiple stores
- **Market research** — collect pricing data for analysis
- **Product catalog** — build product databases from multiple sources
- **Dropshipping** — check prices and availability across suppliers
- **Price comparison** — aggregate offers for the same product

### Tips

- For best results with **protected sites** (Cloudflare, AWS WAF), enable Apify Proxy
- Set `forcePlaywright: true` for sites known to require JavaScript (AliExpress, some fashion stores)
- Keep `maxConcurrency` at 3-5 for sites with aggressive rate limiting
- The scraper respects `robots.txt` — use responsibly
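
One way to apply the `forcePlaywright` tip is to split a URL list into two runs, so that only sites known to need a browser pay the browser cost. This is a sketch under assumptions: `split_by_rendering` is a hypothetical helper, and the `JS_HEAVY_DOMAINS` set is a placeholder that you would maintain yourself (only AliExpress is named in this README).

```python
from urllib.parse import urlparse

# Assumption: domains the caller has found to require a headless browser.
JS_HEAVY_DOMAINS = {"aliexpress.com"}


def split_by_rendering(urls, js_heavy=JS_HEAVY_DOMAINS):
    """Return (http_first_urls, browser_urls) based on each URL's hostname."""
    http_first, browser = [], []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself and any subdomain such as "www.".
        needs_browser = any(host == d or host.endswith("." + d) for d in js_heavy)
        (browser if needs_browser else http_first).append(url)
    return http_first, browser
```

The first list can then be run with `forcePlaywright: false` and the second with `forcePlaywright: true`.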

### Cost estimate

| Mode | Compute units per URL | Cost\* |
|------|----------------------|-------|
| HTTP only | ~0.005 | ~$0.0005 |
| Playwright | ~0.05-0.1 | ~$0.005-0.01 |
| Mixed (auto) | ~0.01-0.03 avg | ~$0.001-0.003 |

*\*Based on Apify platform pricing. Actual costs depend on page complexity and proxy usage.*
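
To turn the table into a budget figure, multiply the per-URL cost range by the number of URLs. The sketch below hard-codes the approximate values from the table; `estimate_cost` is a hypothetical helper and the result is only as accurate as those estimates.

```python
# Approximate (low, high) cost per URL in USD, copied from the table above.
COST_PER_URL_USD = {
    "http": (0.0005, 0.0005),
    "playwright": (0.005, 0.01),
    "mixed": (0.001, 0.003),
}


def estimate_cost(url_count, mode="mixed"):
    """Return a (low, high) cost estimate in USD for scraping url_count URLs."""
    low, high = COST_PER_URL_USD[mode]
    return round(url_count * low, 4), round(url_count * high, 4)
```

For 10,000 URLs in mixed mode this gives a range of roughly $10 to $30.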

# Actor input Schema

## `urls` (type: `array`):

List of product page URLs to scrape. Each URL should point to a single product page.

## `forcePlaywright` (type: `boolean`):

Force headless browser for all URLs. By default, HTTP is tried first and Playwright is used only as fallback. Enable for JS-heavy sites.

## `maxConcurrency` (type: `integer`):

Maximum number of pages scraped in parallel.

## `proxyConfiguration` (type: `object`):

Proxy settings for bypassing anti-bot protection. Auto-enabled when running on Apify platform if not set.

## Actor input object example

```json
{
  "urls": [
    "https://rozetka.com.ua/ua/samsung-galaxy-s24-ultra-12-256gb-titanium-black-sm-s928bzkdsek/p395058825/"
  ],
  "forcePlaywright": false,
  "maxConcurrency": 5,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `products` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://rozetka.com.ua/ua/samsung-galaxy-s24-ultra-12-256gb-titanium-black-sm-s928bzkdsek/p395058825/"
    ],
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("skipper_lume/ecommerce-product-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": ["https://rozetka.com.ua/ua/samsung-galaxy-s24-ultra-12-256gb-titanium-black-sm-s928bzkdsek/p395058825/"],
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("skipper_lume/ecommerce-product-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://rozetka.com.ua/ua/samsung-galaxy-s24-ultra-12-256gb-titanium-black-sm-s928bzkdsek/p395058825/"
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call skipper_lume/ecommerce-product-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=skipper_lume/ecommerce-product-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "E-Commerce Product Scraper",
        "description": "Extract structured product data (title, price, currency, availability, images, specs) from any e-commerce website. Supports 50+ stores. HTTP-first with automatic Playwright fallback for JS-heavy sites.",
        "version": "1.0",
        "x-build-id": "0oTa8lK8YoYkPT7nU"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/skipper_lume~ecommerce-product-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-skipper_lume-ecommerce-product-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/skipper_lume~ecommerce-product-scraper/runs": {
            "post": {
                "operationId": "runs-sync-skipper_lume-ecommerce-product-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/skipper_lume~ecommerce-product-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-skipper_lume-ecommerce-product-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "Product URLs",
                        "type": "array",
                        "description": "List of product page URLs to scrape. Each URL should point to a single product page.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "forcePlaywright": {
                        "title": "Force Playwright (browser)",
                        "type": "boolean",
                        "description": "Force headless browser for all URLs. By default, HTTP is tried first and Playwright is used only as fallback. Enable for JS-heavy sites.",
                        "default": false
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Maximum number of pages scraped in parallel.",
                        "default": 5
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Proxy settings for bypassing anti-bot protection. Auto-enabled when running on Apify platform if not set."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
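
For projects that cannot use the client libraries, the `run-sync-get-dataset-items` path from the specification above can be called directly. The following is a minimal sketch using only the Python standard library; `build_request` and `run_sync` are hypothetical helpers, the 300-second timeout is an arbitrary choice, and it assumes the response body is JSON (the specification does not declare a response schema for this endpoint).

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/skipper_lume~ecommerce-product-scraper/run-sync-get-dataset-items"


def build_request(token, run_input):
    """Build the POST request; the token goes in the query string, as in the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token, run_input, timeout=300):
    """Send the request and decode the response body as JSON (assumed format)."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect or log the request without starting a run.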
