# Lean Shopify Scraper (`worktech/lean-shopify-scraper`) Actor

Modular Shopify scraper — pay only for the fields you need. Price / Catalog / Full modes with transparent SKU merging and visible error handling.

- **URL**: https://apify.com/worktech/lean-shopify-scraper.md
- **Developed by:** [Per Schondell](https://apify.com/worktech) (community)
- **Categories:** E-commerce
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Lean Shopify Scraper

> The Shopify scraper that bills you for **what you actually need**.

If you only need daily price tracking, you shouldn't be paying for review scraping every run. This Actor splits Shopify storefront extraction into three explicit modes with separate billing — pick the tier that matches your job.

### Why use this Actor?

Most Shopify Actors bundle everything together: product data, variants, images, review aggregation, sales estimates. A dropshipper running a daily price-watch on 1,000 products pays for review scraping they don't need — every run, forever.

**Lean Shopify Scraper has three modes:**

| Mode | Price | Returns | Best for |
|------|------|---------|----------|
| **Price**   | **$1.50 / 1K products** | current price, compare-at price, availability, sold-out variant ratio | Daily price-watch, dynamic repricing, discount detection |
| **Catalog** | **$4 / 1K products** | + variants, images, tags, description, vendor, SKU, barcode | Product research, dropshipping, catalog import |
| **Full**    | **$8 / 1K products** | + review-app aggregation (Judge.me / Yotpo / Loox / Okendo / Stamped), sales estimate | Competitor intelligence, full audits |

> v0.1 ships with **Price** and **Catalog** modes. Full mode is on the roadmap below.

### How it stands apart

- **100% parse success rate** on every reachable Shopify store in our 20-store benchmark (see `test-results/real-stores-report.json`). Run `npm run test:real` to reproduce.
- **Correct SKU merging** — consolidated product pages return one row per product, not one per SKU. (Some popular alternatives don't do this.)
- **No silent failures** — HTTP 430 security rejections, 429 rate limits, malformed JSON, and blocked endpoints all surface as named errors in the run log with full per-store context. Crashed runs tell you exactly which store failed and why.
- **Built-in retry-with-backoff** — 429 and 5xx errors retry automatically with exponential backoff, honouring `Retry-After`. 430 and other non-429 4xx errors do not retry (they won't fix themselves).
- **Cents-based price math** — no float drift on `19.99 + 0.01`. Output uses integer minor units; divide by 100 in your downstream code.
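
The retry policy described in the list above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual implementation: the function name `retry_delay_seconds`, the 1-second base, and the 60-second cap are assumptions, and only the numeric (seconds) form of `Retry-After` is handled.

```python
from typing import Optional


def retry_delay_seconds(
    status: int,
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,   # assumed base delay, not taken from the Actor
    cap: float = 60.0,   # assumed upper bound, not taken from the Actor
) -> Optional[float]:
    """Return the wait before the next retry, or None if the status is not retryable."""
    # Only 429 and 5xx are retried; 430 and other 4xx responses are surfaced immediately.
    retryable = status == 429 or 500 <= status <= 599
    if not retryable:
        return None
    # Prefer the server's hint when it is given as a number of seconds.
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch; fall back to backoff
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    return min(cap, base * (2 ** attempt))
```

A caller would sleep for the returned number of seconds and try again, or log a named error and stop when the result is `None`.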

### Quick start

#### Minimal input

```json
{
  "mode": "price",
  "storeUrls": ["https://allbirds.com"]
}
````

#### Full input schema

```json
{
  "mode": "price",
  "storeUrls": [
    "https://allbirds.com",
    "https://gymshark.com"
  ],
  "maxProductsPerStore": 500,
  "delayMs": 1500
}
```

#### Sample output (Price mode)

```json
{
  "storeUrl": "https://allbirds.com",
  "productId": 6616124981328,
  "handle": "trino-cozy-crew-heathered-onyx",
  "title": "Trino® Cozy Crew - Heathered Onyx",
  "vendor": "Allbirds",
  "productType": "Socks",
  "minPriceCents": 2400,
  "maxPriceCents": 2400,
  "compareAtMinCents": null,
  "compareAtMaxCents": null,
  "hasDiscount": false,
  "variantCount": 4,
  "soldOutCount": 3,
  "soldOutRatio": 0.75,
  "scrapedAt": "2026-05-27T15:53:23.620Z"
}
```

A `soldOutRatio` near 1.0 with the product still listed is one of the strongest demand signals in Shopify catalog data — buyers want this product enough to clear most variants, and the store is slow to restock.
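
For downstream code, a small sketch of how a Price-mode item might be consumed. The helper names are hypothetical, the formatter assumes a currency with two minor-unit digits, and the ratio formula (`soldOutCount / variantCount`) is inferred from the sample above rather than confirmed by the source.

```python
def format_cents(cents: int) -> str:
    """Render integer minor units as a decimal string without going through floats."""
    sign = "-" if cents < 0 else ""
    units, minor = divmod(abs(cents), 100)
    return f"{sign}{units}.{minor:02d}"


def sold_out_ratio(sold_out_count: int, variant_count: int) -> float:
    """Share of variants that are sold out; 0.0 for a product with no variants."""
    if variant_count == 0:
        return 0.0
    return sold_out_count / variant_count


# Subset of the sample item shown above.
item = {"minPriceCents": 2400, "variantCount": 4, "soldOutCount": 3}

display_price = format_cents(item["minPriceCents"])
ratio = sold_out_ratio(item["soldOutCount"], item["variantCount"])
```

Keeping prices as integers until the final formatting step is what avoids the float drift mentioned earlier.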

### Pricing — concrete examples

| Scenario | Mode | Run cost |
|----------|------|----------|
| Daily price check on 1,000 products | Price | $1.50 / day → ~$45/mo |
| Full catalog ingest of 5,000 products (one-time) | Catalog | $20 |
| Competitor audit with reviews + sales estimate on 500 products | Full | $4 |

Compare with bundled-billing Actors that charge $4–$7 per 1K products regardless of the fields requested.
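
The run costs in the table follow from `products / 1000 × rate`. A minimal sketch of that arithmetic, using the per-mode rates listed in this README (the function and constant names are illustrative only):

```python
from decimal import Decimal

# USD per 1,000 products, as listed in the modes table of this README.
RATE_USD_PER_1K = {
    "price": Decimal("1.50"),
    "catalog": Decimal("4"),
    "full": Decimal("8"),
}


def estimate_cost_usd(mode: str, products: int) -> Decimal:
    """Estimate the cost of one run for `products` products in the given mode."""
    return RATE_USD_PER_1K[mode] * products / 1000


daily_price_watch = estimate_cost_usd("price", 1000)
monthly_price_watch = daily_price_watch * 30
catalog_ingest = estimate_cost_usd("catalog", 5000)
full_audit = estimate_cost_usd("full", 500)
```

`Decimal` is used here so the estimate does not pick up binary floating-point rounding.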

### Comparison vs current alternatives (Apify Store, May 2026)

| Feature                          | Lean Shopify Scraper | autofacts/shopify | webdatalabs/shopify-product-scraper |
|----------------------------------|:---:|:---:|:---:|
| Modular billing (Price / Catalog / Full) | ✅ | ❌ | ❌ |
| Proper SKU merging on product pages | ✅ | ❌ (documented limitation) | ✅ |
| Explicit 430 / 429 / parse error logging | ✅ | partial | partial |
| Retry-with-backoff + Retry-After honored | ✅ | partial | partial |
| Cents-based price math (no float drift) | ✅ | ✅ | ❓ |
| Review-app aggregation (Judge.me / Yotpo / Loox / Okendo) | Full mode (v0.3) | ❌ | ✅ |
| Published parse success rate | **100% on 20/20 reachable stores** | not published | not published |

### Roadmap

- **v0.2** — Sitemap-based pagination for catalogs >5K products; Web Bot Auth signed requests (significantly fewer 429s)
- **v0.3** — Full mode (review-app aggregation + algorithmic sales estimate)
- **v0.4** — Diff / webhook mode for price-change alerts on a watched set
- **v0.5** — DACH-specific support (CHF / EUR rounding, VAT-inclusive prices, German review platforms)

### Limitations

- Currently scrapes the public `/products.json` endpoint only. Stores with >5,000 products will lose tail items until sitemap pagination ships in v0.2.
- "Sales estimate" in Full mode is algorithmic (based on review volume + recency). It is not actual transaction data — treat as directional.
- Stores that have moved off Shopify return a `NOT_SHOPIFY` error (HTML body instead of JSON). The Actor reports this explicitly rather than silently returning an empty result.
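
As an illustration of the `NOT_SHOPIFY` case, the check could look something like the sketch below. Only the `NOT_SHOPIFY` name comes from this README; the function name, the `OK` and `MALFORMED_JSON` labels, and the exact heuristics are assumptions and may differ from what the Actor does.

```python
import json


def classify_products_response(body: str) -> str:
    """Classify a /products.json response body as OK, NOT_SHOPIFY, or MALFORMED_JSON."""
    stripped = body.lstrip()
    # An HTML page where JSON was expected suggests the store is no longer on Shopify.
    if stripped.startswith("<"):
        return "NOT_SHOPIFY"
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        return "MALFORMED_JSON"
    # A Shopify storefront returns an object with a top-level "products" list.
    if isinstance(data, dict) and isinstance(data.get("products"), list):
        return "OK"
    return "NOT_SHOPIFY"
```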

### Local development

```bash
npm install
npm test          # unit tests (vitest)
npm run test:real # integration test against ~33 real Shopify stores
npm run build     # compile TypeScript
npm run start:dev # run locally with storage/key_value_stores/default/INPUT.json
```

### Status

**v0.1** — Price and Catalog modes in production. 38 unit tests + 33-store integration suite passing.

# Actor input Schema

## `mode` (type: `string`):

Price = current/compare-at price + sold-out ratio. Catalog = full product data. Full = adds review-app aggregation.

## `storeUrls` (type: `array`):

One or more Shopify storefront URLs (e.g. https://allbirds.com).

## `maxProductsPerStore` (type: `integer`):

Cap on products per store. 0 = unlimited.

## `delayMs` (type: `integer`):

Politeness delay between paginated requests to the same store.

## Actor input object example

```json
{
  "mode": "price",
  "storeUrls": [
    "https://allbirds.com"
  ],
  "maxProductsPerStore": 0,
  "delayMs": 1500
}
```
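
Before starting a run, it can help to validate input on the client side. The sketch below applies the defaults and constraints from the input schema (`mode` enum, required `storeUrls`, non-negative integers). The function name `normalize_input` is hypothetical, and rejecting an empty `storeUrls` list is an extra assumption based on the "one or more" wording.

```python
from typing import Any

MODES = ("price", "catalog", "full")
DEFAULTS = {"mode": "price", "maxProductsPerStore": 0, "delayMs": 1500}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Fill in schema defaults and raise ValueError on values the schema would reject."""
    merged = {**DEFAULTS, **raw}

    urls = merged.get("storeUrls")
    if not isinstance(urls, list) or not urls or not all(isinstance(u, str) for u in urls):
        raise ValueError("storeUrls must be a non-empty list of strings")

    if merged["mode"] not in MODES:
        raise ValueError(f"mode must be one of {MODES}")

    for key in ("maxProductsPerStore", "delayMs"):
        value = merged[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{key} must be an integer >= 0")

    return merged
```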

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "storeUrls": [
        "https://allbirds.com"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("worktech/lean-shopify-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "storeUrls": ["https://allbirds.com"] }

# Run the Actor and wait for it to finish
run = client.actor("worktech/lean-shopify-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "storeUrls": [
    "https://allbirds.com"
  ]
}' |
apify call worktech/lean-shopify-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=worktech/lean-shopify-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Lean Shopify Scraper",
        "description": "Modular Shopify scraper — pay only for the fields you need. Price / Catalog / Full modes with transparent SKU merging and visible error handling.",
        "version": "0.1",
        "x-build-id": "NFbJpAucxyATIw0hV"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/worktech~lean-shopify-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-worktech-lean-shopify-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/worktech~lean-shopify-scraper/runs": {
            "post": {
                "operationId": "runs-sync-worktech-lean-shopify-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/worktech~lean-shopify-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-worktech-lean-shopify-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "storeUrls"
                ],
                "properties": {
                    "mode": {
                        "title": "Billing mode",
                        "enum": [
                            "price",
                            "catalog",
                            "full"
                        ],
                        "type": "string",
                        "description": "Price = current/compare-at price + sold-out ratio. Catalog = full product data. Full = adds review-app aggregation.",
                        "default": "price"
                    },
                    "storeUrls": {
                        "title": "Shopify store URLs",
                        "type": "array",
                        "description": "One or more Shopify storefront URLs (e.g. https://allbirds.com).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxProductsPerStore": {
                        "title": "Max products per store",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Cap on products per store. 0 = unlimited.",
                        "default": 0
                    },
                    "delayMs": {
                        "title": "Delay between requests (ms)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Politeness delay between paginated requests to the same store.",
                        "default": 1500
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
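
For projects that do not use a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The following is a sketch using only the Python standard library. The helper names are hypothetical, and it assumes the endpoint responds with a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/worktech~lean-shopify-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request; the token goes in the query string, as the spec describes."""
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def run_and_fetch_items(token: str, actor_input: dict) -> list:
    """Send the request and decode the response (assumed to be a JSON array of items)."""
    request = build_run_request(token, actor_input)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `run_and_fetch_items("<YOUR_API_TOKEN>", {"mode": "price", "storeUrls": ["https://allbirds.com"]})` blocks until the run finishes, so long crawls may be better served by the asynchronous `/runs` path.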
