# Shopify Products Scraper (`khadinakbar/shopify-products-scraper`) Actor

Scrape every product from any Shopify store via the public products.json endpoint — prices, variants, images.

- **URL**: https://apify.com/khadinakbar/shopify-products-scraper.md
- **Developed by:** [Khadin Akbar](https://apify.com/khadinakbar) (community)
- **Categories:** E-commerce, MCP servers, Automation
- **Stats:** 1 total users, 0 monthly users, 0.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

From $2.00 / 1,000 products scraped

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, built for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Shopify Products Scraper

Scrape **every product from any Shopify store** — prices, variants, images, options, vendor, tags, and inventory availability — without login, cookies, or an API key. Point it at a store domain, a collection, or individual product URLs and get clean, structured JSON in seconds.

This Actor reads each store's **public `products.json` endpoint** (the same data the storefront uses), so it is fast and HTTP‑only: no browser, no anti‑bot fight, and no breakage when a theme changes.

### What you can do with it

- Export a competitor's **full catalog** with prices and variants for pricing intelligence.
- Pull a single **collection/category** to monitor a product line.
- Enrich a shortlist of **specific product URLs** with full detail.
- **Batch many stores** in one run for market research or dropshipping product discovery.
- Feed an **AI agent** (Claude, ChatGPT, MCP clients) structured product data on demand.

### When to use it (and when not)

| Use this Actor for | Use a different Actor for |
|---|---|
| Any Shopify storefront's product catalog | Non‑Shopify stores (Amazon, eBay, Walmart, Etsy) |
| Prices, variants, SKUs, images, options | Order, customer, or checkout data (not public) |
| Bulk multi‑store product exports | Shopify App Store listings or store‑lead discovery |

Most stores built on Shopify expose `products.json`. A small number disable it or sit behind a password page — those are reported clearly and skipped, never billed as products.

### Input

| Field | Type | Description |
|---|---|---|
| `storeUrls` | array | Store domains/URLs → scrape the **entire** catalog (e.g. `gymshark.com`). Add many to batch. |
| `collectionUrls` | array | Collection URLs → scrape only that category (e.g. `.../collections/leggings`). |
| `productUrls` | array | Individual product URLs → one detailed record each. |
| `maxProducts` | integer | Total product cap across the run (default `1000`). Billing stops here. |
| `includeVariants` | boolean | Include the per‑variant breakdown (default `true`). |
| `includeImages` | boolean | Include the full image URL list (default `true`). |
| `proxyConfiguration` | object | Apify Proxy settings (default: Apify Proxy with automatic selection). |

Provide at least one of `storeUrls`, `collectionUrls`, or `productUrls`.

#### Example input

```json
{
  "storeUrls": ["gymshark.com"],
  "collectionUrls": ["https://www.allbirds.com/collections/mens"],
  "productUrls": ["https://your-store.com/products/product-handle"],
  "maxProducts": 500,
  "includeVariants": true,
  "includeImages": true
}
````
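
The input rules above (at least one URL list, a positive `maxProducts`) can be checked before a run is started. The following is a minimal sketch; `build_input` is a hypothetical helper and not part of the Actor or the Apify client. The lower bound of 1 for `maxProducts` is taken from the input schema further down this page.

```python
from typing import Any, Dict, List, Optional


def build_input(
    store_urls: Optional[List[str]] = None,
    collection_urls: Optional[List[str]] = None,
    product_urls: Optional[List[str]] = None,
    max_products: int = 1000,
) -> Dict[str, Any]:
    """Assemble an input object, enforcing the 'at least one URL list' rule."""
    url_fields = {
        "storeUrls": store_urls or [],
        "collectionUrls": collection_urls or [],
        "productUrls": product_urls or [],
    }
    if not any(url_fields.values()):
        raise ValueError(
            "Provide at least one of storeUrls, collectionUrls, or productUrls"
        )
    if max_products < 1:
        raise ValueError("maxProducts must be at least 1")
    # Drop empty lists so the payload only carries the modes actually used.
    payload: Dict[str, Any] = {k: v for k, v in url_fields.items() if v}
    payload["maxProducts"] = max_products
    return payload


print(build_input(store_urls=["gymshark.com"], max_products=500))
```

Validating locally like this avoids paying the Actor start fee for a run that would be rejected anyway.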

### Output

One row per product. Fields:

| Field | Description |
|---|---|
| `productId` | Shopify product ID |
| `title`, `handle`, `url` | Product name, slug, canonical URL |
| `storeDomain` | Store hostname (no `www`) |
| `vendor`, `productType` | Brand and Shopify product type |
| `description` | Plain‑text description (HTML stripped) |
| `tags` | Product tags array |
| `currency` | Store currency (from `meta.json`) |
| `price`, `priceMax` | Lowest and highest variant price |
| `compareAtPrice` | Lowest compare‑at price (the pre‑sale reference price shown when an item is discounted) |
| `available` | True if any variant is in stock |
| `variantsCount`, `imagesCount` | Counts |
| `optionNames`, `options` | Option dimensions (Size, Color…) and values |
| `featuredImage`, `images` | Primary image and full image list |
| `variants` | Per‑variant `price`, `sku`, `available`, `options`, weight, etc. |
| `createdAt`, `updatedAt`, `publishedAt`, `scrapedAt` | ISO‑8601 timestamps |
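
As an illustration of how the price fields relate, the sketch below derives a discount percentage from `price` and `compareAtPrice`. `discount_percent` is a hypothetical helper, and it assumes both fields are numeric as in the example output. Because both values are per‑product minimums, they may come from different variants, so treat the result as an approximation.

```python
from typing import Any, Dict, Optional


def discount_percent(product: Dict[str, Any]) -> Optional[float]:
    """Return the markdown from compareAtPrice to price, or None if not on sale."""
    price = product.get("price")
    compare_at = product.get("compareAtPrice")
    # No compare-at price, or one that is not above the price: not discounted.
    if price is None or not compare_at or compare_at <= price:
        return None
    return round((compare_at - price) / compare_at * 100, 1)


print(discount_percent({"price": 40, "compareAtPrice": 60}))  # 33.3
print(discount_percent({"price": 40}))                        # None
```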

#### Example output (truncated)

```json
{
  "productId": "1234567890",
  "title": "Vital Seamless Leggings",
  "handle": "vital-seamless-leggings",
  "url": "https://gymshark.com/products/vital-seamless-leggings",
  "storeDomain": "gymshark.com",
  "vendor": "Gymshark",
  "productType": "Leggings",
  "currency": "USD",
  "price": 40,
  "priceMax": 50,
  "compareAtPrice": 60,
  "available": true,
  "variantsCount": 18,
  "optionNames": ["Color", "Size"],
  "featuredImage": "https://cdn.shopify.com/...jpg",
  "variants": [
    { "variantId": "987", "title": "Black / S", "sku": "GS-VSL-BLK-S", "price": 40, "available": true, "options": ["Black", "S"] }
  ],
  "scrapedAt": "2026-06-17T00:00:00.000Z"
}
```
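
For spreadsheet or SQL use, a nested record like the one above is often flattened to one row per variant. The sketch below shows one way to do that; `variant_rows` is a hypothetical helper, and it assumes that each variant's `options` list follows the order of the product's `optionNames`, as the example record suggests.

```python
from typing import Any, Dict, List


def variant_rows(product: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one product record into one row per variant."""
    option_names = product.get("optionNames", [])
    rows = []
    for variant in product.get("variants", []):
        row = {
            "productId": product.get("productId"),
            "productTitle": product.get("title"),
            "variantId": variant.get("variantId"),
            "sku": variant.get("sku"),
            "price": variant.get("price"),
            "available": variant.get("available"),
        }
        # Pair option values with their names, e.g. Color=Black, Size=S.
        row.update(zip(option_names, variant.get("options", [])))
        rows.append(row)
    return rows


example = {
    "productId": "1234567890",
    "title": "Vital Seamless Leggings",
    "optionNames": ["Color", "Size"],
    "variants": [
        {"variantId": "987", "sku": "GS-VSL-BLK-S", "price": 40,
         "available": True, "options": ["Black", "S"]}
    ],
}
for row in variant_rows(example):
    print(row)
```

Records scraped with `includeVariants` set to `false` carry no `variants` array, so the helper returns an empty list for them.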

### Pricing — Pay Per Event

| Event | Price |
|---|---|
| Actor start | $0.001 per run |
| Product scraped | $0.002 per product |

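A 1,000‑product export costs about **$2.00** in event fees (1,000 × $0.002, plus the $0.001 start fee); Apify platform usage is charged separately. You are charged per product returned, never per page or per variant, and billing stops at `maxProducts`. Blocked or empty stores are not billed as products.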
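
The event-fee arithmetic can be sketched as follows. `estimate_event_cost` is a hypothetical helper that uses only the two prices from the table above; it covers a single run and does not include Apify platform usage, which is billed on top.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.001")      # per run, from the pricing table
PRODUCT_SCRAPED_USD = Decimal("0.002")  # per product, from the pricing table


def estimate_event_cost(expected_products: int, max_products: int = 1000) -> Decimal:
    """Estimate pay-per-event fees for one run (platform usage excluded)."""
    # Billing stops at maxProducts, so the cap bounds the billable count.
    billable = min(expected_products, max_products)
    return ACTOR_START_USD + PRODUCT_SCRAPED_USD * billable


print(estimate_event_cost(1000))                      # 2.001
print(estimate_event_cost(5000))                      # 2.001 (capped at 1000)
print(estimate_event_cost(5000, max_products=5000))   # 10.001
```

`Decimal` is used instead of `float` so that small per-event prices add up without rounding noise.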

### How it works

1. For each store, the Actor calls `https://<store>/products.json?limit=250&page=N`, paginating until the catalog is exhausted or `maxProducts` is reached.
2. Collection URLs use `/collections/<handle>/products.json`; product URLs use `<product>.json`.
3. Store currency is read once per store from `/meta.json`.
4. Requests are retried with exponential backoff on rate limits, with the proxy rotated on each attempt.
5. Records are flattened to a stable, agent‑friendly schema and pushed to the dataset.
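
Steps 1 and 2 can be sketched as URL builders plus a pagination loop. This is an illustration under stated assumptions, not the Actor's source code: all function names are hypothetical, the collection endpoint is assumed to accept the same `limit` and `page` parameters as the store endpoint, and `fetch_json` is injected by the caller so the loop can be exercised without network access.

```python
from typing import Any, Callable, Dict, Iterator, List
from urllib.parse import urlsplit

PAGE_SIZE = 250  # the limit value quoted in step 1


def store_page_url(store: str, page: int) -> str:
    """Step 1: catalog page URL for a bare domain or a full store URL."""
    parts = urlsplit(store if "://" in store else f"https://{store}")
    return f"https://{parts.netloc}/products.json?limit={PAGE_SIZE}&page={page}"


def collection_page_url(collection_url: str, page: int) -> str:
    """Step 2: /collections/<handle>/products.json (query params assumed)."""
    parts = urlsplit(collection_url)
    path = parts.path.rstrip("/")
    return f"https://{parts.netloc}{path}/products.json?limit={PAGE_SIZE}&page={page}"


def product_json_url(product_url: str) -> str:
    """Step 2: a product URL with .json appended to its path."""
    parts = urlsplit(product_url)
    return f"https://{parts.netloc}{parts.path.rstrip('/')}.json"


def paginate(
    page_url: Callable[[int], str],
    fetch_json: Callable[[str], Dict[str, Any]],
    max_products: int,
) -> Iterator[Dict[str, Any]]:
    """Yield products until a page comes back empty or the cap is reached."""
    yielded = 0
    page = 1
    while yielded < max_products:
        products: List[Dict[str, Any]] = fetch_json(page_url(page)).get("products", [])
        if not products:
            return  # catalog exhausted
        for product in products:
            if yielded >= max_products:
                return  # cap reached mid-page
            yield product
            yielded += 1
        page += 1


print(store_page_url("gymshark.com", 1))
print(collection_page_url("https://www.gymshark.com/collections/leggings", 1))
print(product_json_url("https://your-store.com/products/product-handle"))
```

Checking the cap before each request means no extra page is fetched once `max_products` has been reached.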

### Reliability

Because it uses Shopify's own public JSON endpoints rather than HTML scraping, output does not break when a store changes its theme. Stores that disable `products.json` or are password‑protected are detected and reported; a run that finds zero products across all targets fails honestly with a clear message instead of returning an empty dataset silently.
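
Reliability under rate limits also depends on the retry behaviour listed under "How it works" (exponential backoff, with a different proxy on each attempt). A minimal sketch of that pattern follows. The names `RateLimited` and `with_backoff`, the default of five attempts, the one-second base delay, and the added jitter are all assumptions made for illustration; the Actor's real parameters are not documented on this page.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical signal that a request hit a rate limit (e.g. HTTP 429)."""


def with_backoff(
    request: Callable[[int], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(attempt) until it succeeds or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            # The attempt number lets the caller pick a different proxy each time.
            return request(attempt)
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            # Delay doubles per attempt (1x, 2x, 4x, ...) plus random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise AssertionError("unreachable")


def flaky(attempt: int) -> str:
    """Demo request that is rate limited twice, then succeeds."""
    if attempt < 2:
        raise RateLimited()
    return f"ok on attempt {attempt}"


print(with_backoff(flaky, sleep=lambda seconds: None))  # ok on attempt 2
```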

### MCP / AI agents

This Actor is MCP‑ready. Expose it through the [Apify MCP server](https://mcp.apify.com) as `khadinakbar/shopify-products-scraper` and call it with a single store URL to get structured product JSON back. Input descriptions are written for an LLM to route correctly between the three modes (store, collection, and product).

### Related Actors

- **shopify-all-in-one-scraper**: products + storefront intelligence in one Actor.
- **google-shopping-scraper**, **walmart-data-extractor**, **ebay-all-in-one-scraper**: cross‑marketplace pricing.

### FAQ

**Does it need a Shopify API key or login?** No. It uses only public endpoints.

**Can it scrape any store?** Any standard Shopify storefront. A minority disable `products.json` or use a password page; those are skipped with a clear note.

**Does it get inventory quantities?** It returns per‑variant in‑stock availability (`available`). Exact stock counts are not exposed on the public endpoint.

**How fresh is the data?** Live at scrape time — the same data the store's own theme renders.

### Legal

This Actor collects only publicly available product data from Shopify storefronts. It does not access private, authenticated, or personal data. You are responsible for ensuring your use complies with the target store's Terms of Service, robots directives, and applicable laws (including copyright and database rights). Use the data responsibly and lawfully.

# Actor input Schema

## `storeUrls` (type: `array`):

Shopify store domains or homepage URLs to scrape the ENTIRE catalog from, e.g. 'gymshark.com' or 'https://www.allbirds.com'. Each entry is crawled via its public /products.json endpoint, paginated until every product is collected. Add many entries to batch-scrape multiple stores in one run. NOT a single product or collection URL — use the fields below for those.

## `collectionUrls` (type: `array`):

Specific Shopify collection page URLs, e.g. 'https://www.gymshark.com/collections/leggings'. Only products inside that collection are scraped, paginated via /collections/<handle>/products.json. Use this when you want one category instead of the whole store. NOT a full store domain — use Store URLs for the entire catalog.

## `productUrls` (type: `array`):

Individual Shopify product page URLs, e.g. 'https://your-store.com/products/product-handle'. Each is fetched once via its .json endpoint and returns one detailed record. Use this for a targeted shortlist of known products. NOT a collection or store URL — those scrape many products.

## `maxProducts` (type: `integer`):

Hard cap on the total number of products scraped across the whole run, protecting your budget. Billing stops at this number. Defaults to 1000; set higher for large multi-store jobs. Counts products, not pages or stores.

## `includeVariants` (type: `boolean`):

When true, each product record includes a 'variants' array with per-variant price, SKU, availability, and options. Turn off to shrink output when you only need product-level data. Defaults to true. Does not change billing — you are charged per product, not per variant.

## `includeImages` (type: `boolean`):

When true, each product record includes an 'images' array of all image URLs. Turn off to keep records compact (a 'featuredImage' field is always present regardless). Defaults to true. Does not affect billing.

## `proxyConfiguration` (type: `object`):

Apify proxy settings used for all requests. Defaults to Apify Proxy (auto) which is plenty for the public products.json endpoint. Set a country or residential group only if a store geo-blocks requests. Leave as default for nearly all stores.

## Actor input object example

```json
{
  "storeUrls": [
    "gymshark.com",
    "allbirds.com"
  ],
  "collectionUrls": [
    "https://www.gymshark.com/collections/leggings"
  ],
  "productUrls": [
    "https://your-store.com/products/product-handle"
  ],
  "maxProducts": 1000,
  "includeVariants": true,
  "includeImages": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```
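
For the geo-blocking case mentioned under `proxyConfiguration`, an input could look like the sketch below. `apifyProxyGroups` and `apifyProxyCountry` are the standard Apify proxy input fields; the `RESIDENTIAL` group and the `US` country code are example values only, and residential proxies must be available on your Apify plan.

```python
import json

# Example input for a store that geo-blocks requests (values are illustrative).
run_input = {
    "storeUrls": ["gymshark.com"],
    "maxProducts": 100,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],  # example group
        "apifyProxyCountry": "US",            # example country code
    },
}

print(json.dumps(run_input, indent=2))
```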

# Actor output Schema

## `products` (type: `string`):

Dataset of scraped Shopify products (one row per product).

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "storeUrls": [
        "gymshark.com"
    ],
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("khadinakbar/shopify-products-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "storeUrls": ["gymshark.com"],
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("khadinakbar/shopify-products-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
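
In production code it is usually worth checking the run status before reading the dataset. The sketch below wraps the same client calls as the example above in a hypothetical `scrape_store` function. It accepts any client object exposing `actor(...).call(...)` and `dataset(...).iterate_items()`, so it can be unit-tested with a stub; the `APIFY_TOKEN` environment variable in the usage comment is an assumed name.

```python
from typing import Any, Dict, List

ACTOR_ID = "khadinakbar/shopify-products-scraper"


def scrape_store(client: Any, store: str, max_products: int = 100) -> List[Dict[str, Any]]:
    """Run the Actor for one store and return its dataset items as a list."""
    run_input = {"storeUrls": [store], "maxProducts": max_products}
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    # Treat a missing run or any status other than SUCCEEDED as a failure.
    if run is None or run.get("status") != "SUCCEEDED":
        status = None if run is None else run.get("status")
        raise RuntimeError(f"Actor run did not succeed (status: {status})")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage with the real client (requires `pip install apify-client`):
#
#   import os
#   from apify_client import ApifyClient
#
#   client = ApifyClient(os.environ["APIFY_TOKEN"])
#   for product in scrape_store(client, "gymshark.com", max_products=50):
#       print(product["title"], product["price"])
```

Setting a small `maxProducts` while testing keeps event fees low, since billing stops at that cap.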

## CLI example

```bash
echo '{
  "storeUrls": [
    "gymshark.com"
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call khadinakbar/shopify-products-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=khadinakbar/shopify-products-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Shopify Products Scraper",
        "description": "Scrape every product from any Shopify store via the public products.json endpoint — prices, variants, images.",
        "version": "0.1",
        "x-build-id": "kVnFfmyueQHGPAqDH"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/khadinakbar~shopify-products-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-khadinakbar-shopify-products-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/khadinakbar~shopify-products-scraper/runs": {
            "post": {
                "operationId": "runs-sync-khadinakbar-shopify-products-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/khadinakbar~shopify-products-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-khadinakbar-shopify-products-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "storeUrls": {
                        "title": "Store URLs / domains",
                        "type": "array",
                        "description": "Shopify store domains or homepage URLs to scrape the ENTIRE catalog from, e.g. 'gymshark.com' or 'https://www.allbirds.com'. Each entry is crawled via its public /products.json endpoint, paginated until every product is collected. Add many entries to batch-scrape multiple stores in one run. NOT a single product or collection URL — use the fields below for those.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "collectionUrls": {
                        "title": "Collection URLs",
                        "type": "array",
                        "description": "Specific Shopify collection page URLs, e.g. 'https://www.gymshark.com/collections/leggings'. Only products inside that collection are scraped, paginated via /collections/<handle>/products.json. Use this when you want one category instead of the whole store. NOT a full store domain — use Store URLs for the entire catalog.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "productUrls": {
                        "title": "Product URLs",
                        "type": "array",
                        "description": "Individual Shopify product page URLs, e.g. 'https://your-store.com/products/product-handle'. Each is fetched once via its .json endpoint and returns one detailed record. Use this for a targeted shortlist of known products. NOT a collection or store URL — those scrape many products.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxProducts": {
                        "title": "Max products (total)",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Hard cap on the total number of products scraped across the whole run, protecting your budget. Billing stops at this number. Defaults to 1000; set higher for large multi-store jobs. Counts products, not pages or stores.",
                        "default": 1000
                    },
                    "includeVariants": {
                        "title": "Include variants",
                        "type": "boolean",
                        "description": "When true, each product record includes a 'variants' array with per-variant price, SKU, availability, and options. Turn off to shrink output when you only need product-level data. Defaults to true. Does not change billing — you are charged per product, not per variant.",
                        "default": true
                    },
                    "includeImages": {
                        "title": "Include images",
                        "type": "boolean",
                        "description": "When true, each product record includes an 'images' array of all image URLs. Turn off to keep records compact (a 'featuredImage' field is always present regardless). Defaults to true. Does not affect billing.",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify proxy settings used for all requests. Defaults to Apify Proxy (auto) which is plenty for the public products.json endpoint. Set a country or residential group only if a store geo-blocks requests. Leave as default for nearly all stores.",
                        "default": {
                            "useApifyProxy": true
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
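
The `run-sync-get-dataset-items` path in the specification above can also be called without any client library. The following sketch uses only the Python standard library; `build_request` and `run_sync` are hypothetical helper names, the 300-second timeout is an arbitrary choice, and the response is assumed to be a JSON array of dataset items, as the endpoint summary suggests.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/khadinakbar~shopify-products-scraper/run-sync-get-dataset-items"


def build_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST request: token in the query string, input as a JSON body."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{BASE_URL}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, run_input: Dict[str, Any], timeout: float = 300.0) -> List[Dict[str, Any]]:
    """Run the Actor synchronously and return the parsed response body."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a real, billable run):
#
#   items = run_sync("<YOUR_API_TOKEN>", {"storeUrls": ["gymshark.com"], "maxProducts": 10})
#   print(len(items))
```

For long catalogs, the asynchronous `/runs` path from the same specification is likely a better fit, since a synchronous request has to stay open for the whole run.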
