# Shopify Store Scraper Goat (`goat255/shopify-store-scraper`) Actor

Scrape products from any Shopify storefront without a login or API key. Pull an entire store catalog, a single collection, or one product. The Actor walks pagination up to your chosen limit and returns clean, normalized product data with prices, variants, images, and tags.

- **URL**: https://apify.com/goat255/shopify-store-scraper.md
- **Developed by:** [Goutam Soni](https://apify.com/goat255) (community)
- **Categories:** E-commerce, Business
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage it consumes, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Shopify Store Scraper

Extract products from any Shopify store without a login or API key. Pull an entire store catalog, every product in a single collection, or one specific product, and get clean, normalized product data with prices, variants, images, options, and tags. The scraper walks pagination automatically up to the limit you set, so you can collect hundreds or thousands of products from a store in a single run.

### What it does

- **Full store catalog** - every product in a Shopify store, with prices, variants, images, options, and tags.
- **Collection scrape** - all products inside a single Shopify collection.
- **Single product** - one product by its page URL or handle.
- **Automatic pagination** - store and collection modes walk multiple pages until your `maxProducts` is reached or the catalog runs out.
- **Normalized output** - one stable, importance-ordered shape per product, so every field is always present and easy to map.
- **No login, no API key** - point it at a public storefront URL and run.
- **Bulk and concurrent** - scrape many stores or collections in parallel in one run.

### Use cases

- **Competitor price monitoring** - track a market's pricing and assortment on a schedule and spot changes over time.
- **Product research and catalog feeds** - build a clean product dataset for search, comparison, or analytics.
- **New-arrival and restock alerts** - watch a store for fresh products, restocks, and price drops.
- **Lead generation** - collect vendor names and product ranges across a niche for outreach.
- **Catalog audits** - find products with missing images, empty descriptions, or out-of-stock variants.

### Input

| Field | Type | Description |
|---|---|---|
| `storeUrls` | array | Store domains or URLs to pull the full product catalog from. Bare domain or full URL, with or without `https`. Example: `https://example.com`, `example.com`. |
| `collectionUrls` | array | Collection page URLs to pull products from a single collection. Provide the full collection URL. Example: `https://example.com/collections/example-collection`. |
| `productUrls` | array | Product page URLs to fetch one product each. Example: `https://example.com/products/example-product`. |
| `maxProducts` | integer | Cap on products returned per store or collection. Default 1000. Pagination is walked until this is reached or the catalog is exhausted. |
| `concurrency` | integer | How many sources to process in parallel. Default 5. |
| `proxyConfig` | object | Apify proxy configuration. RESIDENTIAL is the default and recommended option for the most reliable results. |

At least one of `storeUrls`, `collectionUrls`, or `productUrls` is required.

#### Example input

```json
{
  "storeUrls": ["https://example.com", "another-example.com"],
  "collectionUrls": ["https://example.com/collections/example-collection"],
  "maxProducts": 2000,
  "concurrency": 3,
  "proxyConfig": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}
````
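When the input is assembled in code, it can help to check it before starting a run. The following is a minimal sketch, not part of the Actor: the helper name `build_input` is hypothetical, the numeric bounds are taken from the input schema in the OpenAPI specification in this document, and the "at least one source" rule comes from the Input section.

```python
from typing import Any, Dict, List, Optional

# Bounds copied from the input schema in this document's OpenAPI specification.
MAX_PRODUCTS_RANGE = (1, 100_000)
CONCURRENCY_RANGE = (1, 20)


def build_input(
    store_urls: Optional[List[str]] = None,
    collection_urls: Optional[List[str]] = None,
    product_urls: Optional[List[str]] = None,
    max_products: int = 1000,
    concurrency: int = 5,
) -> Dict[str, Any]:
    """Return an Actor input dict, raising ValueError on invalid values.

    Hypothetical helper: it only mirrors the documented constraints locally.
    """
    store_urls = store_urls or []
    collection_urls = collection_urls or []
    product_urls = product_urls or []

    # The README requires at least one source list to be non-empty.
    if not (store_urls or collection_urls or product_urls):
        raise ValueError(
            "Provide at least one of storeUrls, collectionUrls, or productUrls"
        )

    low, high = MAX_PRODUCTS_RANGE
    if not low <= max_products <= high:
        raise ValueError(f"maxProducts must be between {low} and {high}")

    low, high = CONCURRENCY_RANGE
    if not low <= concurrency <= high:
        raise ValueError(f"concurrency must be between {low} and {high}")

    return {
        "storeUrls": store_urls,
        "collectionUrls": collection_urls,
        "productUrls": product_urls,
        "maxProducts": max_products,
        "concurrency": concurrency,
        # RESIDENTIAL is the documented default proxy group.
        "proxyConfig": {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
    }
```

The returned dict can be passed as `run_input` to the Python client call shown in the API section.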

### Output

Each dataset item is one product with a clean, importance-ordered shape: identity first, then metrics, then content, then media, then metadata.

```json
{
  "storeDomain": "example.com",
  "id": "7222392750160",
  "handle": "example-product",
  "url": "https://example.com/products/example-product",
  "title": "Example Product",
  "vendor": "Acme Co",
  "productType": "Shoes",
  "priceMin": 37.0,
  "priceMax": 75.0,
  "compareAtPriceMax": 95.0,
  "available": true,
  "variantsCount": 12,
  "imagesCount": 5,
  "optionsCount": 1,
  "description": "Plain-text product description with HTML stripped out.",
  "tags": ["sale", "new-arrival"],
  "options": [{ "name": "Size", "values": ["S", "M", "L"] }],
  "variants": [
    {
      "id": "41360177758288",
      "title": "S",
      "sku": "EX-001",
      "price": 37.0,
      "compareAtPrice": 75.0,
      "available": false,
      "option1": "S",
      "option2": null,
      "option3": null,
      "grams": 433,
      "requiresShipping": true,
      "taxable": true
    }
  ],
  "featuredImage": "https://example.com/cdn/example.jpg",
  "images": ["https://example.com/cdn/example.jpg"],
  "createdAt": "2025-09-24T23:41:55.000Z",
  "updatedAt": "2026-06-16T19:07:22.000Z",
  "publishedAt": "2026-06-16T18:33:09.000Z"
}
```

#### Key fields

- **`url`, `handle`, `id`** - stable identifiers for joining or deduping.
- **`priceMin` / `priceMax`** - the price range across all variants. `compareAtPriceMax` is the highest compare-at price and is present only for products that are on sale.
- **`available`** - `true` when at least one variant is in stock.
- **`variants`** - per-variant SKU, price, stock, and option values.
- **`updatedAt` / `publishedAt`** - useful for change detection and new-arrival monitoring.

Every field is always present. Values that the store does not publish for a given product are returned as `null`.
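As an illustration of working with these fields, the sketch below dedupes dataset items and computes a per-variant discount. The function names are hypothetical, and the choice of `(storeDomain, id)` as the dedupe key is an assumption based on the identifiers listed above; the discount is simply derived from `price` and `compareAtPrice` and is not a field the Actor returns.

```python
from typing import Any, Dict, Iterable, List, Tuple


def variant_discounts(product: Dict[str, Any]) -> Dict[str, float]:
    """Map variant id to percent off, for variants priced below their compare-at price."""
    discounts: Dict[str, float] = {}
    for variant in product.get("variants") or []:
        price = variant.get("price")
        compare_at = variant.get("compareAtPrice")
        # Skip variants without a usable compare-at price (null means "not published").
        if price is None or compare_at is None or compare_at <= price:
            continue
        discounts[variant["id"]] = round((compare_at - price) / compare_at * 100, 1)
    return discounts


def dedupe_products(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one item per (storeDomain, id); a later duplicate replaces an earlier one."""
    by_key: Dict[Tuple[Any, Any], Dict[str, Any]] = {}
    for item in items:
        by_key[(item.get("storeDomain"), item.get("id"))] = item
    return list(by_key.values())


# Trimmed version of the example product from the Output section.
example = {
    "storeDomain": "example.com",
    "id": "7222392750160",
    "variants": [
        {"id": "41360177758288", "price": 37.0, "compareAtPrice": 75.0},
    ],
}

print(variant_discounts(example))  # {'41360177758288': 50.7}
```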

### FAQ

**Is this scraper free? How is it priced?**
You pay per product returned, plus standard Apify platform usage. There is no per-run start fee. Check the pricing tab on the Actor's Store page for the current rate.

**Do I need a login, password, or API key?**
No. The scraper reads public storefront data only. Just provide a store, collection, or product URL.

**How many products can I scrape per store?**
Set `maxProducts` to whatever you need. The scraper walks pagination across multiple pages automatically, so you can pull the full catalog (hundreds or thousands of products) in one run, bounded only by what the store exposes publicly.

**How fast is it?**
Most stores return a few hundred products per second over a clean connection. Run multiple stores or collections in parallel with the `concurrency` setting.

**Can I scrape many stores at once?**
Yes. Add multiple entries to `storeUrls` (and/or `collectionUrls` and `productUrls`) and they are processed concurrently in a single run.

**Why are some fields null?**
A field is `null` only when the store itself does not publish that value for a product (for example, `compareAtPriceMax` is set only on discounted items, and `productType` is sometimes left blank). The output shape is always complete so your downstream mapping never breaks.
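Because unpublished values arrive as `null` rather than being omitted, downstream code can treat `None` and empty values the same way. A minimal sketch of a catalog audit along the lines of the use case above, assuming only the field names shown in the Output section (the function name and issue labels are hypothetical):

```python
from typing import Any, Dict, List


def audit_product(product: Dict[str, Any]) -> List[str]:
    """Return a list of issue labels for one product; an empty list means no issues found."""
    issues: List[str] = []

    # `or []` / `or ""` turn a null value into an empty one before checking it.
    if not (product.get("images") or []):
        issues.append("missing_images")

    if not (product.get("description") or "").strip():
        issues.append("empty_description")

    variants = product.get("variants") or []
    if any(variant.get("available") is False for variant in variants):
        issues.append("out_of_stock_variants")

    return issues


# A product where the store published no images and no description.
sparse = {
    "title": "Example Product",
    "images": [],
    "description": None,
    "variants": [{"id": "41360177758288", "available": False}],
}

print(audit_product(sparse))
# ['missing_images', 'empty_description', 'out_of_stock_variants']
```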

### Notes

- A run uses the Apify proxy you select. RESIDENTIAL gives the most reliable results.
- If a source is temporarily unavailable, an item is returned with a generic status (`upstream_unavailable`, `upstream_rate_limit`, or `not_found`) so a single failure never stops the run.
- The default input (`storeUrls` set to `["__healthcheck__"]`) is a health-check sentinel that returns a single confirmation record. Replace `storeUrls` with real store URLs to scrape.
- Pagination depth is bounded by what a store exposes for its public catalog.
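
Since failed sources produce status items in the same dataset as products, it is usually worth separating the two before further processing. The sketch below assumes the status value is stored under a key named `status`; this README lists the status values but does not name the field, so check a real failed item and adjust `status_key` if it differs. The function name is hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Status values listed in the Notes above.
FAILURE_STATUSES = {"upstream_unavailable", "upstream_rate_limit", "not_found"}


def split_results(
    items: Iterable[Dict[str, Any]],
    status_key: str = "status",  # ASSUMPTION: the field name is not documented here
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split dataset items into (products, failures) by their status value."""
    products: List[Dict[str, Any]] = []
    failures: List[Dict[str, Any]] = []
    for item in items:
        if item.get(status_key) in FAILURE_STATUSES:
            failures.append(item)
        else:
            products.append(item)
    return products, failures
```

The shape of the health-check confirmation record is not documented here, so this sketch does not try to detect it.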

# Actor input Schema

## `storeUrls` (type: `array`):

Store domains or URLs to pull the full product catalog from. Bare domain or full URL, with or without https. Example: https://example.com, example.com.

## `collectionUrls` (type: `array`):

Collection page URLs to pull products from a single collection. Provide the full collection URL. Example: https://example.com/collections/example-collection.

## `productUrls` (type: `array`):

Product page URLs to fetch one product each. Provide the full product URL. Example: https://example.com/products/example-product.

## `maxProducts` (type: `integer`):

Cap on products returned per store or collection. Pagination is walked across multiple pages until this is reached or the catalog is exhausted.

## `concurrency` (type: `integer`):

How many sources to process in parallel. Higher is faster but puts more load on proxies.

## `proxyConfig` (type: `object`):

Apify proxy. RESIDENTIAL is the default and recommended option for the most reliable results.

## Actor input object example

```json
{
  "storeUrls": [
    "__healthcheck__"
  ],
  "collectionUrls": [],
  "productUrls": [],
  "maxProducts": 1000,
  "concurrency": 5,
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "storeUrls": [
        "__healthcheck__"
    ],
    "collectionUrls": [],
    "productUrls": [],
    "proxyConfig": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("goat255/shopify-store-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "storeUrls": ["__healthcheck__"],
    "collectionUrls": [],
    "productUrls": [],
    "proxyConfig": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

# Run the Actor and wait for it to finish
run = client.actor("goat255/shopify-store-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "storeUrls": [
    "__healthcheck__"
  ],
  "collectionUrls": [],
  "productUrls": [],
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}' |
apify call goat255/shopify-store-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=goat255/shopify-store-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Shopify Store Scraper Goat",
        "description": "Scrape products from any Shopify storefront without a login or API key. Pull an entire store catalog, a single collection, or one product. Walks pagination up to your chosen limit and returns clean, normalized product data with prices, variants, images, and tags.",
        "version": "0.1",
        "x-build-id": "JanN2hlvhDRm9KbyG"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/goat255~shopify-store-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-goat255-shopify-store-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/goat255~shopify-store-scraper/runs": {
            "post": {
                "operationId": "runs-sync-goat255-shopify-store-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/goat255~shopify-store-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-goat255-shopify-store-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "storeUrls": {
                        "title": "Store URLs (catalog mode)",
                        "type": "array",
                        "description": "Store domains or URLs to pull the full product catalog from. Bare domain or full URL, with or without https. Example: https://example.com, example.com.",
                        "default": [
                            "__healthcheck__"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "collectionUrls": {
                        "title": "Collection URLs (collection mode)",
                        "type": "array",
                        "description": "Collection page URLs to pull products from a single collection. Provide the full collection URL. Example: https://example.com/collections/example-collection.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "productUrls": {
                        "title": "Product URLs (single product mode)",
                        "type": "array",
                        "description": "Product page URLs to fetch one product each. Provide the full product URL. Example: https://example.com/products/example-product.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxProducts": {
                        "title": "Max products per source",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Cap on products returned per store or collection. Pagination is walked across multiple pages until this is reached or the catalog is exhausted.",
                        "default": 1000
                    },
                    "concurrency": {
                        "title": "Concurrency",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "How many sources to process in parallel. Higher is faster but puts more load on proxies.",
                        "default": 5
                    },
                    "proxyConfig": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify proxy. RESIDENTIAL is the default and recommended option for the most reliable results.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ]
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
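
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be invoked with the Python standard library alone. This is a minimal sketch: the function names are hypothetical, the token is sent as the `token` query parameter as the specification describes, and the response is assumed to be a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

# Server URL and path taken from the OpenAPI specification above.
BASE_URL = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/goat255~shopify-store-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request that runs the Actor synchronously."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{BASE_URL}{RUN_SYNC_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, actor_input: Dict[str, Any], timeout: float = 300.0) -> List[Dict[str, Any]]:
    """Send the request and return the parsed dataset items.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_run_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_sync("<YOUR_API_TOKEN>", {"storeUrls": ["example.com"], "maxProducts": 50})` blocks until the run finishes, so it suits small runs; for large catalogs the asynchronous `/runs` path or the official client is likely a better fit.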
