# Shopify Products Scraper (`scrapemesh/shopify-products-scraper`) Actor

- **URL**: https://apify.com/scrapemesh/shopify-products-scraper.md
- **Developed by:** [ScrapeMesh](https://apify.com/scrapemesh) (community)
- **Categories:** Automation, Lead generation, E-commerce
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.99 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Shopify Products Scraper

An Apify Actor that extracts product data from Shopify stores. It discovers product URLs from store pages and retrieves product details, including prices, variants, descriptions, images, and metadata, as structured JSON.

### Why Choose Us?

- **Automatic Product Discovery**: Automatically finds all product URLs from Shopify store pages - no need to manually list products
- **Complete Product Data**: Extracts full product information including variants, prices, images, descriptions, and metadata
- **Smart Proxy Management**: Intelligent proxy fallback system ensures reliable data extraction even when stores implement blocking
- **Bulk Processing**: Process multiple Shopify stores in a single run, with concurrent requests for the products of each store
- **Live Data Saving**: Results are saved in real-time, so you don't lose data if the actor is interrupted
- **Production Ready**: Built with robust error handling, retry logic, and detailed logging for reliable operation

### Key Features

#### 🔍 **Automatic Product Discovery**
- Scans store HTML to automatically find all product URLs containing `/products/`
- Handles both absolute and relative URLs
- Deduplicates product links automatically

#### 📊 **Comprehensive Data Extraction**
- Product details: ID, title, description, vendor, product type
- Pricing information: current price, compare-at price, currency
- Variants: all product variants with sizes, colors, SKUs, inventory
- Images: product images with URLs, dimensions, and variant associations
- Metadata: tags, creation dates, update timestamps, published status
- Full JSON API response preserved for maximum data completeness

#### 🔄 **Intelligent Proxy Fallback**
- **Default**: Starts with no proxy for direct connection
- **Automatic Fallback**: If blocked (403/429), automatically switches to datacenter proxy
- **Residential Proxy**: If datacenter fails, falls back to residential proxy with 3 retries
- **Sticky Proxy**: Once residential proxy is activated, uses it for all remaining requests
- **Clear Logging**: All proxy switches and retries are logged for transparency

#### ⚡ **Performance Optimized**
- Asynchronous processing for fast concurrent requests
- Efficient HTML parsing with BeautifulSoup
- Live data saving prevents data loss
- Progress tracking with detailed logs

#### 🛡️ **Reliable & Robust**
- Comprehensive error handling for network issues
- Automatic retry logic for failed requests
- Graceful handling of missing or malformed data
- Detailed logging for monitoring and debugging

### Input

The actor accepts the following input parameters:

#### JSON Example

```json
{
  "startUrls": [
    "https://lootcrate.com",
    "https://www.decathlon.com"
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
````

#### Input Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| **startUrls** | `array` | ✅ Yes | List of Shopify store URLs to scrape. Supports bulk input. Each URL should be a valid Shopify store homepage (e.g., `https://lootcrate.com`). |
| **proxyConfiguration** | `object` | ❌ No | Proxy settings. By default, no proxy is used. If the platform blocks requests, the actor automatically falls back to datacenter proxy, then residential proxy with 3 retries. |

#### Input Details

- **startUrls**:
  - Accepts one or more Shopify store URLs
  - Each URL should be the store's homepage
  - The actor will automatically discover all product pages
  - Example: `["https://lootcrate.com", "https://www.decathlon.com"]`

- **proxyConfiguration**:
  - Optional proxy configuration
  - Default: `{"useApifyProxy": false}` (no proxy)
  - If enabled, uses Apify's proxy infrastructure
  - Automatic fallback ensures reliable data extraction

### Output

The actor outputs structured product data grouped by store URL. Each product includes complete information from Shopify's JSON API.

#### Output Structure

```json
{
  "https://lootcrate.com": {
    "total_found": 5,
    "processed": 5,
    "successful": 5,
    "products": [
      {
        "url": "https://lootcrate.com/products/loot-crate",
        "json_url": "https://lootcrate.com/products/loot-crate.json",
        "data": {
          "product": {
            "id": 5083963261059,
            "title": "Loot Crate",
            "body_html": "<p>Product description...</p>",
            "vendor": "Loot Crate Core",
            "product_type": "Subscription Box",
            "created_at": "2020-07-07T14:17:32-07:00",
            "handle": "loot-crate",
            "updated_at": "2025-12-28T22:57:43-08:00",
            "published_at": "2023-03-09T06:53:59-08:00",
            "tags": "Subscription, Collectibles, Pop Culture",
            "variants": [
              {
                "id": 34197535719555,
                "product_id": 5083963261059,
                "title": "S / XS",
                "price": "29.99",
                "compare_at_price": "24.99",
                "sku": "1010126US",
                "inventory_management": "shopify",
                "weight": 0.0,
                "weight_unit": "lb",
                "requires_shipping": true
              }
            ],
            "images": [
              {
                "id": 123456789,
                "product_id": 5083963261059,
                "src": "https://cdn.shopify.com/...",
                "width": 2000,
                "height": 2000,
                "alt": "Product image"
              }
            ]
          }
        }
      }
    ]
  }
}
```

#### Output Fields

| Field | Description |
|-------|-------------|
| **store\_url** | The Shopify store URL that was scraped (the top-level key in the structure above) |
| **total\_found** | Total number of product URLs discovered on the store |
| **processed** | Number of products processed |
| **successful** | Number of products successfully extracted |
| **products** | Array of product objects, each containing the fields below |
| **products[].url** | Direct product page URL |
| **products[].json\_url** | Shopify JSON API endpoint URL |
| **products[].data** | Complete product data from Shopify's JSON API: product ID, title, description, vendor, and type; pricing and variants; images and metadata; tags, dates, and all other product attributes |
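
Because the output is nested by store, then product, then variant, it can help to flatten it before loading it into a spreadsheet or database. The following is a minimal sketch, assuming the structure shown under Output Structure; `flatten_variants` and the flat row keys are hypothetical names chosen for illustration, not part of the Actor.

```python
from decimal import Decimal


def flatten_variants(output):
    """Yield one flat dict per variant from the store-keyed output structure."""
    for store_url, store in output.items():
        for item in store.get("products", []):
            product = item.get("data", {}).get("product", {})
            for variant in product.get("variants", []):
                compare_at = variant.get("compare_at_price")
                yield {
                    "store_url": store_url,
                    "product_url": item["url"],
                    "product_title": product.get("title"),
                    "variant_title": variant.get("title"),
                    "sku": variant.get("sku"),
                    # Prices arrive as strings; Decimal avoids float rounding.
                    "price": Decimal(variant["price"]),
                    "compare_at_price": Decimal(compare_at) if compare_at else None,
                }
```

Each yielded row can be written directly with `csv.DictWriter` or inserted into a table keyed by `sku`.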

### 🚀 How to Use the Actor (via Apify Console)

1. **Log in** to [Apify Console](https://console.apify.com) and navigate to **Actors**
2. **Find** the `shopify-products-scraper` actor and click on it
3. **Configure inputs**:
   - Add one or more Shopify store URLs in the `startUrls` field
   - Optionally configure proxy settings (default: no proxy with automatic fallback)
4. **Run the actor** by clicking the "Start" button
5. **Monitor progress** in real-time through the detailed logs:
   - Product discovery progress
   - Proxy usage and fallback events
   - Success/failure counts for each product
6. **Access results** in the **OUTPUT** tab:
   - View data in the structured table view
   - Export results as JSON or CSV
   - Download the complete dataset

#### Example Usage

**Input:**

```json
{
  "startUrls": ["https://lootcrate.com"]
}
```

**Result:**

- Automatically discovers all products on the store
- Extracts complete product data for each item
- Groups results by store URL
- Provides summary statistics (total found, processed, successful)

### Best Use Cases

#### 🛒 **E-commerce Intelligence**

- Monitor competitor product catalogs and pricing
- Track product availability and inventory changes
- Analyze product categories and trends across multiple stores

#### 📊 **Market Research**

- Gather product data for market analysis
- Compare product offerings across different Shopify stores
- Study pricing strategies and product positioning

#### 🔄 **Data Integration**

- Import product catalogs into your own systems
- Sync product data for affiliate programs
- Build product comparison engines

#### 📈 **Business Intelligence**

- Track product launches and updates
- Monitor vendor and product type distributions
- Analyze product metadata and tagging strategies

#### 🎯 **Price Monitoring**

- Track price changes over time
- Monitor compare-at prices and discounts
- Analyze pricing across variants

### Frequently Asked Questions

#### How does the actor discover products?

The actor automatically scans the store's HTML for links containing `/products/` and extracts all product URLs. No manual product listing is required.
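
The discovery step can be sketched roughly as follows. This is an illustration only, not the Actor's source: it uses Python's standard-library HTML parser rather than BeautifulSoup, and the names `ProductLinkParser`, `discover_product_urls`, and `to_json_url` are hypothetical. Dropping query strings before deduplicating is an assumption; the `.json` suffix follows the `url` / `json_url` pair shown in the output example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class ProductLinkParser(HTMLParser):
    """Collects href values of <a> tags that contain /products/."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and "/products/" in href:
            self.hrefs.append(href)


def discover_product_urls(store_url, html):
    """Return absolute, deduplicated product URLs in first-seen order."""
    parser = ProductLinkParser()
    parser.feed(html)
    seen = {}
    for href in parser.hrefs:
        absolute = urljoin(store_url, href)  # resolves relative links
        parts = urlsplit(absolute)
        # Assumption: strip query and fragment so ?variant=... links collapse.
        clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        seen.setdefault(clean, None)
    return list(seen)


def to_json_url(product_url):
    """Build the JSON endpoint by appending .json to the product URL."""
    return product_url.rstrip("/") + ".json"
```

A dictionary is used instead of a set so the discovered URLs keep the order in which they appear on the page.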

#### What happens if a store blocks my requests?

The actor implements intelligent proxy fallback:

1. Starts with no proxy (direct connection)
2. If blocked → automatically switches to datacenter proxy
3. If still blocked → falls back to residential proxy with 3 retries
4. Once residential proxy is activated, it's used for all remaining requests
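
The four steps above can be modeled as a small state machine. The sketch below is a hedged illustration, not the Actor's implementation: `ProxyFallback` and the injected `fetch(url, tier)` callable are hypothetical, the tier names are labels invented here, and treating 403 and 429 as "blocked" follows the Key Features description.

```python
BLOCKED_STATUSES = {403, 429}
RESIDENTIAL_RETRIES = 3


class ProxyFallback:
    """Escalates none -> datacenter -> residential and then stays residential."""

    def __init__(self, fetch):
        # fetch(url, tier) returns an HTTP status code (hypothetical interface).
        self.fetch = fetch
        self.sticky_residential = False
        self.log = []  # (url, tier, attempt, status) for every request made

    def _plan(self):
        if self.sticky_residential:
            return [("residential", RESIDENTIAL_RETRIES)]
        return [("none", 1), ("datacenter", 1), ("residential", RESIDENTIAL_RETRIES)]

    def get(self, url):
        for tier, tries in self._plan():
            if tier == "residential":
                # Once activated, residential is used for all later requests.
                self.sticky_residential = True
            for attempt in range(1, tries + 1):
                status = self.fetch(url, tier)
                self.log.append((url, tier, attempt, status))
                if status not in BLOCKED_STATUSES:
                    return status
        return status  # still blocked after every tier was tried
```

Injecting `fetch` keeps the escalation logic separate from the HTTP client, so it can be exercised with a fake function before wiring in real proxies.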

#### Can I scrape multiple stores at once?

Yes! Simply add multiple URLs to the `startUrls` array. The actor processes them sequentially, grouping results by store URL.

#### What data is included in the output?

The actor extracts complete product data from Shopify's JSON API, including:

- Basic info (ID, title, description, vendor, type)
- Pricing (current price, compare-at price, currency)
- Variants (all sizes, colors, SKUs, inventory)
- Images (URLs, dimensions, variant associations)
- Metadata (tags, dates, published status)
- And all other fields available in Shopify's product API

#### How long does scraping take?

Scraping time depends on:

- Number of stores to process
- Number of products per store
- Network speed and proxy performance
- Store response times

The actor processes products concurrently for faster results. Progress is logged in real-time.

#### Can I limit the number of products scraped?

Currently, the actor processes all discovered products. You can filter results after scraping, or modify the code to add a `maxItems` limit if needed.
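
Filtering after the run might look like the sketch below, assuming the store-keyed structure from the Output section. `cap_products` is a hypothetical helper; since the Actor still processes every discovered product, it only trims what you keep downstream.

```python
def cap_products(output, max_items):
    """Return a copy of the output with at most max_items products per store."""
    return {
        store_url: {**store, "products": store.get("products", [])[:max_items]}
        for store_url, store in output.items()
    }
```

The summary counters (`total_found`, `processed`, `successful`) are copied unchanged, so they still describe the full run.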

#### What if a product URL returns an error?

The actor handles errors gracefully:

- Failed products are logged with error details
- Successful products are still saved
- Summary statistics show success/failure counts
- The actor continues processing remaining products
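
One way to get this behavior is to catch failures per product while the rest of the batch continues. The sketch below is an assumption-laden illustration: `process_store`, the injected `fetch_product` coroutine, the `concurrency` default, and the `errors` list are all hypothetical, while the `total_found`, `processed`, and `successful` counters mirror the documented output fields.

```python
import asyncio


async def process_store(product_urls, fetch_product, concurrency=5):
    """Fetch products concurrently; a failure is recorded, not raised."""
    semaphore = asyncio.Semaphore(concurrency)  # caps in-flight requests
    products, errors = [], []

    async def worker(url):
        async with semaphore:
            try:
                data = await fetch_product(url)
            except Exception as exc:  # one bad product must not stop the run
                errors.append({"url": url, "error": str(exc)})
                return
            products.append({"url": url, "json_url": url + ".json", "data": data})

    await asyncio.gather(*(worker(url) for url in product_urls))
    return {
        "total_found": len(product_urls),
        "processed": len(products) + len(errors),
        "successful": len(products),
        "products": products,
        "errors": errors,  # hypothetical field, not part of the documented output
    }
```

Catching the exception inside each worker, rather than around `gather`, is what lets successful products survive a failure elsewhere in the batch.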

#### Is the data saved in real-time?

Yes! The actor uses Apify's dataset feature to save data as it's extracted. This means:

- Data is available even if the actor is interrupted
- You can monitor progress in real-time
- Results are automatically saved to the dataset

### Support and Feedback

💬 **For custom solutions or feature requests**, contact us at: **dev.scraperengine@gmail.com**

We're always looking to improve the actor based on user feedback. If you encounter any issues or have suggestions for new features, please don't hesitate to reach out!

### Cautions

⚠️ **Important Legal and Ethical Considerations:**

- **Public Data Only**: This actor collects data only from publicly available Shopify store pages. It does not access private accounts, password-protected content, or restricted areas.

- **Respect Robots.txt**: While the actor can access public product pages, users should respect website terms of service and robots.txt files.

- **Rate Limiting**: The actor includes built-in delays and proxy management to avoid overwhelming target servers. However, users should be mindful of scraping frequency.

- **Legal Compliance**: Users are responsible for ensuring their use of this actor complies with:
  - Local data protection and privacy laws (GDPR, CCPA, etc.)
  - Website terms of service
  - Copyright and intellectual property regulations
  - Anti-spam and data collection regulations

- **Ethical Use**: This tool is intended for legitimate business intelligence, market research, and data analysis purposes. Users should not use it for:
  - Harassment or stalking
  - Unauthorized data collection
  - Violating terms of service
  - Any illegal activities

- **Data Responsibility**: Users are responsible for how they store, use, and share the collected data. Ensure proper data security and privacy practices.

***

**Built with ❤️ for the Apify community**

# Actor input Schema

## `startUrls` (type: `array`):

List one or more Shopify store URLs (e.g., https://lootcrate.com, https://www.decathlon.com). Supports bulk input.

## `proxyConfiguration` (type: `object`):

Choose which proxies to use. By default, no proxy is used. If the platform rejects or blocks the request, the Actor automatically falls back to datacenter proxy, then residential proxy with 3 retries.

## Actor input object example

```json
{
  "startUrls": [
    "https://lootcrate.com"
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        "https://lootcrate.com"
    ],
    "proxyConfiguration": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("scrapemesh/shopify-products-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": ["https://lootcrate.com"],
    "proxyConfiguration": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("scrapemesh/shopify-products-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    "https://lootcrate.com"
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}' |
apify call scrapemesh/shopify-products-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=scrapemesh/shopify-products-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Shopify Products Scraper",
        "version": "0.1",
        "x-build-id": "uGvBBwRmCJmfy2UoE"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/scrapemesh~shopify-products-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-scrapemesh-shopify-products-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/scrapemesh~shopify-products-scraper/runs": {
            "post": {
                "operationId": "runs-sync-scrapemesh-shopify-products-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/scrapemesh~shopify-products-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-scrapemesh-shopify-products-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Shopify Store URLs",
                        "type": "array",
                        "description": "List one or more Shopify store URLs (e.g., https://lootcrate.com, https://www.decathlon.com). Supports bulk input.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Choose which proxies to use. By default, no proxy is used. If the platform rejects or blocks the request, it will automatically fallback to datacenter proxy, then residential proxy with 3 retries."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
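
For projects without a client library, the `run-sync-get-dataset-items` path in the specification above can be called directly. The following sketch uses only the Python standard library; the endpoint path, the `token` query parameter, and the input fields come from the specification, while `build_run_request` is a hypothetical helper and the shape of the response is assumed to be a JSON array of dataset items.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "scrapemesh~shopify-products-scraper"


def build_run_request(token, start_urls, use_apify_proxy=False):
    """Build (but do not send) a POST request to run-sync-get-dataset-items."""
    body = {
        "startUrls": list(start_urls),
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request needs a real token and network access:
# request = build_run_request("<YOUR_API_TOKEN>", ["https://lootcrate.com"])
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)  # assumed: a JSON array of dataset items
```

The call blocks until the run finishes, so for large stores the asynchronous `/runs` endpoint may be a better fit.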
