Shopify Products Scraper
Pricing
from $0.85 / 1,000 products
Scrape every product from any Shopify store: title, vendor, price, compare-at price, variants, stock status, and images. Just enter the store domain. No API keys or category URLs needed. Export data, run via API, schedule and monitor runs, or integrate with other tools.
Developer: Trove Vault (maintained by Community)
Shopify Products Scraper: Full Catalogue from Any Store Domain
Shopify Products Scraper extracts a public Shopify store catalogue from one store domain. It calls Shopify's native /products.json storefront endpoint, paginates automatically, and returns one structured row per product with prices, sale status, stock flags, images, variants, tags, descriptions, and publish/update timestamps.
Use it when you need competitor product data, catalogue monitoring, price tracking, or multi-store comparison without collecting collection URLs or running a browser.
Why use this actor
Most Shopify product scrapers require collection URLs or product page URLs. This actor starts from the store domain, uses Shopify's public product feed, and returns a normalized dataset for CSV, Excel, BigQuery, dashboards, or downstream Apify actors.
| Capability | Collection scrapers | Browser scrapers | Shopify Products Scraper |
|---|---|---|---|
| Input needed | One URL per collection | Product page URLs | Store domain only |
| Catalogue coverage | Depends on supplied collections | Page by page | Automatic /products.json pagination |
| Multi-store runs | Usually separate runs | Usually separate runs | Multiple domains in one run |
| Speed and cost | Medium | Slower and costlier | Direct HTTP requests |
| Compare-at prices | Often missing | Unreliable | Native Shopify field |
| API key | Sometimes required | Not required | Not required |
Workflow
store domain -> Shopify /products.json pages -> normalized product rows -> Apify dataset/API
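
The first two steps of the workflow can be sketched roughly as follows. This is a minimal illustration, not the actor's source code: it assumes the storefront feed accepts `limit` and `page` query parameters, and `buildPageUrl`, `fetchAllProducts`, and `defaultFetchJson` are hypothetical names. It relies on the global `fetch` available in Node.js 18 or later.

```javascript
// Sketch only: page through a store's public /products.json feed.
// Assumption: the feed accepts `limit` and `page`; 250 is the batch size this README mentions.
const PAGE_SIZE = 250;

// Build the URL for one page of the product feed (pages are 1-based).
function buildPageUrl(domain, page, limit = PAGE_SIZE) {
  const url = new URL(`https://${domain}/products.json`);
  url.searchParams.set('limit', String(limit));
  url.searchParams.set('page', String(page));
  return url.toString();
}

// Fetch pages until one comes back short or maxProducts is reached.
// maxProducts = 0 means no limit. fetchJson is injectable so the loop can run offline.
async function fetchAllProducts(domain, maxProducts = 0, fetchJson = defaultFetchJson) {
  const products = [];
  for (let page = 1; ; page += 1) {
    const body = await fetchJson(buildPageUrl(domain, page));
    const batch = Array.isArray(body.products) ? body.products : [];
    products.push(...batch);
    const reachedCap = maxProducts > 0 && products.length >= maxProducts;
    if (batch.length < PAGE_SIZE || reachedCap) break;
  }
  return maxProducts > 0 ? products.slice(0, maxProducts) : products;
}

// Default fetcher: plain HTTP GET that fails loudly on non-2xx responses.
async function defaultFetchJson(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}
```

A short page (fewer than 250 products) is treated as the end of the catalogue, so a store with exactly 250 products costs one extra empty request.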
What it extracts
Each output row represents one product. Nullable fields are returned as null; arrays are returned as arrays.
| Field | What it means |
|---|---|
| `store` | Domain that produced the row |
| `title` | Product title |
| `vendor` | Brand or vendor set by the merchant |
| `url` | Product page URL |
| `featuredImage` | First product image URL |
| `imageCount` | Number of product images |
| `imageAltTexts` | Non-empty image alt text |
| `currency` | ISO 4217 currency when available |
| `priceMin`, `priceMax` | Lowest and highest variant price |
| `compareAtPrice` | Highest compare-at price, or `null` |
| `onSale` | `true` when any variant has a compare-at price |
| `available` | `true` when at least one variant is available |
| `fullyOutOfStock` | `true` when every variant is unavailable |
| `requiresShipping` | `true` when at least one variant requires shipping |
| `weightAndUnit` | First variant weight in grams |
| `variantCount` | Number of variants |
| `options` | Variant option names, such as Size or Color |
| `productType` | Merchant-defined product type |
| `tags` | Shopify product tags |
| `description` | Plain-text product description with HTML stripped |
| `publishedAt`, `updatedAt` | Publish and update timestamps |
| `runId` | Optional parent run ID copied from input |
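
To make the price and stock definitions in the table concrete, here is a minimal sketch of how such summary fields could be derived from a product's variants. It is an illustration, not the actor's implementation: `summarizeVariants` is a hypothetical helper, and it assumes each variant carries `price` and `compare_at_price` as strings (or `null`) and `available` as a boolean.

```javascript
// Hypothetical helper: collapse a product's variants into row-level summary fields.
// Assumption: variant.price and variant.compare_at_price are numeric strings or null,
// and variant.available is a boolean.
function toNumberOrNaN(value) {
  return value == null || value === '' ? NaN : Number(value);
}

function summarizeVariants(variants) {
  const prices = variants.map((v) => toNumberOrNaN(v.price)).filter(Number.isFinite);
  const compareAt = variants
    .map((v) => toNumberOrNaN(v.compare_at_price))
    .filter(Number.isFinite);
  const anyAvailable = variants.some((v) => v.available === true);
  return {
    priceMin: prices.length ? Math.min(...prices) : null,
    priceMax: prices.length ? Math.max(...prices) : null,
    // Highest compare-at price across variants, or null when none is set.
    compareAtPrice: compareAt.length ? Math.max(...compareAt) : null,
    // Follows the table's definition: any variant with a compare-at price.
    onSale: compareAt.length > 0,
    available: anyAvailable,
    fullyOutOfStock: !anyAvailable,
    variantCount: variants.length,
  };
}
```

Under these definitions, `available` and `fullyOutOfStock` are always opposites, so either flag alone is enough for stock filters.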
Use cases
Competitor price monitoring
Track a rival Shopify catalogue over time. Schedule daily or weekly runs and compare priceMin, priceMax, compareAtPrice, and onSale to detect discounts, price increases, and promotion patterns.
Inventory and stock tracking
Monitor available and fullyOutOfStock across a store. Use the dataset to spot stockouts, restocks, or products that frequently sell out.
Product research and assortment analysis
Audit product counts, brands, product types, tags, variant counts, and price bands for category research, supplier checks, marketplace planning, and SKU benchmarking.
Multi-store comparison
Add several domains in one run. The store field is included on every row, so comparisons do not need extra joins.
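
As a rough sketch of such a comparison, the rows of one multi-store run can be grouped on `store` directly. `summarizeByStore` is a hypothetical helper written for this example, and the average is only meaningful when the compared stores report the same `currency`.

```javascript
// Sketch: per-store summary built from the `store` field on each row.
function summarizeByStore(rows) {
  const byStore = new Map();
  for (const row of rows) {
    const s = byStore.get(row.store)
      ?? { store: row.store, products: 0, onSale: 0, outOfStock: 0, priceSum: 0, priced: 0 };
    s.products += 1;
    if (row.onSale) s.onSale += 1;
    if (row.fullyOutOfStock) s.outOfStock += 1;
    // Skip rows without a numeric price so they do not skew the average.
    if (typeof row.priceMin === 'number') {
      s.priceSum += row.priceMin;
      s.priced += 1;
    }
    byStore.set(row.store, s);
  }
  // Drop the running totals and expose the average instead.
  return [...byStore.values()].map(({ priceSum, priced, ...rest }) => ({
    ...rest,
    avgPriceMin: priced ? priceSum / priced : null,
  }));
}
```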
New product detection
Run on a schedule and filter by publishedAt, or diff consecutive datasets, to identify new, removed, and changed listings.
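
Diffing two consecutive datasets might look like the sketch below. It assumes `url` is stable enough to serve as the product key between runs, which is an assumption of this example rather than a guarantee from the actor; `diffDatasets` and the list of watched fields are hypothetical.

```javascript
// Sketch: compare two runs of the same store, keyed by product URL.
const WATCHED_FIELDS = ['priceMin', 'priceMax', 'compareAtPrice', 'onSale', 'available'];

function diffDatasets(previous, current) {
  const prevByUrl = new Map(previous.map((row) => [row.url, row]));
  const currByUrl = new Map(current.map((row) => [row.url, row]));

  const added = current.filter((row) => !prevByUrl.has(row.url));
  const removed = previous.filter((row) => !currByUrl.has(row.url));

  // For products present in both runs, list which watched fields changed.
  const changed = [];
  for (const row of current) {
    const before = prevByUrl.get(row.url);
    if (!before) continue;
    const fields = WATCHED_FIELDS.filter((field) => before[field] !== row[field]);
    if (fields.length > 0) changed.push({ url: row.url, fields });
  }
  return { added, removed, changed };
}
```

The same result covers the price and stock use cases above: a change in `priceMin` or `onSale` signals a price move, and a change in `available` signals a stockout or restock.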
How to scrape a Shopify store
- Enter one or more store domains, such as `gymshark.com` or `deathwishcoffee.com`.
- Set Max Products per store. Use `50` to test, `500` for a sample, or `0` for the full public catalogue.
- Leave proxy disabled unless a store returns HTTP 403.
- Start the actor and export the dataset as JSON, CSV, Excel, or through the Apify API.
Input
```json
{
  "domains": ["gymshark.com", "deathwishcoffee.com"],
  "maxProducts": 500,
  "proxyConfiguration": { "useApifyProxy": false }
}
```
| Field | Type | Default | Description |
|---|---|---|---|
| `domains` | Array | required | Shopify store domains. Accepts bare hostnames, full URLs, and `.myshopify.com` subdomains. |
| `maxProducts` | Number | `0` | Maximum products per store. `0` means no limit. Shopify pages are fetched in batches of 250. |
| `proxyConfiguration` | Object | disabled | Proxy settings. Enable the Apify Proxy Residential group only when a domain returns HTTP 403. |
| `datasetId` | String | optional | Existing Apify dataset ID to append rows to alongside the default dataset. |
| `runId` | String | optional | Parent run ID copied into each output row for pipeline traceability. |
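
Because `domains` accepts several input shapes, it can help to normalize values before building the input. The sketch below shows one way to do that; `normalizeDomain` is a hypothetical helper, and the actor's own normalization rules (for example, whether it strips `www.`) are not documented here.

```javascript
// Hypothetical helper: reduce a bare hostname, full URL, or .myshopify.com
// subdomain to a lowercase hostname.
function normalizeDomain(input) {
  const trimmed = String(input).trim();
  if (!trimmed) throw new Error('Empty domain');
  // Add a scheme when missing so the URL parser can extract the hostname.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const withScheme = hasScheme ? trimmed : `https://${trimmed}`;
  return new URL(withScheme).hostname.toLowerCase();
}

// Deduplicate after normalizing so each store is requested once.
function normalizeDomains(inputs) {
  return [...new Set(inputs.map(normalizeDomain))];
}
```

For example, `normalizeDomains(['gymshark.com', 'https://gymshark.com/collections/all'])` collapses both entries into a single hostname.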
Output
The actor returns one dataset item per product.
```json
{
  "store": "gymshark.com",
  "vendor": "Gymshark",
  "title": "Vital Seamless 2.0 Shorts",
  "url": "https://gymshark.com/products/vital-seamless-2-0-shorts",
  "featuredImage": "https://cdn.shopify.com/vital-seamless-shorts.jpg",
  "imageCount": 6,
  "imageAltTexts": ["Vital Seamless 2.0 Shorts - Black", "Back view - Black"],
  "currency": "GBP",
  "priceMin": 45,
  "priceMax": 45,
  "compareAtPrice": null,
  "onSale": false,
  "available": true,
  "fullyOutOfStock": false,
  "requiresShipping": true,
  "weightAndUnit": { "grams": 180 },
  "variantCount": 8,
  "options": ["Size"],
  "productType": "Shorts",
  "tags": ["bottoms", "seamless", "training"],
  "description": "Crafted with seamless construction for a second-skin feel.",
  "publishedAt": "2023-06-14T09:00:00Z",
  "updatedAt": "2024-01-10T11:30:00Z"
}
```
API examples
Trigger a run with curl:
```bash
curl -X POST "https://api.apify.com/v2/acts/trovevault~shopify-products-scraper/runs" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"domains":["gymshark.com"],"maxProducts":500}'
```
Run with the JavaScript client and read the dataset:
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('trovevault~shopify-products-scraper').call({
  domains: ['gymshark.com'],
  maxProducts: 500,
});

// Read the rows from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items.slice(0, 3));
```
Troubleshooting and support
HTTP 403 or IP blocked
Enable Apify Proxy with the Residential group in proxyConfiguration. Large brands sometimes block datacenter IP ranges.
HTTP 404 on /products.json
The actor tries three fallbacks: scanning the homepage for a .myshopify.com domain, trying the parent domain, and guessing the Shopify subdomain from the brand label. If all fail, the site may not be Shopify or may have disabled the endpoint.
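
The first fallback, scanning the homepage for a `.myshopify.com` domain, could look roughly like this. It is an illustrative sketch rather than the actor's code: `findMyshopifyDomain` is a hypothetical helper, and it assumes the homepage HTML mentions the store's `.myshopify.com` hostname somewhere in its markup or inline scripts.

```javascript
// Hypothetical helper: return the first *.myshopify.com hostname found in
// a page's HTML, or null when the page does not mention one.
function findMyshopifyDomain(html) {
  const match = html.match(/\b([a-z0-9][a-z0-9-]*\.myshopify\.com)\b/i);
  return match ? match[1].toLowerCase() : null;
}
```

When this returns a hostname, `/products.json` can be retried against it; when it returns `null`, the remaining fallbacks still apply.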
No rows or fewer products than expected
Shopify returns only the public catalogue exposed by the storefront. Draft, hidden, password-protected, wholesale-only, and feed-excluded products will not appear.
Currency is null
The actor reads currency from /cart.js. If a store blocks or customizes that endpoint, rows are still returned but price fields have no confirmed currency.
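
A defensive currency lookup along these lines might be sketched as below. It assumes the `/cart.js` response is JSON with a top-level `currency` string; `parseCartCurrency` and `fetchCurrency` are hypothetical names, and any failure falls back to `null` instead of aborting the run.

```javascript
// Hypothetical helper: extract a three-letter currency code from a parsed
// /cart.js response, or null when it is missing or malformed.
function parseCartCurrency(cartJson) {
  const raw = cartJson && typeof cartJson.currency === 'string' ? cartJson.currency : '';
  const code = raw.trim().toUpperCase();
  return /^[A-Z]{3}$/.test(code) ? code : null;
}

// Fetch /cart.js and return the currency, or null if the endpoint is blocked,
// customized, or returns something that is not JSON.
async function fetchCurrency(domain) {
  try {
    const res = await fetch(`https://${domain}/cart.js`);
    if (!res.ok) return null;
    return parseCartCurrency(await res.json());
  } catch {
    return null;
  }
}
```

When comparing stores, treat rows with a `null` currency as unconverted amounts rather than assuming a default.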
Network errors or timeouts
Retry with a lower maxProducts value. If the same domain keeps failing, enable Residential proxy and check the run log.
Need help
Open the actor's Issues tab with the run ID, input domain, and log error.
FAQ
Does it work on every Shopify store?
It works on stores that expose Shopify's public product feed. Some headless, password-protected, wholesale, or customized stores restrict it.
Does it return variant-level rows?
No. It returns one row per product. Variant data is summarized with variantCount, options, prices, availability, shipping, and weight.
Can I scrape several stores at once?
Yes. Add multiple domains to domains; each row includes store.
Can I append output to an existing dataset?
Yes. Pass datasetId to append rows to an existing dataset. Pass runId to link rows to a parent pipeline run.
Can I use this through an AI assistant?
Yes. Use the Apify API, JavaScript client, or Apify MCP server from MCP-compatible assistants.
Is scraping Shopify product data legal?
The actor requests public storefront endpoints. Accessing public data is generally lawful, but review the target site's terms and your own use case.
Limitations
- One row per product, not per variant or SKU.
- Only public Shopify catalogue data is returned.
- Some headless, password-protected, wholesale, or restricted stores may not expose `/products.json`.
- `compareAtPrice` is only present when the merchant sets compare-at pricing.
- Currency may be `null` when `/cart.js` is unavailable.
- The actor does not crawl non-Shopify platforms such as WooCommerce, Magento, BigCommerce, Amazon, or custom storefronts.
Related actors
- WooCommerce Products Scraper for WooCommerce product catalogues.
- E-Commerce Tech Stack Detector to identify a store platform before scraping.
Changelog
v0.1
- Full catalogue scraping from Shopify `/products.json`
- Automatic pagination and multi-store input
- Domain fallback for custom, regional, and `.myshopify.com` domains
- Structured product output with prices, stock flags, images, variants, tags, and timestamps