Shopify Products Scraper - Catalog, Prices, SKU & Variants
Pricing
Pay per event
Scrape any Shopify store catalog via public /products.json. Title, SKU, price, variants, images, vendor, tags. No auth, no proxy, $5/1K products.
Developer
deusex machine
Maintained by Community
Shopify Products Scraper — Catalog, Prices, SKU, Variants & Inventory
Scrape any Shopify store's full catalog using the canonical /products.json endpoint that Shopify stores expose publicly by default. Get full product data (title, SKU, price, variants, images, vendor, product type, tags) in seconds. HTTP-only, no browser, no proxy, no auth. $5 per 1,000 products.
If you run a dropshipping operation, sell on Amazon arbitraging from DTC brands, build a price-comparison tool, audit competitor catalogs, train a recommendation model on ecommerce data, or detect which domains in your sales pipeline run on Shopify, this Shopify scraper turns 4+ million Shopify stores into a structured JSON feed.
Why use this Shopify scraper
Shopify stores expose their catalog by default at the public canonical endpoint /products.json, the same endpoint that powers Shopify's own product pickers, search APIs and feed-export integrations. Where a store has not blocked it, the data is canonical, complete and SEO-indexable by design.
This actor pulls that endpoint directly. That means:
- ✅ Official-grade reliability: when Shopify changes its UI, the JSON endpoint stays
- ✅ Complete product metadata: every variant, every image, every tag, plus pricing and (when the store exposes it) inventory data
- ✅ No anti-bot: `/products.json` is unauthenticated by design and Shopify wants it indexed
- ✅ Fast: paginate 250 products per page, typically 1-2 seconds per page, ~125-250 products per second
- ✅ Cheap: $0.005 per product ($5 per 1,000), the lowest among Shopify scrapers in the Apify Store
What this Shopify scraper extracts
Per product (type: "product")
| Field | Description | Example |
|---|---|---|
| `productId` | Shopify internal product ID | `4521803710580` |
| `storeUrl` | Canonical store URL | https://www.allbirds.com |
| `productUrl` | Direct product page URL | https://www.allbirds.com/products/trino-cozy-crew-heathered-onyx |
| `handle` | URL-safe product slug | `trino-cozy-crew-heathered-onyx` |
| `title` | Product title | Trino® Cozy Crew - Heathered Onyx |
| `vendor` | Brand / vendor name | Allbirds |
| `productType` | Shopify product type taxonomy | Socks |
| `tags` | Comma-separated tags (often rich custom metadata) | `'collection:apr26, ygroup_trino-cozy-crew, ...'` |
| `image` | Featured high-res image URL | https://cdn.shopify.com/... |
| `imagesCount` | Total images attached | `2` |
| `price` | Headline price (first variant) | `24.00` |
| `compareAtPrice` | MSRP / "original" price | `30.00` |
| `discountPercent` | Calculated discount % when `compareAtPrice > price` | `20.0` |
| `variantCount` | Number of variants (sizes/colors/etc) | `4` |
| `variants` | Full variants array (see below) | `[{...}, ...]` |
| `inventoryQuantity` | Aggregated stock across variants when exposed | `351` |
| `available` | True if any variant is in stock | `true` |
| `options` | Shopify product options (Size, Color, Material...) | `[{name: "Size", values: [...]}]` |
| `description` | Full HTML body (only when `includeProductDetails: true`) | `<p>Soft merino...</p>` |
| `createdAt` / `publishedAt` / `updatedAt` | ISO timestamps | `2026-04-21T12:39:05-07:00` |
| `scrapedAt` | UTC timestamp of extraction | `2026-05-18T22:01:14+00:00` |
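The `discountPercent` value can be reproduced from `price` and `compareAtPrice`. The sketch below is a minimal illustration, assuming the discount is expressed relative to `compareAtPrice` and rounded to one decimal place; the function name and the rounding are assumptions, not the actor's confirmed implementation.

```python
from typing import Optional


def discount_percent(price: float, compare_at_price: Optional[float]) -> Optional[float]:
    """Return the discount as a percentage of compare_at_price, or None."""
    # No discount when compare_at_price is missing or not above the price.
    if compare_at_price is None or compare_at_price <= price:
        return None
    return round((compare_at_price - price) / compare_at_price * 100, 1)


print(discount_percent(24.00, 30.00))  # table example: price 24.00, compareAtPrice 30.00
print(discount_percent(24.00, None))   # no compare-at price, so no discount
```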
Per variant (inside variants[])
| Field | Description |
|---|---|
| `id` | Variant ID |
| `title` | Variant label (often "Size / Color") |
| `sku` | Stock-keeping unit (your inventory primary key) |
| `price` / `compareAtPrice` | Per-variant pricing |
| `available` | Boolean in-stock flag |
| `inventoryQuantity` | Per-variant stock (when published by the store) |
| `weight` / `weightUnit` | Shipping weight (e.g. 0.3087 kg) |
| `requiresShipping` | Physical good vs digital |
| `option1` / `option2` / `option3` | Variant attribute values matching product options |
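Because each product record nests its variants, SKU-level work (inventory joins, CSV exports) usually starts by flattening. The sketch below shows one way to do that, using only the field names documented above; the helper name `iter_sku_rows` and the output column names are hypothetical.

```python
from typing import Any, Dict, Iterator


def iter_sku_rows(product: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per variant, carrying product-level context."""
    for variant in product.get("variants") or []:
        yield {
            "storeUrl": product.get("storeUrl"),
            "handle": product.get("handle"),
            "productTitle": product.get("title"),
            "variantId": variant.get("id"),
            "variantTitle": variant.get("title"),
            "sku": variant.get("sku"),
            "price": variant.get("price"),
            "available": variant.get("available"),
        }


# Trimmed-down record in the shape of the actor's product output.
product = {
    "type": "product",
    "storeUrl": "https://www.allbirds.com",
    "handle": "trino-cozy-crew-heathered-onyx",
    "title": "Trino® Cozy Crew - Heathered Onyx",
    "variants": [
        {"id": 32094182146100, "title": "S", "sku": "PCC1HONU301",
         "price": 24.00, "available": True},
    ],
}
rows = list(iter_sku_rows(product))
print(rows[0]["sku"], rows[0]["price"])
```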
Per detection (type: "detection")
When you pass domainsToDetect, the actor returns one record per domain:
| Field | Example |
|---|---|
| `domain` | https://www.allbirds.com |
| `isShopify` | `true` |
| `productsEndpointStatus` | `200` (or `404` if blocked / not Shopify) |
| `sampleProductTitle` | Trino® Cozy Crew - Heathered Onyx |
| `productsOnPage1` | `1` (sample size used for detection) |
Per collections snapshot (type: "collections_snapshot")
When `includeCollections: true` is set, the actor emits one extra record per store with the collection catalog (id, title, handle, productsCount, updatedAt, image). Useful for building category trees in your own app.
Use cases for this Shopify data API
🛒 Dropshipping research
Build a target list of US/EU DTC brands that match your niche. Pull their catalog daily and identify products you can resell on Amazon, eBay or your own Shopify store with a margin.
Recipe:
- Use `domainsToDetect` on a list of niche-related domains → keep only the `isShopify: true` ones
- Schedule a weekly `storeUrls` scrape to refresh the catalog
- Filter by `discountPercent > 30` to find arbitrage candidates
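The last step of the recipe can be done in a few lines once the dataset items are loaded. This is a sketch that assumes items follow the product record shape documented above; the function name and the default threshold of 30 are illustrative.

```python
from typing import Any, Dict, Iterable, List


def arbitrage_candidates(items: Iterable[Dict[str, Any]],
                         min_discount: float = 30.0) -> List[Dict[str, Any]]:
    """Keep product records whose discountPercent exceeds min_discount."""
    return [
        item for item in items
        if item.get("type") == "product"
        # discountPercent is null when there is no compare-at price.
        and (item.get("discountPercent") or 0) > min_discount
    ]


items = [
    {"type": "product", "title": "A", "discountPercent": 35.0},
    {"type": "product", "title": "B", "discountPercent": 20.0},
    {"type": "product", "title": "C", "discountPercent": None},
    {"type": "detection", "domain": "https://www.allbirds.com", "isShopify": True},
]
candidates = arbitrage_candidates(items)
print([c["title"] for c in candidates])
```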
💼 B2B sales prospecting Shopify merchants
Tools like Klaviyo, Recharge, Yotpo, Gorgias, Postscript and 200+ other Shopify-app vendors need a target list of Shopify stores. Pass any list of domains; the detector marks which ones are Shopify, and the catalog scrape qualifies each by product count and SKU breadth.
📊 Competitive intelligence
How many products does your competitor have? At what price points? How fast do they add new SKUs? Schedule this scraper on competitors and diff snapshots over time.
Metrics you can derive:
- New SKUs/week — innovation velocity
- Price changes — track repositioning
- Inventory shifts — early signal of bestsellers and dying products
- Tag/category mix — strategic positioning
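Two of these metrics (new SKUs and price changes) fall out of a simple diff between two runs. The sketch below assumes SKUs are unique within a store and non-empty, which is not guaranteed for every merchant; the helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

SkuKey = Tuple[str, str]  # (storeUrl, sku)


def index_by_sku(items: Iterable[Dict[str, Any]]) -> Dict[SkuKey, Dict[str, Any]]:
    """Map (storeUrl, sku) to the variant record, skipping variants without a SKU."""
    index: Dict[SkuKey, Dict[str, Any]] = {}
    for item in items:
        if item.get("type") != "product":
            continue
        for variant in item.get("variants") or []:
            sku = variant.get("sku")
            if sku:
                index[(item.get("storeUrl") or "", sku)] = variant
    return index


def diff_snapshots(old_items, new_items):
    """Return (new SKU keys, {key: (old price, new price)}) between two runs."""
    old, new = index_by_sku(old_items), index_by_sku(new_items)
    new_skus: List[SkuKey] = sorted(key for key in new if key not in old)
    price_changes = {
        key: (old[key].get("price"), new[key].get("price"))
        for key in new
        if key in old and old[key].get("price") != new[key].get("price")
    }
    return new_skus, price_changes


last_week = [{"type": "product", "storeUrl": "https://a.example",
              "variants": [{"sku": "A1", "price": 10.0}]}]
this_week = [{"type": "product", "storeUrl": "https://a.example",
              "variants": [{"sku": "A1", "price": 12.0}, {"sku": "A2", "price": 5.0}]}]
new_skus, price_changes = diff_snapshots(last_week, this_week)
print(new_skus, price_changes)
```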
🤖 Recommendation engine training
Ecommerce recommender systems need a clean product catalog with consistent metadata. Pull 100k products from 100 stores in your niche and train a content-based recommender using tags, productType and vendor as features.
📈 Price comparison apps
Build a "find the cheapest" tool for fashion, beauty, supplements, home goods. Schedule daily scrapes of 50-200 competitors per category; expose the lowest price + direct link as a freemium feature.
🧠 LLM grounding for shopping assistants
Build a shopping assistant powered by your favourite LLM that can recommend exact products with real prices and in-stock availability — grounded in fresh Shopify data, not hallucinated.
🏷️ Tag & taxonomy intelligence
Shopify stores embed rich custom metadata in tags (e.g. Allbirds uses allbirds::material => wool, allbirds::carbon-score => 5.9). Aggregate tags across hundreds of stores in a niche to map the implicit taxonomy.
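Structured tags like these can be split into key/value pairs before aggregating. A minimal sketch, assuming the `key => value` convention seen in the Allbirds example and that values contain no commas; other stores use other conventions, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def parse_tags(tags: str) -> Tuple[Dict[str, str], List[str]]:
    """Split a comma-separated tags string into structured and plain tags."""
    structured: Dict[str, str] = {}
    plain: List[str] = []
    for raw in tags.split(","):
        tag = raw.strip()
        if not tag:
            continue
        if "=>" in tag:
            # "allbirds::material => wool" becomes {"allbirds::material": "wool"}
            key, value = tag.split("=>", 1)
            structured[key.strip()] = value.strip()
        else:
            plain.append(tag)
    return structured, plain


structured, plain = parse_tags(
    "allbirds::carbon-score => 5.9, allbirds::material => wool, ygroup_trino-cozy-crew"
)
print(structured)
print(plain)
```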
How to use this Shopify scraper
The actor supports two input modes, which can be combined in a single run.
Mode 1: Catalog scrape (core)
Pass one or more Shopify store URLs. The actor paginates through /products.json until the catalog is exhausted or maxProductsPerStore is reached. Up to 250 products per page; up to 200 pages per store (50,000 products absolute ceiling per store).

    {
      "storeUrls": [
        "https://www.allbirds.com",
        "https://www.deathwishcoffee.com",
        "https://www.brooklinen.com"
      ],
      "maxProductsPerStore": 500,
      "maxTotalProducts": 1500
    }

Mode 2: Shopify detection
Pass any list of domains. The actor returns one record per domain with isShopify: true/false. Useful for tech-stack auditing of your sales prospects.

    {
      "domainsToDetect": [
        "https://www.allbirds.com",
        "https://www.amazon.com",
        "https://www.nike.com",
        "https://www.gymshark.com"
      ]
    }

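How the actor decides `isShopify` internally is not documented here. One plausible reading of the detection record fields is sketched below: treat a domain as Shopify when `/products.json` answers 200 with a JSON object containing a `products` list. The function name and this decision rule are assumptions, and the HTTP request itself is left out so the logic stays testable offline.

```python
import json
from typing import Any, Dict


def classify_detection(domain: str, status_code: int, body: str) -> Dict[str, Any]:
    """Build a detection-style record from a /products.json response."""
    products = None
    if status_code == 200:
        try:
            payload = json.loads(body)
        except ValueError:  # not JSON, e.g. an HTML page or bot challenge
            payload = None
        if isinstance(payload, dict) and isinstance(payload.get("products"), list):
            products = payload["products"]
    first = products[0] if products else None
    return {
        "type": "detection",
        "domain": domain,
        "isShopify": products is not None,
        "productsEndpointStatus": status_code,
        "sampleProductTitle": first.get("title") if isinstance(first, dict) else None,
        "productsOnPage1": len(products) if products is not None else 0,
    }


hit = classify_detection("https://www.allbirds.com", 200,
                         '{"products": [{"title": "Trino Cozy Crew"}]}')
miss = classify_detection("https://www.amazon.com", 404, "Not Found")
print(hit["isShopify"], miss["isShopify"])
```

Under this rule a Shopify store that blocks the endpoint is reported as `isShopify: false`.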
Detail enrichment
Set includeProductDetails: true to additionally fetch /products/{handle}.json for every product extracted. This adds the full HTML description and richer per-variant data (per-variant inventory, weight, requires_shipping). Trade-off: one extra HTTP call per product, so a 1,000-product run takes ~3-5 minutes instead of ~10 seconds.

    {
      "storeUrls": ["https://www.allbirds.com"],
      "includeProductDetails": true,
      "maxProductsPerStore": 100
    }

Collections snapshot
Set includeCollections: true to also pull /collections.json and emit one record per store with the full collection catalog. Useful for building a category tree of the store.
Step-by-step tutorial — your first Shopify run in 90 seconds
- Click "Try for free" on this actor's Apify Store page. New users get $5 platform credit.
- Paste a starter input for Allbirds (~991 products):

    {
      "storeUrls": ["https://www.allbirds.com"],
      "maxProductsPerStore": 100,
      "maxTotalProducts": 100
    }

- Click "Start". The actor calls `/products.json?page=1&limit=250` and pushes 100 records.
- Download as JSON, CSV, Excel, XML, RSS or HTML.
Total runtime: ~10 seconds for 100 products.
Performance and cost
- HTTP only. No Playwright, no proxy, no rotation needed.
- ~125-250 products per second sustained, single worker, 256 MB memory.
- No anti-bot encountered on `/products.json` or `/products/{handle}.json` endpoints across hundreds of Shopify stores tested.
- Pricing: $0.005 per product + $0.00005 per actor start.
Pricing scenarios
| Workload | Products | Cost |
|---|---|---|
| Try Allbirds (100 products) | 100 | $0.50 |
| One Apify free $5 credit | ~1,000 | $5.00 |
| Full Allbirds catalog (~1,000) | 1,000 | $5.00 |
| Top 10 DTC brands × 500 products each | 5,000 | $25.00 |
| Niche audit (50 stores × 200 each) | 10,000 | $50.00 |
| Daily refresh of 30 competitors × 500 products | ~15,000/day | $75/day |
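The table rows follow directly from the two prices quoted above. A small estimator, as a sketch: the constants come from this page, while the function name and the per-run parameter are illustrative.

```python
PRICE_PER_PRODUCT_USD = 0.005   # from the pricing line above
PRICE_PER_START_USD = 0.00005   # charged once per actor start


def estimate_cost(products: int, runs: int = 1) -> float:
    """Estimated USD cost for `products` records spread over `runs` actor starts."""
    return round(products * PRICE_PER_PRODUCT_USD + runs * PRICE_PER_START_USD, 5)


print(estimate_cost(100))      # "Try Allbirds" row
print(estimate_cost(15_000))   # daily refresh of 30 competitors x 500 products
```

The start fee is small enough that the table rounds it away.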
Output example (one product, list mode)

    {
      "type": "product",
      "productId": 4521803710580,
      "storeUrl": "https://www.allbirds.com",
      "productUrl": "https://www.allbirds.com/products/trino-cozy-crew-heathered-onyx",
      "handle": "trino-cozy-crew-heathered-onyx",
      "title": "Trino® Cozy Crew - Heathered Onyx",
      "vendor": "Allbirds",
      "productType": "Socks",
      "tags": "allbirds::carbon-score => 5.9, allbirds::material => wool, ...",
      "image": "https://cdn.shopify.com/s/files/.../cozy-crew.jpg",
      "imagesCount": 2,
      "price": 24.00,
      "compareAtPrice": null,
      "discountPercent": null,
      "variantCount": 4,
      "variants": [
        {
          "id": 32094182146100,
          "title": "S",
          "sku": "PCC1HONU301",
          "price": 24.00,
          "available": true,
          "inventoryQuantity": 351,
          "weight": 0.3087,
          "weightUnit": "kg",
          "option1": "S"
        }
      ],
      "inventoryQuantity": 1402,
      "available": true,
      "options": [{"name": "Size", "values": ["S", "M", "L", "XL"]}],
      "publishedAt": "2026-04-21T12:39:05-07:00",
      "updatedAt": "2026-05-18T15:57:03-07:00",
      "scrapedAt": "2026-05-18T22:01:14+00:00"
    }

How this Shopify scraper compares
| Approach | Pros | Cons |
|---|---|---|
| This actor | Canonical endpoint, full variants + tags, $5/1K, no proxy, 2 modes (catalog + detect) | Doesn't expose sales numbers (Shopify hides those) |
| Shopify Partner App API | Real-time webhooks | Requires Partner account + each store must install your app |
| BuiltWith / Wappalyzer paid | Tech-stack DB | $$$$ subscription; no catalog data |
| Other Apify Shopify scrapers | Existed first | Mostly single-mode (no detection), often parse HTML instead of using the JSON endpoint, slower |
| Manual storefront scraping | Total flexibility | Brittle to theme changes; HTML parsing instead of JSON |
| Hiring a freelancer | Custom output | $200-$800 one-off, not maintained |
How to call this Shopify scraper from your code
Python

    from apify_client import ApifyClient

    client = ApifyClient("YOUR_API_TOKEN")

    # Start the actor and wait for the run to finish.
    run = client.actor("makework36/shopify-products-scraper").call(
        run_input={
            "storeUrls": ["https://www.allbirds.com"],
            "maxProductsPerStore": 500,
        }
    )

    # Stream the dataset and print product records only.
    for p in client.dataset(run["defaultDatasetId"]).iterate_items():
        if p["type"] == "product":
            print(p["title"], p["price"], p["available"], p["inventoryQuantity"])

Node.js

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

    // Start the actor and wait for the run to finish.
    const run = await client.actor('makework36/shopify-products-scraper').call({
        storeUrls: ['https://www.deathwishcoffee.com'],
        includeProductDetails: true,
    });

    // Fetch the dataset and print product records only.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach(p => p.type === 'product' && console.log(p.title, p.price));

cURL (synchronous)

    curl -X POST "https://api.apify.com/v2/acts/makework36~shopify-products-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"storeUrls":["https://www.allbirds.com"],"maxProductsPerStore":50}'

Frequently Asked Questions about scraping Shopify
Is scraping Shopify legal?
The /products.json endpoint is exposed publicly by Shopify on every store, intentionally, for SEO and product-feed integrations. This actor consumes the same public endpoint that Shopify's own merchandising tools use. You are responsible for respecting each store's product copyrights when redistributing data, and the Shopify Terms of Service for commercial uses derived from it.
Do all Shopify stores expose /products.json?
Most do: roughly 70-80% in our testing. The rest either disallow the endpoint in robots.txt, restrict it via theme code, or sit behind aggressive Cloudflare bot challenges. For stores that block it, you will see a 404 or 403 in the actor log and the detection record will show `isShopify: false`, even if the store actually runs on Shopify. This is a real limitation of public-endpoint scraping.
Why doesn't the output include sales numbers or revenue?
Shopify does not expose sold_quantity or revenue on the public /products.json endpoint. Aggregate inventory and "available" flags are exposed when the merchant enables inventory tracking. For real revenue/sales data, you need the merchant to install a Shopify app with the Orders/Reports scopes (which is a different model entirely).
How current is the data?
Live — every run hits the store directly. There is no actor-side cache. The updatedAt field on each product tells you when the merchant last edited it.
Can I find Shopify stores I don't know about?
Not from /products.json alone. The actor's detection mode confirms whether a domain you supply is Shopify, but does not enumerate unknown stores. For discovery, combine this actor with: BuiltWith Free (manual), Wappalyzer browser extension, Google search site:myshopify.com, or paid services like Storeleads or MyIP.ms.
Does the actor work for *.myshopify.com subdomain URLs too?
Yes. Some smaller stores use the default Shopify subdomain instead of a custom domain. Pass https://example.myshopify.com and the actor handles it identically.
How do I get full product descriptions?
Set includeProductDetails: true. The description field will then contain the full HTML body (often 1-10 KB of rich product copy + bullet lists + tables). Without this flag, descriptions are null to keep runs fast and cheap.
What about images in multiple sizes?
The list endpoint returns one featured image URL via the actor's `image` field plus `imagesCount`. The detail mode (`includeProductDetails: true`) preserves the full images array. Shopify's CDN serves resized versions when you add a size suffix to the file name (e.g. `_500x.jpg`), which is handy for thumbnail variants.
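A thumbnail URL can be derived from the `image` field with standard-library URL handling. This is a sketch under the assumption that the store's CDN honors the `_500x`-style suffix mentioned above; the helper name and the sample URL are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def sized_image_url(url: str, size: str = "500x") -> str:
    """Insert a size suffix before the file extension, keeping the query string."""
    parts = urlsplit(url)
    root, ext = posixpath.splitext(parts.path)
    if not ext:
        return url  # no file extension to anchor the suffix on
    return urlunsplit(parts._replace(path=f"{root}_{size}{ext}"))


# Hypothetical CDN URL, for illustration only.
thumb = sized_image_url("https://cdn.shopify.com/s/files/1/0000/0001/products/cozy-crew.jpg?v=123")
print(thumb)
```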
Can I schedule this scraper?
Yes. Use Apify's built-in scheduler to refresh your catalog snapshots daily, hourly or weekly. Push results to Google Sheets, BigQuery, Postgres, Snowflake or Slack via Apify integrations.
Will my IP get banned?
We've not observed IP blocks on /products.json in our tests. The endpoint is unauthenticated and designed to handle high traffic from merchandising tools. The actor inserts a 300 ms polite delay between paginated requests.
How do you handle pagination?
The actor calls /products.json?limit=250&page=N and increments N until the response returns 0 products or maxProductsPerStore is reached. The hard safety cap is 200 pages × 250 = 50,000 products per store.
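The loop described above can be sketched as follows. The page fetcher is injected as a callable so the stopping rules are visible and testable without network access. In a real run, `fetch_page` would request `/products.json?limit=250&page=N` and return the `products` array. All names here are illustrative, not the actor's source code.

```python
from typing import Any, Callable, Dict, List, Optional

PAGE_SIZE = 250   # products per page
MAX_PAGES = 200   # hard safety cap: 200 x 250 = 50,000 products per store


def paginate_products(
    fetch_page: Callable[[int, int], List[Dict[str, Any]]],
    max_products: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Collect products page by page until empty, capped, or max_products reached."""
    collected: List[Dict[str, Any]] = []
    for page in range(1, MAX_PAGES + 1):
        products = fetch_page(page, PAGE_SIZE)
        if not products:  # an empty page means the catalog is exhausted
            break
        collected.extend(products)
        if max_products is not None and len(collected) >= max_products:
            return collected[:max_products]
    return collected


# Stand-in for a store with 600 products.
catalog = [{"id": i} for i in range(600)]


def fake_fetch(page: int, limit: int) -> List[Dict[str, Any]]:
    start = (page - 1) * limit
    return catalog[start:start + limit]


print(len(paginate_products(fake_fetch)))
print(len(paginate_products(fake_fetch, max_products=100)))
```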
Can I export only changed products since last run?
Not in v1. You can post-process by comparing updatedAt against your last-run timestamp. A native incremental mode is on the roadmap for v1.1.
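Until then, the post-processing step can look like the sketch below. It assumes `updatedAt` is an ISO 8601 string with a UTC offset, as in the output example, and that you store your own timezone-aware last-run timestamp; the function name is hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List


def changed_since(items: Iterable[Dict[str, Any]],
                  last_run_at: datetime) -> List[Dict[str, Any]]:
    """Keep product records whose updatedAt is later than last_run_at."""
    changed = []
    for item in items:
        updated = item.get("updatedAt")
        if item.get("type") != "product" or not updated:
            continue
        # fromisoformat parses offsets such as -07:00, so the comparison is timezone-aware.
        if datetime.fromisoformat(updated) > last_run_at:
            changed.append(item)
    return changed


last_run_at = datetime.fromisoformat("2026-05-18T00:00:00+00:00")
items = [
    {"type": "product", "handle": "fresh", "updatedAt": "2026-05-18T15:57:03-07:00"},
    {"type": "product", "handle": "stale", "updatedAt": "2026-05-17T10:00:00-07:00"},
]
print([p["handle"] for p in changed_since(items, last_run_at)])
```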
Is there a free trial?
Apify gives every new user $5 in platform credit — enough to extract ~1,000 Shopify products with this actor.
Can I use this for tag analytics?
Absolutely. The tags field is one of the highest-signal data points in Shopify catalogs. Many stores embed structured metadata in tags (material, carbon score, season, audience, kit identifiers). Aggregate tags across a niche and you have a free taxonomy.
What if Shopify changes the endpoint?
Shopify has maintained /products.json as a public canonical endpoint since 2015. Removing it would break product-feed integrations across millions of stores and is extremely unlikely. If they ever modify it, this actor will be updated.
🔗 Other actors by makework36
Building ecommerce intelligence, dropshipping or B2B-sales tooling? You'll also want these:
- IndiaMART Suppliers Scraper — India B2B suppliers with phone, GST verified & ratings
- Goodreads Scraper — books, authors, ratings, ISBN
- Substack Scraper — newsletter posts, authors, reactions
- Email Finder Scraper — verified business emails by domain
- Reddit SaaS Leads Scraper — startup pain points & buyers
- Trustpilot Reviews Scraper — customer reviews & ratings
See all actors by makework36 on the Apify Store.
Roadmap
- v1.1: incremental mode (only emit products with `updatedAt > lastRunAt`).
- v1.2: bulk detection helper (pass 1,000 domains, parallel detect).
- v1.3: collection-scoped catalog (scrape only specific collections per store).
- v2: enhanced detection via theme markers when `/products.json` is blocked.
Disclaimer
This actor consumes the public /products.json endpoint that every Shopify store exposes by design — the same endpoint Shopify's own merchandising tools and product-feed integrations use. You are responsible for respecting each merchant's copyright on product copy, photography and brand assets, the Shopify Terms of Service, and applicable data protection regulation when storing, transforming or redistributing the data.
🙏 Ran this Shopify scraper successfully? Leaving a review helps the Apify algorithm surface this actor to other ecommerce teams and dropshippers. Much appreciated.