Developer: deusex machine (maintained by community)


Shopify Products Scraper — Catalog, Prices, SKU, Variants & Inventory

Scrape any Shopify store's full catalog using the canonical /products.json endpoint that Shopify exposes publicly by default. Get full product data (title, SKU, price, variants, images, vendor, product type, tags) in seconds. HTTP-only, no browser, no proxy, no auth. $5 per 1,000 products.

If you run a dropshipping operation, do Amazon arbitrage on DTC brands, build a price-comparison tool, audit competitor catalogs, train a recommendation model on ecommerce data, or need to detect which domains in your sales pipeline run on Shopify, this Shopify scraper turns 4+ million Shopify stores into a structured JSON feed.

Why use this Shopify scraper

By default, a Shopify store exposes its catalog at the public canonical endpoint /products.json, the same endpoint that product pickers, search integrations and feed exports rely on. The data is canonical, complete and served as structured JSON. A minority of stores block the endpoint (see the FAQ).

This actor pulls that endpoint directly. That means:

  • Official-grade reliability: when Shopify or the merchant changes the storefront UI, the JSON endpoint stays the same
  • Complete product metadata: every variant, every image, every tag, full pricing data, and inventory when the store publishes it
  • No anti-bot handling needed in most cases: /products.json is unauthenticated by design
  • Fast: 250 products per page, typically 1-2 seconds per page, so roughly 125-250 products per second
  • Cheap: $0.005 per product ($5 per 1,000), the lowest among Shopify scrapers in the Apify Store

What this Shopify scraper extracts

Per product (type: "product")

| Field | Description | Example |
| --- | --- | --- |
| productId | Shopify internal product ID | 4521803710580 |
| storeUrl | Canonical store URL | https://www.allbirds.com |
| productUrl | Direct product page URL | https://www.allbirds.com/products/trino-cozy-crew-heathered-onyx |
| handle | URL-safe product slug | trino-cozy-crew-heathered-onyx |
| title | Product title | Trino® Cozy Crew - Heathered Onyx |
| vendor | Brand / vendor name | Allbirds |
| productType | Shopify product type taxonomy | Socks |
| tags | Comma-separated tags (often rich custom metadata) | 'collection:apr26, ygroup_trino-cozy-crew, ...' |
| image | Featured high-res image URL | https://cdn.shopify.com/... |
| imagesCount | Total images attached | 2 |
| price | Headline price (first variant) | 24.00 |
| compareAtPrice | MSRP / "original" price | 30.00 |
| discountPercent | Calculated discount % when compareAtPrice > price | 20.0 |
| variantCount | Number of variants (sizes/colors/etc) | 4 |
| variants | Full variants array (see below) | [{...}, ...] |
| inventoryQuantity | Aggregated stock across variants when exposed | 351 |
| available | True if any variant is in stock | true |
| options | Shopify product options (Size, Color, Material...) | [{name: "Size", values: [...]}] |
| description | Full HTML body (only when includeProductDetails: true) | <p>Soft merino...</p> |
| createdAt / publishedAt / updatedAt | ISO timestamps | 2026-04-21T12:39:05-07:00 |
| scrapedAt | UTC timestamp of extraction | 2026-05-18T22:01:14+00:00 |
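
The discountPercent field can be reproduced from price and compareAtPrice. The following is a minimal sketch, not the actor's source: the helper name discount_percent and the rounding to one decimal place are assumptions chosen to match the 20.0 example above.

```python
from typing import Optional


def discount_percent(price: Optional[float],
                     compare_at_price: Optional[float]) -> Optional[float]:
    """Return the markdown as a percentage of compareAtPrice, or None.

    Assumption: the value is only defined when compareAtPrice > price,
    as stated in the field table, and is rounded to one decimal place.
    """
    if price is None or compare_at_price is None:
        return None
    if compare_at_price <= price:
        return None
    return round((compare_at_price - price) / compare_at_price * 100, 1)


print(discount_percent(24.00, 30.00))  # 20.0
print(discount_percent(24.00, None))   # None
```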

Per variant (inside variants[])

| Field | Description |
| --- | --- |
| id | Variant ID |
| title | Variant label (often "Size / Color") |
| sku | Stock-keeping unit (your inventory primary key) |
| price / compareAtPrice | Per-variant pricing |
| available | Boolean in-stock flag |
| inventoryQuantity | Per-variant stock (when published by the store) |
| weight / weightUnit | Shipping weight (e.g. 0.3087 kg) |
| requiresShipping | Physical good vs digital |
| option1 / option2 / option3 | Variant attribute values matching product options |

Per detection (type: "detection")

When you pass domainsToDetect, the actor returns one record per domain:

| Field | Example |
| --- | --- |
| domain | https://www.allbirds.com |
| isShopify | true |
| productsEndpointStatus | 200 (or 404 if blocked / not Shopify) |
| sampleProductTitle | Trino® Cozy Crew - Heathered Onyx |
| productsOnPage | 11 (sample size used for detection) |
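
To make the detection record concrete, here is a hedged sketch of how a response from /products.json could be classified. The rule used (HTTP 200 plus a JSON object with a "products" list) is an assumption about how detection might work, and detect_shopify is a hypothetical helper, not the actor's code.

```python
import json


def detect_shopify(domain: str, status_code: int, body: str) -> dict:
    """Build a detection record from an already-fetched /products.json response."""
    record = {
        "type": "detection",
        "domain": domain,
        "isShopify": False,
        "productsEndpointStatus": status_code,
        "sampleProductTitle": None,
        "productsOnPage": 0,
    }
    if status_code != 200:
        return record  # blocked (403/404) or not a Shopify store
    try:
        payload = json.loads(body)
    except ValueError:
        return record  # an HTML page or other non-JSON body
    products = payload.get("products") if isinstance(payload, dict) else None
    if not isinstance(products, list):
        return record
    record["isShopify"] = True
    record["productsOnPage"] = len(products)
    if products and isinstance(products[0], dict):
        record["sampleProductTitle"] = products[0].get("title")
    return record


sample = '{"products": [{"title": "Trino Cozy Crew"}]}'
print(detect_shopify("https://www.allbirds.com", 200, sample)["isShopify"])
```

Separating the HTTP call from the classification keeps the rule easy to test with canned responses.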

Per collections snapshot (type: "collections_snapshot")

When includeCollections: true, one extra record per store with the collection catalog (id, title, handle, productsCount, updatedAt, image). Useful for building category trees in your own app.

Use cases for this Shopify data API

🛒 Dropshipping research

Build a target list of US/EU DTC brands that match your niche. Pull their catalog daily and identify products you can resell on Amazon, eBay or your own Shopify store with a margin.

Recipe:

  1. Use domainsToDetect on a list of niche-related domains → keep only isShopify: true ones
  2. Schedule storeUrls scrape weekly to refresh the catalog
  3. Filter by discountPercent > 30% to find arbitrage candidates
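
Step 3 of the recipe can be sketched as a filter over the dataset items. This assumes records shaped like the product fields documented above; arbitrage_candidates and the in-stock requirement are illustrative choices, not part of the actor.

```python
def arbitrage_candidates(items, min_discount=30.0):
    """Keep in-stock product records discounted by more than min_discount percent."""
    return [
        p for p in items
        if p.get("type") == "product"
        and p.get("available")
        and (p.get("discountPercent") or 0) > min_discount
    ]


items = [
    {"type": "product", "title": "A", "available": True, "discountPercent": 45.0},
    {"type": "product", "title": "B", "available": True, "discountPercent": None},
    {"type": "product", "title": "C", "available": False, "discountPercent": 60.0},
    {"type": "detection", "domain": "https://example.com", "isShopify": True},
]
print([p["title"] for p in arbitrage_candidates(items)])  # ['A']
```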

💼 B2B sales prospecting for Shopify merchants

Tools like Klaviyo, Recharge, Yotpo, Gorgias, Postscript and 200+ other Shopify-app vendors need a target list of Shopify stores. Pass any list of domains; the detector marks which ones are Shopify, and the catalog scrape qualifies each by product count and SKU breadth.

📊 Competitive intelligence

How many products does your competitor have? At what price points? How fast do they add new SKUs? Schedule this scraper on competitors and diff snapshots over time.

Metrics you can derive:

  • New SKUs/week — innovation velocity
  • Price changes — track repositioning
  • Inventory shifts — early signal of bestsellers and dying products
  • Tag/category mix — strategic positioning
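
A snapshot diff for the first two metrics can be sketched as follows. It assumes each snapshot is a list of product records with a variants array keyed by sku, as in the field tables above; index_by_sku and diff_snapshots are hypothetical helper names.

```python
def index_by_sku(products):
    """Map each variant SKU to its variant record (variants without a SKU are skipped)."""
    index = {}
    for product in products:
        for variant in product.get("variants", []):
            sku = variant.get("sku")
            if sku:
                index[sku] = variant
    return index


def diff_snapshots(old_products, new_products):
    """Compare two catalog snapshots and report SKU and price changes."""
    old_idx, new_idx = index_by_sku(old_products), index_by_sku(new_products)
    shared = old_idx.keys() & new_idx.keys()
    return {
        "newSkus": sorted(new_idx.keys() - old_idx.keys()),
        "removedSkus": sorted(old_idx.keys() - new_idx.keys()),
        # sku -> (old price, new price) for SKUs present in both snapshots
        "priceChanges": {
            sku: (old_idx[sku].get("price"), new_idx[sku].get("price"))
            for sku in sorted(shared)
            if old_idx[sku].get("price") != new_idx[sku].get("price")
        },
    }


last_week = [{"variants": [{"sku": "A", "price": 10.0}, {"sku": "B", "price": 20.0}]}]
this_week = [{"variants": [{"sku": "A", "price": 12.0}, {"sku": "C", "price": 5.0}]}]
print(diff_snapshots(last_week, this_week))
```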

🤖 Recommendation engine training

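Ecommerce recommender systems need a clean product catalog with consistent metadata. Pull 100k products from 100 stores in your niche and train a content-based recommender using tags, productType and vendor as item features. (Collaborative filtering additionally needs user interaction data, which this actor does not provide.)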
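
One simple way to use tags, productType and vendor as item features is item-to-item similarity. The sketch below uses Jaccard overlap, which is a stand-in technique chosen for brevity rather than anything the actor ships; feature_set, jaccard and most_similar are hypothetical helpers, and tags is assumed to be the comma-separated string described in the field table.

```python
def feature_set(product):
    """Turn tags, productType and vendor into a set of prefixed string features."""
    feats = {
        "tag:" + t.strip().lower()
        for t in (product.get("tags") or "").split(",")
        if t.strip()
    }
    if product.get("productType"):
        feats.add("type:" + product["productType"].lower())
    if product.get("vendor"):
        feats.add("vendor:" + product["vendor"].lower())
    return feats


def jaccard(a, b):
    """Share of features two items have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target, catalog, top_n=3):
    """Rank the other catalog items by feature overlap with the target."""
    target_feats = feature_set(target)
    scored = [
        (jaccard(target_feats, feature_set(p)), p.get("title"))
        for p in catalog
        if p is not target
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_n]


crew = {"title": "Crew", "tags": "wool, socks", "productType": "Socks", "vendor": "Allbirds"}
ankle = {"title": "Ankle", "tags": "wool, ankle", "productType": "Socks", "vendor": "Allbirds"}
roast = {"title": "Roast", "tags": "coffee", "productType": "Coffee", "vendor": "Death Wish"}
print(most_similar(crew, [crew, ankle, roast], top_n=1))
```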

📈 Price comparison apps

Build a "find the cheapest" tool for fashion, beauty, supplements, home goods. Schedule daily scrapes of 50-200 competitors per category; expose the lowest price + direct link as a freemium feature.

🧠 LLM grounding for shopping assistants

Build a shopping assistant powered by your favourite LLM that can recommend exact products with real prices and in-stock availability — grounded in fresh Shopify data, not hallucinated.

🏷️ Tag & taxonomy intelligence

Shopify stores embed rich custom metadata in tags (e.g. Allbirds uses allbirds::material => wool, allbirds::carbon-score => 5.9). Aggregate tags across hundreds of stores in a niche to map the implicit taxonomy.
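
Aggregating tag keys across products can be sketched with a Counter. This assumes tags arrives as the comma-separated string shown in the field table and that structured tags follow the "key => value" pattern from the Allbirds example; parse_tags and tag_key_counts are hypothetical helpers.

```python
from collections import Counter


def parse_tags(tags):
    """Split a comma-separated tag string into trimmed, non-empty tags."""
    return [t.strip() for t in (tags or "").split(",") if t.strip()]


def tag_key_counts(products):
    """Count how many products carry each tag key.

    For "key => value" tags only the key is counted; plain tags count as-is.
    """
    counts = Counter()
    for product in products:
        keys = set()
        for tag in parse_tags(product.get("tags")):
            keys.add(tag.split("=>", 1)[0].strip())
        counts.update(keys)
    return counts


products = [
    {"tags": "allbirds::material => wool, allbirds::carbon-score => 5.9"},
    {"tags": "allbirds::material => tree, sale"},
]
print(tag_key_counts(products).most_common(1))  # [('allbirds::material', 2)]
```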

How to use this Shopify scraper

The actor supports two input modes, and you can combine them in a single run.

Mode 1: Catalog scrape (core)

Pass one or more Shopify store URLs. The actor paginates through /products.json until the catalog is exhausted or maxProductsPerStore is reached. Up to 250 products per page; up to 200 pages per store (50,000 products absolute ceiling per store).

    {
      "storeUrls": [
        "https://www.allbirds.com",
        "https://www.deathwishcoffee.com",
        "https://www.brooklinen.com"
      ],
      "maxProductsPerStore": 500,
      "maxTotalProducts": 1500
    }

Mode 2: Shopify detection

Pass any list of domains. The actor returns one record per domain with isShopify: true/false. Useful for tech-stack auditing of your sales prospects.

    {
      "domainsToDetect": [
        "https://www.allbirds.com",
        "https://www.amazon.com",
        "https://www.nike.com",
        "https://www.gymshark.com"
      ]
    }

Detail enrichment

Set includeProductDetails: true to additionally fetch /products/{handle}.json for every product extracted. This adds the full HTML description and richer per-variant data (per-variant inventory, weight, requires_shipping). Trade-off: one extra HTTP call per product, so a 1,000-product run takes ~3-5 minutes instead of ~10 seconds.

    {
      "storeUrls": ["https://www.allbirds.com"],
      "includeProductDetails": true,
      "maxProductsPerStore": 100
    }

Collections snapshot

Set includeCollections: true to also pull /collections.json and emit one record per store with the full collection catalog. Useful for building a category tree of the store.

Step-by-step tutorial — your first Shopify run in 90 seconds

  1. Click "Try for free" on this actor's Apify Store page. New users get $5 platform credit.
  2. Paste a starter input for Allbirds (~991 products):
    {
      "storeUrls": ["https://www.allbirds.com"],
      "maxProductsPerStore": 100,
      "maxTotalProducts": 100
    }
  3. Click "Start". The actor calls /products.json?page=1&limit=250 and pushes 100 records.
  4. Download as JSON, CSV, Excel, XML, RSS or HTML.

Total runtime: ~10 seconds for 100 products.

Performance and cost

  • HTTP only. No Playwright, no proxy, no rotation needed.
  • ~125-250 products per second sustained, single worker, 256 MB memory.
  • No anti-bot measures encountered on the /products.json or /products/{handle}.json endpoints for most of the hundreds of Shopify stores tested; a minority of stores block the endpoint (see the FAQ).
  • Pricing: $0.005 per product + $0.00005 per actor start.

Pricing scenarios

| Workload | Products | Cost |
| --- | --- | --- |
| Try Allbirds (100 products) | 100 | $0.50 |
| One Apify free $5 credit | ~1,000 | $5.00 |
| Full Allbirds catalog (~1,000) | 1,000 | $5.00 |
| Top 10 DTC brands × 500 products each | 5,000 | $25.00 |
| Niche audit (50 stores × 200 each) | 10,000 | $50.00 |
| Daily refresh of 30 competitors × 500 products | ~15,000/day | $75/day |
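
The scenarios above follow directly from the two price points listed under "Performance and cost". A small estimator, assuming one actor start per run and no other platform charges (estimate_cost is a hypothetical helper):

```python
PRICE_PER_PRODUCT = 0.005   # USD per product, from the pricing section
PRICE_PER_START = 0.00005   # USD per actor start, from the pricing section


def estimate_cost(products: int, runs: int = 1) -> float:
    """Estimated USD cost for scraping `products` items across `runs` actor starts."""
    return round(products * PRICE_PER_PRODUCT + runs * PRICE_PER_START, 5)


for label, count in [("Allbirds sample", 100), ("Top 10 DTC brands", 5_000),
                     ("Daily refresh", 15_000)]:
    print(f"{label}: ${estimate_cost(count):.2f}")
```

The per-start fee is small enough that it disappears when rounding to cents, which is why the table lists round figures.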

Output example (one product, list mode)

    {
      "type": "product",
      "productId": 4521803710580,
      "storeUrl": "https://www.allbirds.com",
      "productUrl": "https://www.allbirds.com/products/trino-cozy-crew-heathered-onyx",
      "handle": "trino-cozy-crew-heathered-onyx",
      "title": "Trino® Cozy Crew - Heathered Onyx",
      "vendor": "Allbirds",
      "productType": "Socks",
      "tags": "allbirds::carbon-score => 5.9, allbirds::material => wool, ...",
      "image": "https://cdn.shopify.com/s/files/.../cozy-crew.jpg",
      "imagesCount": 2,
      "price": 24.00,
      "compareAtPrice": null,
      "discountPercent": null,
      "variantCount": 4,
      "variants": [
        {
          "id": 32094182146100,
          "title": "S",
          "sku": "PCC1HONU301",
          "price": 24.00,
          "available": true,
          "inventoryQuantity": 351,
          "weight": 0.3087,
          "weightUnit": "kg",
          "option1": "S"
        }
      ],
      "inventoryQuantity": 1402,
      "available": true,
      "options": [{"name": "Size", "values": ["S", "M", "L", "XL"]}],
      "publishedAt": "2026-04-21T12:39:05-07:00",
      "updatedAt": "2026-05-18T15:57:03-07:00",
      "scrapedAt": "2026-05-18T22:01:14+00:00"
    }
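
For spreadsheet or inventory work it is often easier to have one row per SKU than one nested record per product. A minimal sketch of that flattening, assuming the record shape shown in the output example; variant_rows and its column selection are illustrative.

```python
def variant_rows(product):
    """Flatten one product record into one flat dict per variant."""
    return [
        {
            "productId": product.get("productId"),
            "productTitle": product.get("title"),
            "variantId": variant.get("id"),
            "variantTitle": variant.get("title"),
            "sku": variant.get("sku"),
            "price": variant.get("price"),
            "available": variant.get("available"),
            "inventoryQuantity": variant.get("inventoryQuantity"),
        }
        for variant in product.get("variants", [])
    ]


product = {
    "productId": 4521803710580,
    "title": "Trino® Cozy Crew - Heathered Onyx",
    "variants": [
        {"id": 32094182146100, "title": "S", "sku": "PCC1HONU301",
         "price": 24.00, "available": True, "inventoryQuantity": 351},
    ],
}
for row in variant_rows(product):
    print(row["sku"], row["price"], row["inventoryQuantity"])
```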

How this Shopify scraper compares

| Approach | Pros | Cons |
| --- | --- | --- |
| This actor | Canonical endpoint, full variants + tags, $5/1K, no proxy, 2 modes (catalog + detect) | Doesn't expose sales numbers (Shopify hides those) |
| Shopify Partner App API | Real-time webhooks | Requires Partner account + each store must install your app |
| BuiltWith / Wappalyzer paid | Tech-stack DB | $$$$ subscription; no catalog data |
| Other Apify Shopify scrapers | Existed first | Mostly single-mode (no detection), often parse HTML instead of using the JSON endpoint, slower |
| Manual storefront scraping | Total flexibility | Brittle to theme changes; HTML parsing instead of JSON |
| Hiring a freelancer | Custom output | $200-$800 one-off, not maintained |

How to call this Shopify scraper from your code

Python

    from apify_client import ApifyClient

    client = ApifyClient("YOUR_API_TOKEN")

    # Start the actor and wait for the run to finish.
    run = client.actor("makework36/shopify-products-scraper").call(run_input={
        "storeUrls": ["https://www.allbirds.com"],
        "maxProductsPerStore": 500,
    })

    # Stream dataset items and print product records only.
    for p in client.dataset(run["defaultDatasetId"]).iterate_items():
        if p["type"] == "product":
            # inventoryQuantity is only present when the store exposes it.
            print(p["title"], p["price"], p["available"], p.get("inventoryQuantity"))

Node.js

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

    // Run the actor with detail enrichment and wait for it to finish.
    const run = await client.actor('makework36/shopify-products-scraper').call({
        storeUrls: ['https://www.deathwishcoffee.com'],
        includeProductDetails: true,
    });

    // Read the dataset and print product records only.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach(p => p.type === 'product' && console.log(p.title, p.price));

cURL (synchronous)

    curl -X POST "https://api.apify.com/v2/acts/makework36~shopify-products-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"storeUrls":["https://www.allbirds.com"],"maxProductsPerStore":50}'

Frequently Asked Questions about scraping Shopify

Is it legal to scrape Shopify stores?

The /products.json endpoint is exposed publicly by Shopify by default, intentionally, for SEO and product-feed integrations. This actor consumes the same public endpoint that Shopify's own merchandising tools use. You are responsible for respecting each store's product copyrights when redistributing data, and for complying with the Shopify Terms of Service in any commercial use derived from it.

Do all Shopify stores expose /products.json?

Most do: roughly 70-80% in our testing. The remaining stores either disallow the endpoint in robots.txt, restrict it via theme code, or sit behind aggressive Cloudflare bot challenges. For stores that block it, you will see a 404 or 403 in the actor log, and the detection record will show isShopify: false even if the store runs on Shopify. This is a real limitation of public-endpoint scraping.

Why doesn't the output include sales numbers or revenue?

Shopify does not expose sold_quantity or revenue on the public /products.json endpoint. Aggregate inventory and "available" flags are exposed when the merchant enables inventory tracking. For real revenue/sales data, you need the merchant to install a Shopify app with the Orders/Reports scopes (which is a different model entirely).

How current is the data?

Live — every run hits the store directly. There is no actor-side cache. The updatedAt field on each product tells you when the merchant last edited it.

Can I find Shopify stores I don't know about?

Not from /products.json alone. The actor's detection mode confirms whether a domain you supply is Shopify, but does not enumerate unknown stores. For discovery, combine this actor with: BuiltWith Free (manual), Wappalyzer browser extension, Google search site:myshopify.com, or paid services like Storeleads or MyIP.ms.

Does the actor work for *.myshopify.com subdomain URLs too?

Yes. Some smaller stores use the default Shopify subdomain instead of a custom domain. Pass https://example.myshopify.com and the actor handles it identically.

How do I get full product descriptions?

Set includeProductDetails: true. The description field will then contain the full HTML body (often 1-10 KB of rich product copy + bullet lists + tables). Without this flag, descriptions are null to keep runs fast and cheap.

What about images in multiple sizes?

The list endpoint returns one featured image URL via the actor's image field, plus imagesCount. The detail mode (includeProductDetails: true) preserves the full images array. Shopify's CDN also serves resized versions of an image when a size suffix is added to the file name (e.g. _500x.jpg), which is useful for thumbnails.

Can I schedule this scraper?

Yes. Use Apify's built-in scheduler to refresh your catalog snapshots daily, hourly or weekly. Push results to Google Sheets, BigQuery, Postgres, Snowflake or Slack via Apify integrations.

Will my IP get banned?

We've not observed IP blocks on /products.json in our tests. The endpoint is unauthenticated and designed to handle high traffic from merchandising tools. The actor inserts a 300 ms polite delay between paginated requests.

How do you handle pagination?

The actor calls /products.json?limit=250&page=N and increments N until the response returns 0 products or maxProductsPerStore is reached. The hard safety cap is 200 pages × 250 = 50,000 products per store.
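
The loop described above can be sketched as follows. This is an illustration of the stopping rules, not the actor's implementation: paginate and make_fake_fetch are hypothetical, and the page fetcher is injected so the example runs without network access. In real use, fetch_page would request {store}/products.json?limit=250&page=N and return the "products" list from the response.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 250   # products per page
MAX_PAGES = 200   # hard safety cap: 200 x 250 = 50,000 products per store


def paginate(fetch_page: Callable[[int, int], List[Dict]],
             max_products: int) -> List[Dict]:
    """Collect products page by page until a stop condition is hit."""
    if max_products <= 0:
        return []
    collected: List[Dict] = []
    for page in range(1, MAX_PAGES + 1):
        products = fetch_page(page, PAGE_SIZE)
        if not products:
            break  # an empty page means the catalog is exhausted
        collected.extend(products)
        if len(collected) >= max_products:
            break  # maxProductsPerStore reached
    return collected[:max_products]


def make_fake_fetch(total: int) -> Callable[[int, int], List[Dict]]:
    """Stand-in for the HTTP call: serves `total` dummy products in pages."""
    def fetch_page(page: int, limit: int) -> List[Dict]:
        start = (page - 1) * limit
        return [{"id": i} for i in range(start, min(start + limit, total))]
    return fetch_page


print(len(paginate(make_fake_fetch(600), max_products=500)))     # 500
print(len(paginate(make_fake_fetch(600), max_products=10_000)))  # 600
```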

Can I export only changed products since last run?

Not in v1. You can post-process by comparing updatedAt against your last-run timestamp. A native incremental mode is on the roadmap for v1.1.
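
The post-processing step can be sketched like this, assuming updatedAt is an ISO 8601 timestamp with a UTC offset as in the output example. changed_since is a hypothetical helper, and last_run_at must be a timezone-aware datetime for the comparison to work.

```python
from datetime import datetime, timezone


def changed_since(products, last_run_at):
    """Return the products whose updatedAt is later than last_run_at."""
    changed = []
    for product in products:
        updated = product.get("updatedAt")
        if not updated:
            continue  # no timestamp, so we cannot tell; skipped in this sketch
        if datetime.fromisoformat(updated) > last_run_at:
            changed.append(product)
    return changed


last_run_at = datetime(2026, 5, 18, 22, 0, 0, tzinfo=timezone.utc)
products = [
    {"handle": "edited-today", "updatedAt": "2026-05-18T15:57:03-07:00"},
    {"handle": "untouched", "updatedAt": "2026-04-21T12:39:05-07:00"},
]
print([p["handle"] for p in changed_since(products, last_run_at)])  # ['edited-today']
```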

Is there a free trial?

Apify gives every new user $5 in platform credit — enough to extract ~1,000 Shopify products with this actor.

Can I use this for tag analytics?

Absolutely. The tags field is one of the highest-signal data points in Shopify catalogs. Many stores embed structured metadata in tags (material, carbon score, season, audience, kit identifiers). Aggregate tags across a niche and you have a free taxonomy.

What if Shopify changes the endpoint?

Shopify has maintained /products.json as a public canonical endpoint since 2015. Removing it would break product-feed integrations across millions of stores and is extremely unlikely. If they ever modify it, this actor will be updated.


Roadmap

  • v1.1: incremental mode (only emit products with updatedAt > lastRunAt).
  • v1.2: bulk detection helper (pass 1,000 domains, parallel detect).
  • v1.3: collection-scoped catalog (scrape only specific collections per store).
  • v2: enhanced detection via theme markers when /products.json is blocked.

Disclaimer

This actor consumes the public /products.json endpoint that Shopify stores expose by default, the same endpoint Shopify's own merchandising tools and product-feed integrations use. You are responsible for respecting each merchant's copyright on product copy, photography and brand assets, the Shopify Terms of Service, and applicable data protection regulation when storing, transforming or redistributing the data.

🙏 Ran this Shopify scraper successfully? Leaving a review helps the Apify algorithm surface this actor to other ecommerce teams and dropshippers. Much appreciated.