Shopify Scraper Pro

Scrape any Shopify store(s) at scale: products, collections, search, on-sale tracking, multi-store batches. Filters: price/vendor/type/tags/title/sale/availability/date. Multi-endpoint fallback. HTTP-only, no auth, no proxy.

Pricing: from $1.00 / 1,000 results
Rating: 5.0 (13)
Developer: Crawler Bros (maintained by Community)
Actor stats: 13 bookmarked, 2 total users, 1 monthly active user, last modified 4 days ago

The most capable Shopify catalog scraper on Apify Store. Multi-store batches, collection-scoped scraping, in-store search, sale tracking with discount %, endpoint fallback chain, image quality control, 13 filter dimensions, and automatic proxy fallback. Pulls structured product data from any Shopify-powered store via public APIs — no login, no cookies, no proxy required for most stores.

What's "Pro" about it

| Capability | Basic Shopify scrapers | Shopify Pro |
|---|---|---|
| Stores per run | 1 | N (multi-store batch) |
| Collection-scoped | | collectionHandles |
| In-store search | | searchQuery via /search/suggest.json |
| Endpoint fallback | /products.json only | /products.json → /collections/all/products.json → collection-specific |
| Sort options | | 8 (best-selling, price asc/desc, title, created…) |
| Image quality | original only | 7 sizes (small/medium/large/grande/1024/2048/master) |
| Filters | 0-2 | 13 dimensions |
| Sale tracking | ❌ or basic | onSale, comparePrice, discountPercent (derived) |
| Date filters | | createdSince, updatedSince |
| SKU search | | skuContains |
| Auto proxy fallback | | ✅ retries with Apify proxy on first failure |
| Output fields | ~17 | 27 (incl. discountPercent, maxPrice, availableVariantCount, storeCollections) |

Modes

The actor runs in one of three modes per store:

  1. Full catalog (default) — walks /products.json (and falls back to /collections/all/products.json) until the catalog is exhausted
  2. Collection (collectionHandles=[...]) — fetches /collections/<handle>/products.json with the chosen sortBy
  3. Search (searchQuery=...) — uses /search/suggest.json to find the most relevant matches
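The three modes and the full-catalog fallback can be pictured with a minimal sketch. It is not the actor's source: `endpoint_chain`, `walk_products`, and the injected `fetch_json` callable are hypothetical names, and paging with `limit` and `page` query parameters is an assumption about how the catalog walk might be done. The walker only understands the `{"products": [...]}` shape of the products.json endpoints; the search endpoint returns a different structure and is listed only to show the mode selection.

```python
from typing import Callable, Iterator, List, Optional
from urllib.parse import urlencode

# Hypothetical: fetch_json(url) returns parsed JSON, or None when the request fails.
FetchJson = Callable[[str], Optional[dict]]


def endpoint_chain(collection_handle: Optional[str] = None,
                   search_query: Optional[str] = None) -> List[str]:
    """Return the storefront paths to try, in order, for one mode."""
    if search_query:
        return ["/search/suggest.json"]
    if collection_handle:
        return [f"/collections/{collection_handle}/products.json"]
    # Full catalog: primary feed first, then the fallback named in the text.
    return ["/products.json", "/collections/all/products.json"]


def walk_products(fetch_json: FetchJson, store: str, paths: List[str],
                  max_items: int = 250, page_size: int = 250) -> Iterator[dict]:
    """Page through the first path that responds; stop on an empty page."""
    for path in paths:
        page, emitted, responded = 1, 0, False
        while emitted < max_items:
            query = urlencode({"limit": page_size, "page": page})
            data = fetch_json(f"https://{store}{path}?{query}")
            if data is None:
                break  # this endpoint failed; fall through to the next path
            responded = True
            products = data.get("products", [])
            if not products:
                break  # catalog exhausted
            for product in products:
                yield product
                emitted += 1
                if emitted >= max_items:
                    return
            page += 1
        if responded:
            return  # this endpoint worked, so the fallback is not needed
```

Passing the fetch function in keeps the sketch free of network code, so the fallback order can be exercised with a stub.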

Output per product

  • productId, handle, productUrl, storeUrl
  • title, vendor, productType
  • description (HTML stripped, capped at 5000 chars)
  • tags[]
  • price (lowest variant), maxPrice (when range), comparePrice, onSale, discountPercent (derived)
  • currency — when discoverable
  • variantCount, availableVariantCount, available
  • skus[] (up to 20)
  • variants[] (up to 30 — id, title, sku, price, comparePrice, available, option1/2/3)
  • imageUrl (cover, transformed to chosen imageQuality), images[] (up to 15, all transformed)
  • options[] (size/color etc — name, values[])
  • createdAt, updatedAt, publishedAt
  • collections[] — when enrichWithCollections=true and product matches store collections
  • storeCollections[] — flat list of all collections in the store
  • scrapedFromCollection — when scraped via collection mode
  • recordType: "product", scrapedAt

Empty fields are omitted from the output (no nulls).
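As a rough illustration of how the variant-derived fields and the "no nulls" rule fit together, the sketch below summarizes a list of raw variants into a few of the output fields. The function names are hypothetical, and two details are assumptions rather than documented behavior: prices are emitted as floats, and "empty" means None, an empty string, or an empty list, so that 0 and false survive.

```python
from decimal import Decimal
from typing import List


def prune_empty(record: dict) -> dict:
    """Drop keys whose value is None, '', [] or {} (0 and False are kept)."""
    return {k: v for k, v in record.items() if v not in (None, "", [], {})}


def summarize_variants(variants: List[dict]) -> dict:
    """Derive price, maxPrice, counts and skus from raw Shopify variants."""
    # Shopify returns prices as strings such as "98.00"; Decimal avoids float drift.
    prices = [Decimal(v["price"]) for v in variants if v.get("price")]
    in_stock = [v for v in variants if v.get("available")]
    low = min(prices) if prices else None
    high = max(prices) if prices else None
    record = {
        "price": float(low) if low is not None else None,
        # maxPrice is only meaningful when variants span a price range.
        "maxPrice": float(high) if high is not None and high > low else None,
        "variantCount": len(variants),
        "availableVariantCount": len(in_stock),
        "available": bool(in_stock),
        "skus": [v["sku"] for v in variants if v.get("sku")][:20],  # capped at 20
    }
    return prune_empty(record)
```

With a single-price product that has no SKUs, the result carries neither maxPrice nor skus, which mirrors the omission rule stated above.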

Input

| Field | Type | Default | Description |
|---|---|---|---|
| storeUrls | array | ["allbirds.com","kith.com"] | Shopify store domains. Accepts full URLs, bare domains, or *.myshopify.com handles |
| collectionHandles | array | [] | When set, fetch /collections/<handle>/products.json instead of full catalog. Applies to ALL stores |
| searchQuery | string | | When set, runs in-store search via /search/suggest.json |
| sortBy | enum | best-selling | Collection mode sort: best-selling, manual, price-ascending, price-descending, title-ascending, title-descending, created-ascending, created-descending |
| imageQuality | enum | master | Image URL quality: small/medium/large/grande/1024x1024/2048x2048/master |
| enrichWithCollections | bool | false | Fetch each store's /collections.json to enrich products with collection metadata |
| maxItemsPerStore | int | 250 | Hard cap per store (1-5000) |
| maxItems | int | 1000 | Global hard cap (1-50000) |
| includeUnavailable | bool | false | When false, drop products whose variants are all out of stock |
| minPrice | number | | Drop products below this price |
| maxPrice | number | | Drop products above this price |
| vendorContains | string | | Only emit products whose vendor contains this substring |
| productTypeContains | string | | Only emit products whose product_type contains this substring |
| tagAnyOf | array | [] | Only emit products whose tags contain at least one of these |
| tagNoneOf | array | [] | Drop products whose tags contain any of these |
| titleContains | string | | Only emit products whose title contains this substring |
| onSaleOnly | bool | false | Only emit products with compare_at_price > price |
| minDiscountPercent | int | | Only emit products discounted by at least this % |
| createdSince | string | | Drop products created before this ISO date (YYYY-MM-DD) |
| updatedSince | string | | Drop products updated before this ISO date |
| skuContains | string | | Only emit products with at least one variant whose SKU contains this substring |
| useApifyProxy | bool | false | Force proxy from start. Auto-fallback to proxy happens regardless |
| requestDelaySecs | int | 1 | Pause between page fetches (0–10s) |
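To make the filter semantics concrete, here is a small sketch of how a subset of these inputs could be applied to one output record. `passes_filters` is a hypothetical helper, not the actor's code. It assumes case-insensitive substring matching, exact tag matching, that a product with no price fails a price filter, and that createdSince is compared against the date part of createdAt without timezone conversion.

```python
from datetime import date
from typing import Optional


def _contains(haystack: Optional[str], needle: str) -> bool:
    """Case-insensitive substring test (assumed; the source does not specify case)."""
    return needle.lower() in (haystack or "").lower()


def passes_filters(product: dict, cfg: dict) -> bool:
    """Return True when the product survives every filter set in cfg."""
    price = product.get("price")
    if cfg.get("minPrice") is not None and (price is None or price < cfg["minPrice"]):
        return False
    if cfg.get("maxPrice") is not None and (price is None or price > cfg["maxPrice"]):
        return False
    if cfg.get("vendorContains") and not _contains(product.get("vendor"), cfg["vendorContains"]):
        return False
    if cfg.get("titleContains") and not _contains(product.get("title"), cfg["titleContains"]):
        return False
    tags = set(product.get("tags", []))
    if cfg.get("tagAnyOf") and not tags & set(cfg["tagAnyOf"]):
        return False
    if cfg.get("tagNoneOf") and tags & set(cfg["tagNoneOf"]):
        return False
    if cfg.get("onSaleOnly") and not product.get("onSale"):
        return False
    # includeUnavailable defaults to false, so sold-out products are dropped.
    if not cfg.get("includeUnavailable") and not product.get("available", True):
        return False
    if cfg.get("createdSince"):
        created = product.get("createdAt", "")
        # Compare only the YYYY-MM-DD prefix of the ISO timestamp.
        if not created or date.fromisoformat(created[:10]) < date.fromisoformat(cfg["createdSince"]):
            return False
    return True
```

Filters that are unset simply do not participate, so an empty configuration only applies the includeUnavailable default.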

Example: full catalog from one store

{
  "storeUrls": ["allbirds.com"],
  "maxItemsPerStore": 500,
  "imageQuality": "large"
}

Example: men's sale collection sorted by price

{
  "storeUrls": ["allbirds.com"],
  "collectionHandles": ["mens-sale"],
  "sortBy": "price-ascending",
  "onSaleOnly": true,
  "minDiscountPercent": 30,
  "maxItems": 50
}

Example: in-store search across multiple stores

{
  "storeUrls": ["allbirds.com", "rothys.com"],
  "searchQuery": "wool runner",
  "maxItems": 30
}

Example: track new arrivals from competitors

{
  "storeUrls": ["kith.com", "endclothing.com", "ssense.com"],
  "createdSince": "2026-04-01",
  "tagAnyOf": ["new-arrivals", "new"],
  "minPrice": 100,
  "imageQuality": "1024x1024"
}

Example: sale tracker with rich output

{
  "storeUrls": ["allbirds.com", "everlane.com", "rothys.com"],
  "onSaleOnly": true,
  "minDiscountPercent": 40,
  "enrichWithCollections": true,
  "maxItems": 200
}

Use cases

  • Competitor monitoring: pull a competitor's full catalog weekly, track new launches with createdSince
  • Sale tracking: daily run with onSaleOnly: true + minDiscountPercent: 30 to alert on deep discounts
  • Multi-brand aggregator: feed a price-comparison site with N store catalogs in one run
  • Inventory snapshot: combine availableVariantCount and updatedSince to track stock churn
  • Brand intelligence: vendorContains to extract products from a specific brand carried by a multi-brand store
  • Product-feed migration: export a Shopify catalog as clean JSON for import elsewhere
  • In-store search analytics: searchQuery mode to see what each store surfaces for a keyword

FAQ

Does it require a login or cookies? No. The actor uses public Shopify storefront endpoints.

Is a proxy needed? Usually no: most Shopify stores accept requests from datacenter IPs. The actor automatically retries with Apify proxy on the first failure, so stores that block cloud IPs still work transparently.
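One way to picture the automatic fallback is a small session wrapper that starts direct and switches to the proxy after the first failed request. This is only a sketch under assumptions: `StoreSession` and the two injected fetch callables are hypothetical, and making the switch sticky for the rest of the store's run is a guess, since the text only says the actor retries with the proxy on the first failure.

```python
from typing import Callable

# Hypothetical: each callable takes a URL and returns the response body,
# raising OSError (for example ConnectionError) when the request fails.
Fetch = Callable[[str], bytes]


class StoreSession:
    """Fetch directly until the first failure, then route through the proxy."""

    def __init__(self, fetch_direct: Fetch, fetch_via_proxy: Fetch,
                 force_proxy: bool = False) -> None:
        self.fetch_direct = fetch_direct
        self.fetch_via_proxy = fetch_via_proxy
        # force_proxy plays the role of useApifyProxy=true in the input.
        self.use_proxy = force_proxy

    def get(self, url: str) -> bytes:
        if not self.use_proxy:
            try:
                return self.fetch_direct(url)
            except OSError:
                # Assumed behavior: stay on the proxy once a direct call fails.
                self.use_proxy = True
        return self.fetch_via_proxy(url)
```

Keeping the transport functions injectable means the same wrapper can sit on top of any HTTP client.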

Which stores does it work on? Any store using Shopify with the public products feed enabled — the vast majority of the 5M+ live Shopify stores. Some stores explicitly disable /products.json; for those, the /collections/all/products.json fallback often still works.

How is discountPercent calculated? When a product's lowest variant price (price) is below the compare_at_price, we compute 100 * (1 - price/comparePrice) and round down. Only emitted when the value is between 1 and 99.
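The formula above can be sketched as follows. `discount_percent` is a hypothetical helper, and taking prices as strings is an assumption based on how Shopify serializes them. It uses Decimal because a plain float evaluation of 100 * (1 - price/comparePrice) can land just below a whole number and round down one point too far; the integer form 100 * (comparePrice - price) / comparePrice is algebraically the same expression.

```python
from decimal import Decimal
from typing import Optional


def discount_percent(price: str, compare_price: Optional[str]) -> Optional[int]:
    """Floor of 100 * (1 - price / comparePrice), or None when out of range."""
    if not compare_price:
        return None
    p, c = Decimal(price), Decimal(compare_price)
    if c <= 0 or p >= c:
        return None  # not on sale
    # Decimal floor division; both operands are positive, so this rounds down.
    pct = int((c - p) * 100 // c)
    # Only values between 1 and 99 are emitted, per the rule stated above.
    return pct if 1 <= pct <= 99 else None
```

For a product priced at 69.99 with a compare price of 100.00, the exact discount is 30.01%, which rounds down to 30.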

Why is currency not always included? Shopify's /products.json doesn't include a currency field at the product level. We add it when discoverable elsewhere; otherwise it's omitted (no nulls).

Can I scrape a specific collection? Yes — set collectionHandles: ["sale", "new-arrivals"]. The actor will hit /collections/<handle>/products.json with your chosen sortBy.

What does imageQuality actually do? Shopify's CDN serves multiple sizes via URL suffixes, e.g. abc.jpg → abc_small.jpg or abc_1024x1024.jpg. The actor rewrites every image URL to the chosen size. master returns the original.
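A minimal sketch of that suffix rewrite, assuming the size name is inserted before the file extension and any query string (such as a version parameter) is preserved. `resize_image_url` and `SIZE_SUFFIX` are illustrative names, and stripping an existing size suffix before applying the new one is an assumption that could misfire on a file whose real name ends in something like _large.

```python
import posixpath
import re
from urllib.parse import urlsplit, urlunsplit

# The seven imageQuality values mapped to the suffix inserted before the extension.
SIZE_SUFFIX = {
    "small": "_small", "medium": "_medium", "large": "_large",
    "grande": "_grande", "1024x1024": "_1024x1024",
    "2048x2048": "_2048x2048", "master": "",
}

# Assumed pattern for a size suffix that is already present on the file stem.
_EXISTING = re.compile(r"_(small|medium|large|grande|\d+x\d+)$")


def resize_image_url(url: str, quality: str) -> str:
    """Rewrite a Shopify CDN image URL to the requested size."""
    parts = urlsplit(url)
    stem, ext = posixpath.splitext(parts.path)
    if not ext:
        return url  # no file extension to anchor the suffix on
    stem = _EXISTING.sub("", stem)
    new_path = f"{stem}{SIZE_SUFFIX[quality]}{ext}"
    return urlunsplit(parts._replace(path=new_path))
```

An unknown quality value raises KeyError here, which is a deliberate choice for the sketch so that typos in the input surface early.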

How does searchQuery differ from titleContains? searchQuery calls Shopify's own search backend, which ranks results server-side by relevance. titleContains is a client-side substring filter applied AFTER fetching the catalog. Use searchQuery for relevance, titleContains for catalog scans.

How fresh is the data? Real-time. Endpoints serve the live catalog snapshot.

What's the difference between available and availableVariantCount? available is true when at least one variant is in stock. availableVariantCount is the integer count of in-stock variants. Both are reliable stock indicators.