Shopify Scraper

Pricing

from $0.90 / 1,000 results

Universal Shopify Scraper for Apify — Extract product data from any Shopify store via API. Multi-store support, auto-detection, smart filtering by price/tags/sales, new arrival flags, sale detection. Handles pagination, configurable request delays, pure HTTP (no browser). Perfect for e-com research

Developer

Rover Omniscraper

Maintained by Community

Universal Shopify Scraper — Extract Product Data from Any Shopify Store

The fastest, most reliable way to scrape Shopify stores at scale. Extract complete product catalogs — prices, variants, inventory, images, sale status, and more — from any Shopify-powered store without a browser.

Built on Apify, the world's leading web scraping platform. No coding required. Export to JSON, CSV, or Excel in one click.


What is the Universal Shopify Scraper?

The Universal Shopify Scraper is an Apify Actor that extracts full product data from any Shopify store using Shopify's public /products.json API endpoint. It auto-detects Shopify stores, paginates through entire catalogs, normalizes the data into a clean schema, and delivers structured results ready for analysis.

Whether you need to monitor competitor pricing, track inventory levels, research market trends, or build a product feed — this actor was built for you.

💡 Works on any public Shopify store — no API key, store credentials, or browser required.


Why Use This Shopify Scraper?

| Feature | This Actor | Browser-Based Scrapers |
| --- | --- | --- |
| Speed | ⚡ Pure HTTP, no browser overhead | 🐢 5–20× slower |
| Cost | 💰 $2.50 per 1,000 products | 💸 Much more expensive |
| Reliability | ✅ Does not break on JS changes | ❌ Breaks when UI changes |
| Data completeness | ✅ Full catalog including hidden fields | ⚠️ Only visible fields |
| Proxy-friendly | ✅ Apify proxy support built-in | Variable |

Key Features

  • Auto-detection — Automatically verifies if any URL is a Shopify store before scraping (JSON probe + HTML fingerprint fallback)
  • Multi-store scraping — Scrape multiple Shopify stores in a single actor run
  • Full catalog pagination — Fetches every product page (250 products/page) with automatic since_id cursor fallback for stores with 12,500+ products, the point where page-based pagination stops
  • Sale detection — Identifies discounted products where compare_at_price > price and computes the exact discount percentage per variant
  • New arrivals flag — Marks recently published products within a configurable time window (default: 30 days)
  • Smart filtering — Filter output by tags, price range, sale status, or in-stock availability before it hits your dataset
  • Collection targeting — Scrape a specific product collection instead of the full catalog
  • Proxy support — Integrates with Apify's residential and datacenter proxy network for high reliability
  • Batch output — Pushes data to the Apify dataset in efficient batches for large catalogs
  • Clean, normalized schema — Consistent field names across all stores; no raw Shopify fields to decode
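
The sale-detection rule above (compare_at_price > price, plus a per-variant discount percentage) can be sketched as a small function. This is an illustration, not the actor's actual code: the function name `detectSale` is hypothetical, and rounding to the nearest integer is an assumption chosen because it matches the sample output later in this README.

```javascript
// Hypothetical helper: the actor's real implementation is not shown in this README.
function detectSale(price, compareAtPrice) {
  // A variant is on sale only when both prices are numbers and the
  // compare-at price is strictly higher than the current price.
  const onSale =
    typeof price === 'number' &&
    typeof compareAtPrice === 'number' &&
    compareAtPrice > price;

  // Assumption: the percentage is rounded to the nearest integer.
  const discountPercent = onSale
    ? Math.round(((compareAtPrice - price) / compareAtPrice) * 100)
    : 0;

  return { onSale, discountPercent };
}

console.log(detectSale(95, 120)); // { onSale: true, discountPercent: 21 }
console.log(detectSale(95, null)); // { onSale: false, discountPercent: 0 }
```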

Use Cases

🛍️ E-commerce Price Monitoring

Track pricing changes across competing Shopify stores. Get alerts when a competitor drops prices or launches a sale. Compare your pricing against the market in real time.

📊 Market Research & Trend Analysis

Extract entire product catalogs to analyze product assortment, vendor distribution, tag taxonomy, and pricing strategies. Find gaps in the market or spot trends in product launches.

🏷️ Sale & Discount Tracking

Filter for only on-sale products across multiple stores. The scraper computes the exact discountPercent per variant, so you can rank deals by savings instantly.

🆕 New Arrivals Intelligence

Monitor newly launched products with the isNewArrival flag. Great for fashion, sneakers, and streetwear industries where velocity of new drops matters.

📦 Inventory and Stock Monitoring

Track which variants are available or out of stock with availableVariants. Use this for restock alerts or to understand competitor supply chains.

🔗 Product Feed Generation

Build automated product feeds for affiliate marketing, comparison shopping engines, or internal product databases — refreshed on a schedule via Apify's scheduler.

🤖 AI & LLM Training Data

Collect normalized, clean product descriptions, tags, and metadata at scale for training product classification models or recommendation engines.


How It Works

Input URLs → Shopify Detection → Pagination → Normalization → Filtering → Dataset
  1. Input validation — Reads your list of store URLs and scraping preferences
  2. Shopify detection — Probes each URL for /products.json; falls back to HTML fingerprint scanning for cdn.shopify.com and Shopify.theme signatures
  3. Full pagination — Fetches products in pages of 250; auto-switches to since_id cursor-based pagination for very large stores (12,500+ products)
  4. Normalization — Transforms every raw Shopify product object into a clean, enriched record with sale detection, new-arrival flagging, and price aggregation
  5. Filtering — Applies your tag, price, sale, and stock filters inside the actor run, before anything is pushed to the dataset
  6. Output — Pushes results in batches of 100 to your Apify dataset; exportable as JSON, CSV, JSONL, or Excel
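
The detection step (step 2) can be pictured as two checks: does the /products.json response parse to a product list, and if not, does the storefront HTML contain one of the two fingerprints named above? The sketch below assumes the response bodies have already been fetched; the helper names `isProductsJson` and `hasShopifyFingerprint` are hypothetical and the exact checks the actor performs may differ.

```javascript
// Hypothetical helpers; names and exact checks are assumptions, not the actor's code.

// Step 2a: a /products.json body counts as a match when it parses to
// an object that carries a "products" array.
function isProductsJson(body) {
  try {
    const data = JSON.parse(body);
    return data !== null && typeof data === 'object' && Array.isArray(data.products);
  } catch {
    return false; // not JSON at all, e.g. an HTML error page
  }
}

// Step 2b: fallback scan of the storefront HTML for the two
// signatures mentioned in this README.
function hasShopifyFingerprint(html) {
  return html.includes('cdn.shopify.com') || html.includes('Shopify.theme');
}

console.log(isProductsJson('{"products":[]}')); // true
console.log(hasShopifyFingerprint('<p>Hello</p>')); // false
```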

Input Configuration

Required

| Field | Type | Description |
| --- | --- | --- |
| storeUrls | string[] | One or more Shopify store URLs to scrape. Example: ["https://kith.com", "https://allbirds.com"] |

Optional — Scope Control

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| collection | string | "" | Scrape only a specific collection by its handle (e.g., "new-arrivals"). Leave empty to scrape all products. |
| maxProducts | integer | 0 | Maximum number of products to scrape per store. 0 means unlimited (full catalog). |

Optional — Filtering

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| filterTags | string | "" | Comma-separated list of tags. Only products with at least one matching tag are included. |
| minPrice | number | 0 | Exclude products whose lowest variant price is below this value. 0 disables the filter. |
| maxPrice | number | 0 | Exclude products whose lowest variant price is above this value. 0 disables the filter. |
| onlyOnSale | boolean | false | If true, only include products with at least one variant on sale (compare_at_price > price). |
| onlyInStock | boolean | false | If true, only include products with at least one available variant. |
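
To make the filter semantics concrete, here is a minimal sketch of a predicate that applies the five filtering inputs to one normalized product record. It is not the actor's code: the name `passesFilters` is hypothetical, and two behaviors are assumptions that the README does not specify, namely case-insensitive tag matching and excluding products with no price when a price filter is active.

```javascript
// Hypothetical predicate over a normalized product record
// (fields as in the Output Data Schema: tags, lowestPrice, onSale, availableVariants).
function passesFilters(product, input = {}) {
  const {
    filterTags = '',
    minPrice = 0,
    maxPrice = 0,
    onlyOnSale = false,
    onlyInStock = false,
  } = input;

  // Comma-separated tags: keep the product if at least one tag matches.
  // Assumption: matching ignores case and surrounding whitespace.
  const wanted = filterTags.split(',').map((t) => t.trim().toLowerCase()).filter(Boolean);
  if (wanted.length > 0) {
    const have = product.tags.map((t) => t.toLowerCase());
    if (!wanted.some((t) => have.includes(t))) return false;
  }

  // Price bounds compare against the lowest variant price; 0 disables a bound.
  // Assumption: a product without any price fails an active price filter.
  if (minPrice > 0 && (product.lowestPrice === null || product.lowestPrice < minPrice)) return false;
  if (maxPrice > 0 && (product.lowestPrice === null || product.lowestPrice > maxPrice)) return false;

  if (onlyOnSale && !product.onSale) return false;
  if (onlyInStock && product.availableVariants < 1) return false;
  return true;
}

const sample = { tags: ['footwear', 'sneakers'], lowestPrice: 85, onSale: true, availableVariants: 8 };
console.log(passesFilters(sample, { filterTags: 'sneakers, boots', minPrice: 50, maxPrice: 300 })); // true
console.log(passesFilters(sample, { minPrice: 100 })); // false
```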

Optional — Behavior

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| newArrivalsDays | integer | 30 | Number of days since publication to flag a product as isNewArrival: true. |
| requestDelay | integer | 500 | Milliseconds to wait between page requests. Increase to reduce rate-limiting risk. |
| proxyConfig | object | {} | Apify proxy configuration. Leave as default for automatic proxy handling. |
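
As an illustration of how newArrivalsDays could translate into the isNewArrival flag, the sketch below compares a product's publication timestamp against a reference time. The function name and the `now` parameter are hypothetical, and treating future-dated or unparseable timestamps as "not new" is an assumption.

```javascript
// Hypothetical helper; the README only states that products published
// within newArrivalsDays are flagged.
function isNewArrival(publishedAt, newArrivalsDays = 30, now = new Date()) {
  const published = new Date(publishedAt);
  if (Number.isNaN(published.getTime())) return false; // unparseable timestamp

  const ageMs = now.getTime() - published.getTime();
  const windowMs = newArrivalsDays * 24 * 60 * 60 * 1000;
  // Assumption: a publication date in the future does not count as new.
  return ageMs >= 0 && ageMs <= windowMs;
}

// Timestamps taken from the sample output in this README.
const scrapedAt = new Date('2026-04-08T10:30:00.000Z');
console.log(isNewArrival('2026-04-07T15:50:28-04:00', 30, scrapedAt)); // true
```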

Example Inputs

Scrape an entire store catalog

{
  "storeUrls": ["https://gymshark.com"],
  "requestDelay": 500
}

Scrape sale items only across multiple stores

{
  "storeUrls": [
    "https://kith.com",
    "https://allbirds.com",
    "https://gymshark.com"
  ],
  "onlyOnSale": true,
  "onlyInStock": true
}

Scrape a specific collection with price filtering

{
  "storeUrls": ["https://kith.com"],
  "collection": "footwear",
  "minPrice": 50,
  "maxPrice": 300,
  "onlyInStock": true
}

Track new arrivals from a sneaker store

{
  "storeUrls": ["https://kith.com"],
  "newArrivalsDays": 7,
  "filterTags": "sneakers,footwear",
  "maxProducts": 200
}

Output Data Schema

Each scraped product is stored as a structured JSON object:

{
  "store": "kith.com",
  "id": 8286284742784,
  "title": "VANS Premium Classic Slip-On - Indigo Blue",
  "handle": "vn000d9pind1",
  "url": "https://kith.com/products/vn000d9pind1",
  "vendor": "VANS",
  "productType": "Low Top Sneakers",
  "description": "Premium Classic Slip-On silhouette. Canvas and leather upper. Vulcanized build with waffle rubber outsole.",
  "tags": ["footwear", "sneakers", "mens", "low top sneakers"],
  "publishedAt": "2026-04-07T15:50:28-04:00",
  "isNewArrival": true,
  "images": [
    "https://cdn.shopify.com/s/files/1/0094/2252/files/vans-slip-on-hero.jpg"
  ],
  "featuredImage": "https://cdn.shopify.com/s/files/1/0094/2252/files/vans-slip-on-hero.jpg",
  "variants": [
    {
      "id": 45139595591808,
      "title": "US 10 / Indigo Blue",
      "sku": "10869433",
      "price": 95.00,
      "compareAtPrice": 120.00,
      "onSale": true,
      "discountPercent": 21,
      "available": true,
      "option1": "US 10",
      "option2": "Indigo Blue",
      "option3": null,
      "weight": 0.5,
      "weightUnit": "kg",
      "requiresShipping": true
    }
  ],
  "lowestPrice": 85.00,
  "highestPrice": 105.00,
  "onSale": true,
  "maxDiscountPercent": 21,
  "totalVariants": 12,
  "availableVariants": 8,
  "scrapedAt": "2026-04-08T10:30:00.000Z"
}

Output Field Reference

| Field | Type | Description |
| --- | --- | --- |
| store | string | The store's hostname (e.g., kith.com) |
| id | number | Shopify product ID |
| title | string | Product title |
| handle | string | URL-safe product handle |
| url | string | Direct product page URL |
| vendor | string | Brand or vendor name |
| productType | string | Product category type |
| description | string | Plain-text description (HTML stripped) |
| tags | string[] | Array of product tags |
| publishedAt | string | ISO 8601 publication timestamp |
| isNewArrival | boolean | true if published within newArrivalsDays |
| images | string[] | All product image URLs |
| featuredImage | string or null | First (main) product image URL |
| variants | object[] | Full variant list (see below) |
| lowestPrice | number or null | Lowest price across all variants |
| highestPrice | number or null | Highest price across all variants |
| onSale | boolean | true if any variant is on sale |
| maxDiscountPercent | number | Highest discount % across variants |
| totalVariants | number | Total number of variants |
| availableVariants | number | Number of in-stock variants |
| scrapedAt | string | ISO 8601 timestamp of when data was collected |

Variant Fields

| Field | Type | Description |
| --- | --- | --- |
| id | number | Shopify variant ID |
| title | string | Variant title (e.g., "Small / Black") |
| sku | string | SKU code |
| price | number or null | Current price |
| compareAtPrice | number or null | Original price (before discount) |
| onSale | boolean | true if compareAtPrice > price |
| discountPercent | number | Percentage discount (0 if not on sale) |
| available | boolean | Whether this variant is in stock |
| option1 | string or null | First option value (e.g., size) |
| option2 | string or null | Second option value (e.g., color) |
| option3 | string or null | Third option value |
| weight | number | Item weight |
| weightUnit | string | Weight unit (kg, lb, etc.) |
| requiresShipping | boolean | Whether the item requires shipping |
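
The product-level fields lowestPrice, highestPrice, onSale, maxDiscountPercent, totalVariants, and availableVariants are all derived from the variant list. A minimal sketch of that aggregation, assuming normalized variants shaped like the table above, might look like this. The function name `summarizeVariants` is hypothetical.

```javascript
// Hypothetical aggregation over normalized variants (see Variant Fields).
function summarizeVariants(variants) {
  // Ignore variants whose price is null when computing the price range.
  const prices = variants.map((v) => v.price).filter((p) => typeof p === 'number');

  return {
    lowestPrice: prices.length ? Math.min(...prices) : null,
    highestPrice: prices.length ? Math.max(...prices) : null,
    onSale: variants.some((v) => v.onSale),
    maxDiscountPercent: variants.reduce((max, v) => Math.max(max, v.discountPercent || 0), 0),
    totalVariants: variants.length,
    availableVariants: variants.filter((v) => v.available).length,
  };
}

console.log(
  summarizeVariants([
    { price: 95, onSale: true, discountPercent: 21, available: true },
    { price: 85, onSale: false, discountPercent: 0, available: false },
    { price: 105, onSale: false, discountPercent: 0, available: true },
  ])
);
```

With no variants at all, this sketch returns null for both price fields, which matches the "number or null" types listed for them.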

Pricing

$2.50 per 1,000 products scraped.

The actor charges based on the number of products pushed to your dataset. There are no hidden costs, no per-store fees, and no minimum commitments. You only pay for what you use.

| Volume | Cost |
| --- | --- |
| 1,000 products | $2.50 |
| 10,000 products | $25.00 |
| 100,000 products | $250.00 |
| 1,000,000 products | $2,500.00 |

Apify platform compute costs (memory/CPU) are billed separately and are typically minimal for this actor due to its pure HTTP architecture.
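
For budgeting runs that fall between the rows of the table, the per-result charge can be estimated with simple arithmetic. The sketch below assumes the $2.50 per 1,000 products rate quoted in this section and ignores platform compute costs; `estimateCostUsd` is a hypothetical name.

```javascript
// Rate quoted in this section; platform compute costs are not included.
const PRICE_PER_1000_USD = 2.5;

function estimateCostUsd(productCount) {
  // Round to whole cents so the result is a clean dollar amount.
  const cents = Math.round((productCount * PRICE_PER_1000_USD * 100) / 1000);
  return cents / 100;
}

console.log(estimateCostUsd(10000)); // 25
console.log(estimateCostUsd(4200)); // 10.5
```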


Proxy Configuration

This actor supports Apify Proxy for reliable, uninterrupted scraping.

{
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Proxy groups available:

  • RESIDENTIAL — Residential IPs (recommended for high-volume runs)
  • DATACENTER — Datacenter IPs (faster, sufficient for most Shopify stores)

Leave proxyConfig as {} (default) to use automatic proxy selection.


Performance

  • Speed: Typically 2,000–5,000 products per minute depending on store size and requestDelay
  • Memory: 256 MB default (sufficient for catalogs of any size)
  • Concurrency: Single-threaded per store (respects rate limits)
  • Pagination: Handles unlimited product counts via automatic since_id cursor fallback
  • Retries: Up to 3 retries per page with exponential backoff on failures or rate-limits (HTTP 429)
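
The retry behavior in the last bullet can be sketched as a generic wrapper around any page-fetching function. This is an illustration under assumptions: the name `fetchWithRetry`, the 1-second base delay, and the doubling factor are not stated in this README, and the `sleep` option exists only so the wait can be replaced in tests.

```javascript
// Default wait implementation; replaceable through the options object.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: calls fetchPage, and on failure retries up to
// maxRetries times, doubling the wait before each new attempt.
async function fetchWithRetry(fetchPage, options = {}) {
  const { maxRetries = 3, baseDelayMs = 1000, sleep = defaultSleep } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await fetchPage();
    } catch (err) {
      // Give up once the retry budget is spent (1 initial try + maxRetries).
      if (attempt >= maxRetries) throw err;
      // Assumed schedule: baseDelayMs, then 2x, then 4x.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

A caller would pass a function that throws on failures or HTTP 429 responses, for example `fetchWithRetry(() => loadPage(url))`, where `loadPage` stands in for whatever HTTP call is used.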

Frequently Asked Questions

Does this work on any Shopify store?

Yes — any public Shopify store that has the /products.json endpoint enabled (which is the default for all Shopify stores). Some stores may restrict this endpoint; the actor will skip those with a warning and continue.

Do I need a Shopify API key or login?

No. This actor uses Shopify's public product API, which requires no authentication. There is no login, no API key, and no store owner permissions needed.

How many stores can I scrape in one run?

As many as you want. Add all your target URLs to the storeUrls array and the actor will process them sequentially.

Will it get blocked?

The actor uses a configurable requestDelay (default 500ms) between page requests to behave like a polite crawler. For large-scale runs, enable Apify Proxy with RESIDENTIAL group for the best reliability.

How do I scrape only a specific product category?

Use the collection input field with the collection's URL handle. For example, if the collection URL is https://example.com/collections/new-arrivals, the handle is new-arrivals.
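
If you have a list of collection URLs rather than handles, the handle can be pulled out programmatically. The helper below is a small sketch with a hypothetical name; it returns null when the URL has no /collections/ segment.

```javascript
// Hypothetical helper: extracts the handle that follows /collections/ in a URL.
function collectionHandleFromUrl(collectionUrl) {
  const { pathname } = new URL(collectionUrl);
  const match = pathname.match(/\/collections\/([^/]+)/);
  return match ? decodeURIComponent(match[1]) : null;
}

console.log(collectionHandleFromUrl('https://example.com/collections/new-arrivals')); // new-arrivals
console.log(collectionHandleFromUrl('https://example.com/products/some-item')); // null
```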

Can I schedule this to run automatically?

Yes. Use Apify's built-in scheduler to run this actor on any cron schedule — hourly, daily, weekly, or custom.

What export formats are available?

From the Apify dataset you can export: JSON, JSONL, CSV, Excel (XLSX), XML, and RSS — all without any additional configuration.
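
Exports can also be fetched over HTTP from the Apify API's dataset items endpoint by setting the format query parameter. The sketch below only builds the URL; the dataset ID is a placeholder you would copy from your own run, the function name is hypothetical, and private datasets additionally require an API token, which is omitted here.

```javascript
// Formats listed in this FAQ answer, in the spelling the API expects.
const EXPORT_FORMATS = ['json', 'jsonl', 'csv', 'xlsx', 'xml', 'rss'];

// Hypothetical helper: builds a dataset export URL for a given format.
function datasetExportUrl(datasetId, format = 'json') {
  if (!EXPORT_FORMATS.includes(format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  const url = new URL(`https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`);
  url.searchParams.set('format', format);
  return url.toString();
}

// "YOUR_DATASET_ID" is a placeholder, not a real dataset.
console.log(datasetExportUrl('YOUR_DATASET_ID', 'csv'));
// https://api.apify.com/v2/datasets/YOUR_DATASET_ID/items?format=csv
```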

How fresh is the data?

The data is live — scraped directly from the store at the time the actor runs. There is no caching. Use Apify Scheduler for recurring freshness.

Does it handle stores with 10,000+ products?

Yes. Shopify's page-based pagination caps at 12,500 products (page=50 × limit=250). This actor automatically detects this limit and switches to since_id cursor-based pagination, enabling complete extraction of unlimited-size catalogs.
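
The switch from page numbers to a since_id cursor can be sketched as two pure functions: one that builds the request URL and one that decides what to request next. This is a sketch, not the actor's implementation. The names `buildPageUrl` and `nextRequest` are hypothetical, and the cursor logic assumes since_id returns products with IDs greater than the given value; a real implementation would likely also de-duplicate by product ID around the switch-over.

```javascript
const PAGE_SIZE = 250; // products per request
const MAX_PAGE = 50; // 50 pages x 250 products = 12,500

// Builds the /products.json URL for either pagination mode.
function buildPageUrl(storeUrl, { page = 1, sinceId = null } = {}) {
  const url = new URL('/products.json', storeUrl);
  url.searchParams.set('limit', String(PAGE_SIZE));
  if (sinceId !== null) url.searchParams.set('since_id', String(sinceId));
  else url.searchParams.set('page', String(page));
  return url.toString();
}

// Decides the next request from the current one and the products it returned.
// Returns null when the catalog is exhausted.
function nextRequest(current, products) {
  if (products.length < PAGE_SIZE) return null; // short page: nothing left

  // Already in cursor mode, or the page cap has been reached: continue by ID.
  if (current.sinceId !== null || current.page >= MAX_PAGE) {
    // Assumption: continuing from the highest ID seen on this page is safe.
    const highestId = Math.max(...products.map((p) => p.id));
    return { page: null, sinceId: highestId };
  }
  return { page: current.page + 1, sinceId: null };
}

console.log(buildPageUrl('https://kith.com', { page: 1 }));
// https://kith.com/products.json?limit=250&page=1
```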


Supported Stores

This actor has been tested on 100+ popular Shopify stores including:

  • Fashion & Apparel: Kith, Gymshark, Allbirds, Pura Vida
  • Footwear: Steve Madden, Vans, many independent sneaker boutiques
  • Beauty & Wellness: ColourPop, Morphe, OUAI, Herbivore Botanicals, Kylie Cosmetics, Jones Road Beauty
  • Home & Lifestyle: MVMT, Beardbrand, BarkBox
  • Food & Beverage: Death Wish Coffee

If a store runs on Shopify and leaves its /products.json endpoint publicly accessible, this actor can scrape it.


This actor accesses only publicly available data through Shopify's documented public API endpoint (/products.json). It does not bypass authentication, access private data, or violate any platform security controls.

Users are responsible for ensuring their use of scraped data complies with:

  • The target website's Terms of Service
  • Applicable data protection laws (GDPR, CCPA, etc.)
  • Apify's Terms of Service

This tool is intended for legitimate business use cases such as market research, competitive analysis, price monitoring, and product feed generation.


About

Built with ❤️ as an Apify Actor. Open source, pure Node.js, no browser required.

  • Runtime: Node.js 20
  • HTTP client: got-scraping
  • Platform: Apify
  • Source: Available on Apify platform

Questions or issues? Use the Apify community forum or open a support ticket.

Local Development

# Install dependencies
npm install
# Create the local storage directory and the input file
mkdir -p storage/key_value_stores/default
echo '{"storeUrls":["https://kith.com"],"maxProducts":10}' > storage/key_value_stores/default/INPUT.json
# Run locally
npm start

Tech Stack

  • Node.js 20 (ES modules)
  • Apify SDK v3 — Actor lifecycle, dataset storage, proxy management
  • got-scraping — HTTP requests optimized for scraping
  • No browser / Puppeteer / Playwright — pure HTTP for speed and low cost

License

ISC