Shopify Scraper
Pricing
from $0.90 / 1,000 results
Universal Shopify Scraper for Apify — Extract product data from any Shopify store via API. Multi-store support, auto-detection, smart filtering by price/tags/sales, new arrival flags, sale detection. Handles pagination, configurable request delays, pure HTTP (no browser). Perfect for e-com research
Developer
Rover Omniscraper
Universal Shopify Scraper — Extract Product Data from Any Shopify Store
The fastest, most reliable way to scrape Shopify stores at scale. Extract complete product catalogs — prices, variants, inventory, images, sale status, and more — from any Shopify-powered store without a browser.
Built on Apify, the world's leading web scraping platform. No coding required. Export to JSON, CSV, or Excel in one click.
What is the Universal Shopify Scraper?
The Universal Shopify Scraper is an Apify Actor that extracts full product data from any Shopify store using Shopify's public /products.json API endpoint. It auto-detects Shopify stores, paginates through entire catalogs, normalizes the data into a clean schema, and delivers structured results ready for analysis.
Whether you need to monitor competitor pricing, track inventory levels, research market trends, or build a product feed — this actor was built for you.
💡 Works on any public Shopify store — no API key, store credentials, or browser required.
Why Use This Shopify Scraper?
| Feature | This Actor | Browser-Based Scrapers |
|---|---|---|
| Speed | ⚡ Pure HTTP — no browser overhead | 🐢 5–20× slower |
| Cost | 💰 $2.50 per 1,000 products | 💸 Much more expensive |
| Reliability | ✅ Does not break on JS changes | ❌ Breaks when UI changes |
| Data completeness | ✅ Full catalog including hidden fields | ⚠️ Only visible fields |
| Proxy-friendly | ✅ Apify proxy support built-in | Variable |
Key Features
- Auto-detection — Automatically verifies if any URL is a Shopify store before scraping (JSON probe + HTML fingerprint fallback)
- Multi-store scraping — Scrape multiple Shopify stores in a single actor run
- Full catalog pagination — Fetches every product page (250 products/page) with automatic since_id cursor fallback for stores with 10,000+ products
- Sale detection — Identifies discounted products where compare_at_price > price and computes the exact discount percentage per variant
- New arrivals flag — Marks recently published products within a configurable time window (default: 30 days)
- Smart filtering — Filter output by tags, price range, sale status, or in-stock availability before it hits your dataset
- Collection targeting — Scrape a specific product collection instead of the full catalog
- Proxy support — Integrates with Apify's residential and datacenter proxy network for high reliability
- Batch output — Pushes data to the Apify dataset in efficient batches for large catalogs
- Clean, normalized schema — Consistent field names across all stores; no raw Shopify fields to decode
Use Cases
🛍️ E-commerce Price Monitoring
Track pricing changes across competing Shopify stores. Get alerts when a competitor drops prices or launches a sale. Compare your pricing against the market in real time.
📊 Market Research & Trend Analysis
Extract entire product catalogs to analyze product assortment, vendor distribution, tag taxonomy, and pricing strategies. Find gaps in the market or spot trends in product launches.
🏷️ Sale & Discount Tracking
Filter for only on-sale products across multiple stores. The scraper computes the exact discountPercent per variant, so you can rank deals by savings instantly.
🆕 New Arrivals Intelligence
Monitor newly launched products with the isNewArrival flag. Great for fashion, sneakers, and streetwear industries where velocity of new drops matters.
📦 Inventory and Stock Monitoring
Track which variants are available or out of stock with availableVariants. Use this for restock alerts or to understand competitor supply chains.
🔗 Product Feed Generation
Build automated product feeds for affiliate marketing, comparison shopping engines, or internal product databases — refreshed on a schedule via Apify's scheduler.
🤖 AI & LLM Training Data
Collect normalized, clean product descriptions, tags, and metadata at scale for training product classification models or recommendation engines.
How It Works
Input URLs → Shopify Detection → Pagination → Normalization → Filtering → Dataset
- Input validation — Reads your list of store URLs and scraping preferences
- Shopify detection — Probes each URL for /products.json; falls back to HTML fingerprint scanning for cdn.shopify.com and Shopify.theme signatures
- Full pagination — Fetches products in pages of 250; auto-switches to since_id cursor-based pagination for very large stores (12,500+ products)
- Normalization — Transforms every raw Shopify product object into a clean, enriched record with sale detection, new-arrival flagging, and price aggregation
- Filtering — Applies your tag, price, sale, and stock filters server-side before pushing to the dataset
- Output — Pushes results in batches of 100 to your Apify dataset; exportable as JSON, CSV, JSONL, or Excel
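The detection step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actor's actual source: the names looksLikeShopifyJson, looksLikeShopifyHtml, and detectShopify are hypothetical, the limit=1 query on the probe is an assumption, and the HTTP layer is passed in as a fetchText callback so the logic does not depend on a particular client.

```javascript
// Hypothetical helpers; the actor's real detection code is not shown in this README.

// JSON probe: a /products.json response is an object with a "products" array.
function looksLikeShopifyJson(body) {
  try {
    const parsed = JSON.parse(body);
    return parsed !== null && typeof parsed === 'object' && Array.isArray(parsed.products);
  } catch {
    return false; // not JSON at all (for example an HTML error page)
  }
}

// HTML fingerprint fallback: look for the two signatures named in the steps above.
function looksLikeShopifyHtml(html) {
  return html.includes('cdn.shopify.com') || html.includes('Shopify.theme');
}

// fetchText(url) is an assumed callback that resolves to the response body as a string.
async function detectShopify(storeUrl, fetchText) {
  const origin = new URL(storeUrl).origin;
  try {
    const body = await fetchText(`${origin}/products.json?limit=1`);
    if (looksLikeShopifyJson(body)) return true;
  } catch {
    // Probe failed; fall through to the HTML check.
  }
  try {
    return looksLikeShopifyHtml(await fetchText(origin));
  } catch {
    return false;
  }
}
```

On Node.js 20, fetchText could be as simple as (url) => fetch(url).then((r) => r.text()), since fetch is available globally.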
Input Configuration
Required
| Field | Type | Description |
|---|---|---|
| storeUrls | string[] | One or more Shopify store URLs to scrape. Example: ["https://kith.com", "https://allbirds.com"] |
Optional — Scope Control
| Field | Type | Default | Description |
|---|---|---|---|
| collection | string | "" | Scrape only a specific collection by its handle (e.g., "new-arrivals"). Leave empty to scrape all products. |
| maxProducts | integer | 0 | Maximum number of products to scrape per store. 0 means unlimited — full catalog. |
Optional — Filtering
| Field | Type | Default | Description |
|---|---|---|---|
| filterTags | string | "" | Comma-separated list of tags. Only products with at least one matching tag are included. |
| minPrice | number | 0 | Exclude products whose lowest variant price is below this value. 0 disables the filter. |
| maxPrice | number | 0 | Exclude products whose lowest variant price is above this value. 0 disables the filter. |
| onlyOnSale | boolean | false | If true, only include products with at least one variant on sale (compare_at_price > price). |
| onlyInStock | boolean | false | If true, only include products with at least one available variant. |
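As a rough illustration of how the filter options above combine, here is a small predicate over normalized products. It is a sketch, not the actor's code: passesFilters and parseTags are hypothetical names, and two behaviors are assumptions not stated in this README, namely case-insensitive tag matching and dropping products with no price whenever a price bound is active.

```javascript
// Hypothetical filter over normalized products (field names follow the output schema).
function parseTags(filterTags) {
  return filterTags
    .split(',')
    .map((t) => t.trim().toLowerCase())
    .filter((t) => t.length > 0);
}

function passesFilters(product, input) {
  const {
    filterTags = '',
    minPrice = 0,
    maxPrice = 0,
    onlyOnSale = false,
    onlyInStock = false,
  } = input;

  // Tag filter: keep the product if at least one requested tag is present.
  const wanted = parseTags(filterTags);
  if (wanted.length > 0) {
    const have = product.tags.map((t) => t.toLowerCase());
    if (!wanted.some((t) => have.includes(t))) return false;
  }

  // 0 disables a price bound; both bounds compare against the lowest variant price.
  const lowest = product.lowestPrice;
  if (minPrice > 0 && (lowest === null || lowest < minPrice)) return false;
  if (maxPrice > 0 && (lowest === null || lowest > maxPrice)) return false;

  if (onlyOnSale && !product.onSale) return false;
  if (onlyInStock && product.availableVariants === 0) return false;
  return true;
}
```

All filters are combined with AND, so a product must satisfy every option that is switched on.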
Optional — Behavior
| Field | Type | Default | Description |
|---|---|---|---|
| newArrivalsDays | integer | 30 | Number of days since publication to flag a product as isNewArrival: true. |
| requestDelay | integer | 500 | Milliseconds to wait between page requests. Increase to reduce rate-limiting risk. |
| proxyConfig | object | {} | Apify proxy configuration. Leave as default for automatic proxy handling. |
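The newArrivalsDays window can be pictured as a simple date comparison. The helper below is a hypothetical sketch: the name isNewArrival mirrors the output field, and treating missing, unparseable, or future publication dates as "not new" is an assumption rather than documented behavior.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: true when publishedAt falls within the last `days` days.
function isNewArrival(publishedAt, days = 30, now = new Date()) {
  if (!publishedAt) return false;
  const published = new Date(publishedAt).getTime();
  if (Number.isNaN(published)) return false; // unparseable timestamp
  const ageMs = now.getTime() - published;
  return ageMs >= 0 && ageMs <= days * MS_PER_DAY;
}
```

Passing now explicitly keeps the check reproducible, for example when comparing against a record's scrapedAt value instead of the current time.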
Example Inputs
Scrape an entire store catalog
{
  "storeUrls": ["https://gymshark.com"],
  "requestDelay": 500
}
Scrape sale items only across multiple stores
{
  "storeUrls": [
    "https://kith.com",
    "https://allbirds.com",
    "https://gymshark.com"
  ],
  "onlyOnSale": true,
  "onlyInStock": true
}
Scrape a specific collection with price filtering
{
  "storeUrls": ["https://kith.com"],
  "collection": "footwear",
  "minPrice": 50,
  "maxPrice": 300,
  "onlyInStock": true
}
Track new arrivals from a sneaker store
{
  "storeUrls": ["https://kith.com"],
  "newArrivalsDays": 7,
  "filterTags": "sneakers,footwear",
  "maxProducts": 200
}
Output Data Schema
Each scraped product is stored as a structured JSON object:
{
  "store": "kith.com",
  "id": 8286284742784,
  "title": "VANS Premium Classic Slip-On - Indigo Blue",
  "handle": "vn000d9pind1",
  "url": "https://kith.com/products/vn000d9pind1",
  "vendor": "VANS",
  "productType": "Low Top Sneakers",
  "description": "Premium Classic Slip-On silhouette. Canvas and leather upper. Vulcanized build with waffle rubber outsole.",
  "tags": ["footwear", "sneakers", "mens", "low top sneakers"],
  "publishedAt": "2026-04-07T15:50:28-04:00",
  "isNewArrival": true,
  "images": [
    "https://cdn.shopify.com/s/files/1/0094/2252/files/vans-slip-on-hero.jpg"
  ],
  "featuredImage": "https://cdn.shopify.com/s/files/1/0094/2252/files/vans-slip-on-hero.jpg",
  "variants": [
    {
      "id": 45139595591808,
      "title": "US 10 / Indigo Blue",
      "sku": "10869433",
      "price": 95.00,
      "compareAtPrice": 120.00,
      "onSale": true,
      "discountPercent": 21,
      "available": true,
      "option1": "US 10",
      "option2": "Indigo Blue",
      "option3": null,
      "weight": 0.5,
      "weightUnit": "kg",
      "requiresShipping": true
    }
  ],
  "lowestPrice": 85.00,
  "highestPrice": 105.00,
  "onSale": true,
  "maxDiscountPercent": 21,
  "totalVariants": 12,
  "availableVariants": 8,
  "scrapedAt": "2026-04-08T10:30:00.000Z"
}
Output Field Reference
| Field | Type | Description |
|---|---|---|
| store | string | The store's hostname (e.g., kith.com) |
| id | number | Shopify product ID |
| title | string | Product title |
| handle | string | URL-safe product handle |
| url | string | Direct product page URL |
| vendor | string | Brand or vendor name |
| productType | string | Product category type |
| description | string | Plain-text description (HTML stripped) |
| tags | string[] | Array of product tags |
| publishedAt | string | ISO 8601 publication timestamp |
| isNewArrival | boolean | true if published within newArrivalsDays |
| images | string[] | All product image URLs |
| featuredImage | string or null | First (main) product image URL |
| variants | object[] | Full variant list (see below) |
| lowestPrice | number or null | Lowest price across all variants |
| highestPrice | number or null | Highest price across all variants |
| onSale | boolean | true if any variant is on sale |
| maxDiscountPercent | number | Highest discount % across variants |
| totalVariants | number | Total number of variants |
| availableVariants | number | Number of in-stock variants |
| scrapedAt | string | ISO 8601 timestamp of when data was collected |
Variant Fields
| Field | Type | Description |
|---|---|---|
| id | number | Shopify variant ID |
| title | string | Variant title (e.g., "Small / Black") |
| sku | string | SKU code |
| price | number or null | Current price |
| compareAtPrice | number or null | Original price (before discount) |
| onSale | boolean | true if compareAtPrice > price |
| discountPercent | number | Percentage discount (0 if not on sale) |
| available | boolean | Whether this variant is in stock |
| option1 | string or null | First option value (e.g., size) |
| option2 | string or null | Second option value (e.g., color) |
| option3 | string or null | Third option value |
| weight | number | Item weight |
| weightUnit | string | Weight unit (kg, lb, etc.) |
| requiresShipping | boolean | Whether the item requires shipping |
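To make the relationship between the variant fields and the product-level aggregates concrete, here is a sketch of how they could be derived from raw variants. The function names are hypothetical and this is not the actor's source. It assumes raw prices arrive as strings (as in Shopify's raw JSON) and that discountPercent is rounded to a whole number, which matches the sample record above (95 against 120 gives 21).

```javascript
// Hypothetical normalization helpers; whole-percent rounding is an assumption.
function toNumber(value) {
  if (value === null || value === undefined || value === '') return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

// Derive the per-variant sale fields from a raw variant object.
function normalizeVariant(raw) {
  const price = toNumber(raw.price);
  const compareAtPrice = toNumber(raw.compare_at_price);
  const onSale = price !== null && compareAtPrice !== null && compareAtPrice > price;
  const discountPercent = onSale
    ? Math.round(((compareAtPrice - price) / compareAtPrice) * 100)
    : 0;
  return { price, compareAtPrice, onSale, discountPercent, available: raw.available === true };
}

// Roll normalized variants up into the product-level aggregate fields.
function summarizeVariants(variants) {
  const prices = variants.map((v) => v.price).filter((p) => p !== null);
  return {
    lowestPrice: prices.length > 0 ? Math.min(...prices) : null,
    highestPrice: prices.length > 0 ? Math.max(...prices) : null,
    onSale: variants.some((v) => v.onSale),
    maxDiscountPercent: variants.reduce((max, v) => Math.max(max, v.discountPercent), 0),
    totalVariants: variants.length,
    availableVariants: variants.filter((v) => v.available).length,
  };
}
```

A variant with no compare-at price is never counted as on sale, so its discountPercent stays at 0.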
Pricing
$2.50 per 1,000 products scraped.
The actor charges based on the number of products pushed to your dataset. There are no hidden costs, no per-store fees, and no minimum commitments. You only pay for what you use.
| Volume | Cost |
|---|---|
| 1,000 products | $2.50 |
| 10,000 products | $25.00 |
| 100,000 products | $250.00 |
| 1,000,000 products | $2,500.00 |
Apify platform compute costs (memory/CPU) are billed separately and are typically minimal for this actor due to its pure HTTP architecture.
Proxy Configuration
This actor supports Apify Proxy for reliable, uninterrupted scraping.
{
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
Proxy groups available:
- RESIDENTIAL — Residential IPs (recommended for high-volume runs)
- DATACENTER — Datacenter IPs (faster, sufficient for most Shopify stores)
Leave proxyConfig as {} (default) to use automatic proxy selection.
Performance
- Speed: Typically 2,000–5,000 products per minute depending on store size and requestDelay
- Memory: 256 MB default (sufficient for catalogs of any size)
- Concurrency: Single-threaded per store (respects rate limits)
- Pagination: Handles unlimited product counts via automatic since_id cursor fallback
- Retries: Up to 3 retries per page with exponential backoff on failures or rate-limits (HTTP 429)
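The retry behavior listed above could look something like the sketch below. It is illustrative only: fetchWithRetry and backoffDelayMs are hypothetical names, the 1,000 ms base delay and doubling factor are assumptions, and treating 5xx responses as retryable alongside HTTP 429 is also an assumption.

```javascript
// Hypothetical retry wrapper; base delay and doubling factor are assumptions.
function backoffDelayMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt; // attempt 0 -> base, 1 -> 2x base, 2 -> 4x base
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchPage() is an assumed callback that resolves to an object with a numeric `status`.
async function fetchWithRetry(fetchPage, { maxRetries = 3, baseMs = 1000 } = {}) {
  let lastError;
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      const response = await fetchPage();
      if (response.status !== 429 && response.status < 500) return response;
      lastError = new Error(`HTTP ${response.status}`);
    } catch (err) {
      lastError = err; // network failure: retry as well
    }
    if (attempt < maxRetries) await sleep(backoffDelayMs(attempt, baseMs));
  }
  throw lastError;
}
```

With the defaults, a page that keeps failing is attempted four times in total, with waits of roughly 1, 2, and 4 seconds between attempts.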
Frequently Asked Questions
Does this work on any Shopify store?
Yes — any public Shopify store that has the /products.json endpoint enabled (which is the default for all Shopify stores). Some stores may restrict this endpoint; the actor will skip those with a warning and continue.
Do I need a Shopify API key or login?
No. This actor uses Shopify's public product API, which requires no authentication. There is no login, no API key, and no store owner permissions needed.
How many stores can I scrape in one run?
As many as you want. Add all your target URLs to the storeUrls array and the actor will process them sequentially.
Will it get blocked?
The actor uses a configurable requestDelay (default 500ms) between page requests to behave like a polite crawler. For large-scale runs, enable Apify Proxy with RESIDENTIAL group for the best reliability.
How do I scrape only a specific product category?
Use the collection input field with the collection's URL handle. For example, if the collection URL is https://example.com/collections/new-arrivals, the handle is new-arrivals.
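If you have a collection URL and want the handle programmatically, a small helper like the one below would do. Both functions are hypothetical. The second one assumes the collection-scoped listing lives at /collections/{handle}/products.json; this README does not state which URL the actor requests, so treat that path as an assumption.

```javascript
// Hypothetical helper: pull the collection handle out of a collection URL.
function collectionHandle(collectionUrl) {
  const segments = new URL(collectionUrl).pathname.split('/').filter(Boolean);
  const index = segments.indexOf('collections');
  return index >= 0 && segments[index + 1]
    ? decodeURIComponent(segments[index + 1])
    : null;
}

// Assumed endpoint shape for a collection-scoped product listing.
function collectionProductsUrl(storeUrl, handle, page = 1, limit = 250) {
  const url = new URL(`/collections/${encodeURIComponent(handle)}/products.json`, storeUrl);
  url.searchParams.set('limit', String(limit));
  url.searchParams.set('page', String(page));
  return url.toString();
}
```

The value returned by collectionHandle is what goes into the collection input field.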
Can I schedule this to run automatically?
Yes. Use Apify's built-in scheduler to run this actor on any cron schedule — hourly, daily, weekly, or custom.
What export formats are available?
From the Apify dataset you can export: JSON, JSONL, CSV, Excel (XLSX), XML, and RSS — all without any additional configuration.
How fresh is the data?
The data is live — scraped directly from the store at the time the actor runs. There is no caching. Use Apify Scheduler for recurring freshness.
Does it handle stores with 10,000+ products?
Yes. Shopify's page-based pagination caps at 12,500 products (page=50 × limit=250). This actor automatically detects this limit and switches to since_id cursor-based pagination, enabling complete extraction of unlimited-size catalogs.
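The switch from page-based to cursor-based pagination can be sketched as two phases. This is an assumption-laden illustration rather than the actor's implementation: fetchAllProducts and the fetchProducts callback are hypothetical, and the second phase restarts from since_id 0 and deduplicates by product ID so that it does not rely on both modes returning products in the same order.

```javascript
const PAGE_LIMIT = 250;
const MAX_PAGE = 50; // 50 pages x 250 products = 12,500, the cap described above

// fetchProducts(params) is an assumed callback resolving to an array of raw products
// for query parameters such as { limit, page } or { limit, since_id }.
async function fetchAllProducts(fetchProducts) {
  const byId = new Map();

  // Phase 1: page-based pagination until a short page or the page cap.
  for (let page = 1; page <= MAX_PAGE; page += 1) {
    const batch = await fetchProducts({ limit: PAGE_LIMIT, page });
    for (const product of batch) byId.set(product.id, product);
    if (batch.length < PAGE_LIMIT) return [...byId.values()];
  }

  // Phase 2: the cap was reached with a full page, so walk the catalog by ascending ID.
  let sinceId = 0;
  for (;;) {
    const batch = await fetchProducts({ limit: PAGE_LIMIT, since_id: sinceId });
    if (batch.length === 0) break;
    for (const product of batch) byId.set(product.id, product);
    const next = Math.max(...batch.map((p) => p.id));
    if (next <= sinceId) break; // guard against a server that ignores since_id
    sinceId = next;
  }
  return [...byId.values()];
}
```

Restarting the cursor walk costs extra requests on very large stores, but it keeps the sketch correct without knowing how the page-based results are sorted.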
Supported Stores
This actor has been tested on 100+ popular Shopify stores including:
- Fashion & Apparel: Kith, Gymshark, Allbirds, Pura Vida
- Footwear: Steve Madden, Vans, many independent sneaker boutiques
- Beauty & Wellness: ColourPop, Morphe, OUAI, Herbivore Botanicals, Kylie Cosmetics, Jones Road Beauty
- Home & Lifestyle: MVMT, Beardbrand, BarkBox
- Food & Beverage: Death Wish Coffee
If a store runs on Shopify and leaves its /products.json endpoint publicly accessible, this actor can scrape it.
Legal & Compliance
This actor accesses only publicly available data through Shopify's public storefront endpoint (/products.json). It does not bypass authentication, access private data, or violate any platform security controls.
Users are responsible for ensuring their use of scraped data complies with:
- The target website's Terms of Service
- Applicable data protection laws (GDPR, CCPA, etc.)
- Apify's Terms of Service
This tool is intended for legitimate business use cases such as market research, competitive analysis, price monitoring, and product feed generation.
About
Built with ❤️ as an Apify Actor. Open source, pure Node.js, no browser required.
- Runtime: Node.js 20
- HTTP client: got-scraping
- Platform: Apify
- Source: Available on Apify platform
Questions or issues? Use the Apify community forum or open a support ticket.
Local Development
# Install dependencies
npm install
# Create the input file (the storage directory may not exist yet)
mkdir -p storage/key_value_stores/default
echo '{"storeUrls":["https://kith.com"],"maxProducts":10}' > storage/key_value_stores/default/INPUT.json
# Run locally
npm start
Tech Stack
- Node.js 20 (ES modules)
- Apify SDK v3 — Actor lifecycle, dataset storage, proxy management
- got-scraping — HTTP requests optimized for scraping
- No browser / Puppeteer / Playwright — pure HTTP for speed and low cost
License
ISC