Shopify Products Scraper

Pricing

$19.99/month + usage

🛍️ Shopify Products Scraper extracts titles, prices, variants, images, SKUs, tags & descriptions from any Shopify store. 📦 Supports collections & inventory. 📊 Export CSV/JSON for catalog builds, competitor analysis & SEO. ⚡ Fast, reliable, no coding required.

Rating: 0.0 (0)

Developer: Scraply

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user

Last modified: 5 days ago

Shopify Products Scraper

Shopify Products Scraper is an Apify actor that discovers product pages on Shopify stores and fetches each product’s .json endpoint to extract structured data such as titles, vendors, product types, prices, and variants. It replaces manual product cataloging: the actor finds product URLs automatically and preserves the complete Shopify product JSON for analysis and export. Built for marketers, developers, data analysts, and researchers, it handles multiple stores in one run and falls back through proxies when requests are blocked.

What data / output can you get?

Below are the exact fields pushed to the Apify dataset during a run. Each record represents a single product and includes both summary fields and the full Shopify product JSON.

| Data field | Description | Example value |
| --- | --- | --- |
| store_url | The base Shopify store URL processed for this product | https://lootcrate.com |
| product_url | The public product page URL | https://lootcrate.com/products/loot-crate |
| json_url | The .json endpoint used for extraction | https://lootcrate.com/products/loot-crate.json |
| product_id | Shopify product ID from the JSON | 5083963261059 |
| title | Product title from the JSON | Loot Crate |
| vendor | Vendor/brand from the JSON | Loot Crate Core |
| product_type | Product type from the JSON | Subscription Box |
| price | Price from the first variant (if available) | 29.99 |
| compare_at_price | Compare-at price from the first variant (if available) | 24.99 |
| tags | Comma-separated product tags from the JSON | Subscription, Collectibles, Pop Culture |
| total_found | Count of product URLs discovered for the store in this run | 5 |
| successful | Number of products successfully extracted for the store at this point | 5 |
| full_data | Full Shopify product JSON object returned by the .json endpoint | { "product": { ... } } |

Notes:

  • The scraper preserves the entire Shopify product response in full_data, which typically includes variants, images, timestamps, and more. This covers variant, image, and price scraping use cases.
  • You can export results from the dataset as CSV, JSON, or Excel for product CSV export workflows.
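
Because variants live inside full_data rather than in the summary fields, a small helper can turn one dataset item into one row per variant. The following is a minimal sketch, assuming the item has the shape of the example output in this document; `flatten_variants` and the row keys are illustrative names, not part of the actor.

```python
from typing import Any


def flatten_variants(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat row per variant found in a dataset item's full_data."""
    product = (item.get("full_data") or {}).get("product") or {}
    rows = []
    for variant in product.get("variants") or []:
        rows.append(
            {
                "product_id": item.get("product_id"),
                "product_title": item.get("title"),
                "variant_id": variant.get("id"),
                "variant_title": variant.get("title"),
                "sku": variant.get("sku"),
                "price": variant.get("price"),
                "compare_at_price": variant.get("compare_at_price"),
            }
        )
    return rows


# Trimmed copy of the example dataset item shown in this document.
example_item = {
    "product_id": 5083963261059,
    "title": "Loot Crate",
    "full_data": {
        "product": {
            "variants": [
                {
                    "id": 34197535719555,
                    "title": "S / XS",
                    "price": "29.99",
                    "compare_at_price": "24.99",
                    "sku": "1010126US",
                }
            ]
        }
    },
}

for row in flatten_variants(example_item):
    print(row["sku"], row["variant_title"], row["price"])
```

Items with a missing or empty full_data simply yield no rows, so the helper can be applied to a whole dataset without pre-filtering.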

Key features

  • 🔍 Automatic product discovery
    • Scans store HTML for links containing “/products/” and normalizes absolute/relative URLs. If the site is detected as Shopify, it paginates “/products.json” to collect handles reliably.
  • 🧩 Complete JSON preservation
    • Fetches each product’s .json endpoint and stores the entire response in full_data for maximum flexibility (variants, images, tags, vendor, product_type, and more).
  • 🔁 Intelligent proxy fallback
    • Escalates direct → datacenter → residential with retries, and keeps using the residential proxy once it is activated. Designed to keep runs going through 403/429 blocks.
  • ⚡ Concurrent requests at scale
    • Asynchronous fetching and controlled concurrency to scrape shopify products across multiple stores quickly and reliably.
  • 💾 Live dataset saving
    • Pushes each product to the dataset in real time, so partial results are saved even if a run is interrupted.
  • 🛡️ Robust error handling
    • Network retries, structured proxy switching, and clear logging for monitoring and debugging.
  • 🧪 Store-aware strategy
    • Detects Shopify signatures to use “/products.json” pagination where possible; otherwise falls back to HTML link discovery for product URLs.
  • 📈 Practical defaults
    • Internally paginates “/products.json?limit=250&page=N” and uses an internal cap to limit total products processed per store in a run.
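
The HTML discovery path described above (find links containing “/products/”, normalize relative and absolute URLs) can be sketched with the Python standard library. This is an illustration only, not the actor's source: the class and function names are hypothetical, and dropping the query string to de-duplicate is an assumption about what "normalizes" means here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class ProductLinkParser(HTMLParser):
    """Collects href values of <a> tags that contain '/products/'."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and "/products/" in value:
                self.hrefs.append(value)


def discover_product_urls(store_url: str, html: str) -> list[str]:
    """Return absolute, de-duplicated product URLs found in the HTML."""
    parser = ProductLinkParser()
    parser.feed(html)
    seen: set[str] = set()
    urls: list[str] = []
    for href in parser.hrefs:
        parts = urlsplit(urljoin(store_url, href))
        # Assumption: query string and fragment are dropped so that
        # variant links collapse to a single product URL.
        clean = urlunsplit(
            (parts.scheme, parts.netloc, parts.path.rstrip("/"), "", "")
        )
        if clean not in seen:
            seen.add(clean)
            urls.append(clean)
    return urls


def to_json_url(product_url: str) -> str:
    """Build the product's .json endpoint from its public page URL."""
    return product_url + ".json"
```

For example, a relative link such as `/products/loot-crate?variant=1` on https://lootcrate.com resolves to https://lootcrate.com/products/loot-crate, and `to_json_url` then gives the json_url shown in the field table.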

How to use Shopify Products Scraper - step by step

  1. Sign in to Apify Console and go to Actors.
  2. Find “shopify-products-scraper” by username “scraply” and open the actor.
  3. Add input data:
    • In startUrls, provide one or more Shopify store homepages (e.g., https://lootcrate.com).
    • Optionally set proxyConfiguration if you want to start with Apify Proxy; otherwise the actor starts with no proxy and falls back automatically if blocked.
  4. Start the run. The actor will detect Shopify stores and either paginate products.json or extract product links from HTML.
  5. Monitor logs. You’ll see:
    • Discovery status (found product URLs)
    • Proxy switches (direct → datacenter → residential)
    • Per-product success/failure messages
  6. Review results:
    • Dataset: Each product is saved with store_url, product_url, json_url, and more. Export as CSV, JSON, or Excel.
    • Key-Value Store: A grouped “OUTPUT” object summarizing each store with total_found, successful, method, and products (url + json) is stored for programmatic consumption.
  7. Export and integrate:
    • Use Apify’s dataset exports for BI dashboards or pipelines.
    • Automate downloads via the Apify API for scheduled reports or feeds.

Pro Tip: Add multiple store URLs to startUrls to run bulk extraction. Once the fallback reaches the residential proxy, the actor stays on it for the remaining product requests, which helps stability on large catalogs.
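
The same steps can be scripted instead of clicked through. The sketch below uses the `apify-client` Python package (installed separately with `pip install apify-client`). The actor ID is an assumption derived from the actor name and username mentioned in step 2, so check it against the actor's page before use; `run_scraper` is a hypothetical helper name.

```python
# Assumption: actor ID composed from the username and actor name given above.
ACTOR_ID = "scraply/shopify-products-scraper"


def run_scraper(token: str, start_urls: list[str]) -> list[dict]:
    """Start a run, wait for it to finish, and return all dataset items."""
    # Imported lazily so this module loads even without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(
        run_input={
            "startUrls": list(start_urls),
            "proxyConfiguration": {"useApifyProxy": False},
        }
    )
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example usage (requires a valid Apify API token):
# items = run_scraper("<YOUR_APIFY_TOKEN>", ["https://lootcrate.com"])
# print(len(items), "products saved")
```

Passing several URLs in `start_urls` corresponds to the bulk extraction described in the Pro Tip.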

Use cases

| Use case | Description |
| --- | --- |
| Competitor pricing monitoring | Track current and compare-at prices across stores for discount analysis and pricing intelligence. |
| Catalog building & enrichment | Build product feeds from full_data and export to CSV/JSON for ingestion into PIM/ERP systems. |
| Variant & inventory analysis | Analyze SKUs, variant titles, and inventory-related fields contained in the Shopify product JSON. |
| SEO & content audits | Use titles, tags, and metadata to study keyword strategy and product taxonomy for SEO research. |
| Image pipelines | Collect product images from the preserved JSON for creative workflows. |
| Market research | Compare vendors, product types, and tags across multiple Shopify stores for trend and assortment analysis. |
| API data pipelines | Feed the dataset output into internal APIs or data lakes via the Apify API for automation and reporting. |

Why choose Shopify Products Scraper?

Built for precision, scale, and reliability, this shopify product scraper uses Shopify-native endpoints when available and falls back to HTML discovery seamlessly.

  • 🎯 Accurate by design: Extracts directly from product .json endpoints and preserves the full response for flexible downstream use.
  • ⚡ Scale-ready: Concurrent fetching and per-store batching make it fast to scrape shopify products across many domains.
  • 🔐 Resilient networking: Automatic proxy fallback (direct → datacenter → residential) with retries and sticky residential behavior under blocks.
  • 💾 Real-time saving: Writes each product to the dataset as it’s processed to prevent data loss and enable early exports.
  • 🧰 Developer-friendly: Outputs clean JSON with stable field names and stores a grouped “OUTPUT” object in the key-value store.
  • 🧭 Better than extensions: No brittle browser automation. The actor runs server-side with clear logs.
  • 💸 Export-friendly: Download datasets as CSV, JSON, or Excel for BI, catalog syncs, and feeds without extra tooling.

Is it legal to scrape Shopify product data?

Yes, when used responsibly. This actor collects data from publicly available Shopify pages and does not access private accounts or authenticated areas.

Guidelines:

  • Scrape only public product pages and metadata.
  • Review and respect target websites’ terms of service and robots.txt.
  • Ensure compliance with applicable laws (e.g., GDPR, CCPA).
  • Avoid misuse; do not employ the tool for unlawful or unethical activities.
  • Consult your legal team for edge cases or jurisdiction-specific requirements.

Input parameters & output format

Example JSON input

{
  "startUrls": [
    "https://lootcrate.com",
    "https://www.decathlon.com"
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Input fields

  • startUrls (array, required)
    • Description: One or more Shopify store homepage URLs to process (e.g., https://lootcrate.com).
  • proxyConfiguration (object, optional)
    • Description: Choose which proxies to use. By default, no proxy is used. If the target site rejects or blocks a request, the actor automatically falls back to a datacenter proxy, then to a residential proxy with 3 retries.
    • Default: { "useApifyProxy": false } (prefill)
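
When the input is generated by a script, it can help to validate the URLs before starting a run. The following is a minimal sketch, assuming only the two input fields listed above; `build_run_input` is a hypothetical helper, and reducing each URL to its scheme and host is an assumption based on the instruction to provide store homepages.

```python
from urllib.parse import urlsplit


def build_run_input(start_urls: list[str], use_apify_proxy: bool = False) -> dict:
    """Build the actor input dict from a list of store URLs."""
    cleaned = []
    for url in start_urls:
        parts = urlsplit(url.strip())
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Not an absolute http(s) URL: {url!r}")
        # Assumption: only the store homepage (scheme + host) is needed.
        cleaned.append(f"{parts.scheme}://{parts.netloc}")
    if not cleaned:
        raise ValueError("startUrls is required and must not be empty")
    return {
        "startUrls": cleaned,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


print(build_run_input(["https://lootcrate.com/", "https://www.decathlon.com"]))
```

The result matches the example JSON input above and can be passed as the run input through the Console or the API.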

Example JSON output (single dataset item)

{
  "store_url": "https://lootcrate.com",
  "product_url": "https://lootcrate.com/products/loot-crate",
  "json_url": "https://lootcrate.com/products/loot-crate.json",
  "product_id": 5083963261059,
  "title": "Loot Crate",
  "vendor": "Loot Crate Core",
  "product_type": "Subscription Box",
  "price": "29.99",
  "compare_at_price": "24.99",
  "tags": "Subscription, Collectibles, Pop Culture",
  "total_found": 5,
  "successful": 5,
  "full_data": {
    "product": {
      "id": 5083963261059,
      "title": "Loot Crate",
      "vendor": "Loot Crate Core",
      "product_type": "Subscription Box",
      "handle": "loot-crate",
      "tags": "Subscription, Collectibles, Pop Culture",
      "variants": [
        {
          "id": 34197535719555,
          "title": "S / XS",
          "price": "29.99",
          "compare_at_price": "24.99",
          "sku": "1010126US",
          "inventory_management": "shopify",
          "requires_shipping": true
        }
      ],
      "images": [
        {
          "id": 123456789,
          "src": "https://cdn.shopify.com/..."
        }
      ]
    }
  }
}

Notes:

  • compare_at_price may be null if not set on the first variant.
  • tags may be an empty string if the store doesn’t use tags.
  • full_data contains the raw Shopify product JSON, which can include additional fields like body_html, created_at, updated_at, published_at, and more depending on the store.
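
Since price fields arrive as strings, compare_at_price can be null, and tags can be an empty string, downstream code usually normalizes these before analysis. Here is a minimal sketch of one way to do that, assuming the field shapes described in the notes above; `normalize_item` is an illustrative name.

```python
from decimal import Decimal
from typing import Any, Optional


def _to_decimal(value: Any) -> Optional[Decimal]:
    """Convert a price string to Decimal, keeping missing values as None."""
    if value is None or value == "":
        return None
    return Decimal(str(value))


def normalize_item(item: dict[str, Any]) -> dict[str, Any]:
    """Return typed price fields and a clean list of tags for one item."""
    raw_tags = item.get("tags") or ""
    return {
        "price": _to_decimal(item.get("price")),
        "compare_at_price": _to_decimal(item.get("compare_at_price")),
        "tags": [tag.strip() for tag in raw_tags.split(",") if tag.strip()],
    }


print(normalize_item({"price": "29.99", "compare_at_price": None, "tags": ""}))
```

Decimal is used rather than float so that prices such as 29.99 compare and sum exactly.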

FAQ

Do I need a login or cookies to use this Shopify product scraper?

No. The actor fetches public pages and product .json endpoints without login or cookies. It runs server-side with retries and proxy fallback.

Can I scrape multiple Shopify stores in one run?

Yes. Add several stores to the startUrls array. The actor processes each store in turn, discovers or enumerates its products, and saves one dataset item per product, so a single run can cover many stores.

What product data is included in the output?

Each dataset item contains store_url, product_url, json_url, product_id, title, vendor, product_type, price, compare_at_price, tags, total_found, successful, and full_data. The full_data field preserves the complete Shopify product JSON, including variants and images.

How does proxy fallback work if a store blocks requests?

The actor starts with no proxy. On 403/429 blocks, it falls back to a datacenter proxy, and if blocking persists, it escalates to a residential proxy with retries. Once residential is enabled, it remains sticky for remaining requests to maintain reliability.
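
The escalation described in this answer can be pictured as a small state machine. The sketch below is an illustration of that behavior, not the actor's code: the `fetch` callable (taking a URL and a tier name and returning an HTTP status) is a hypothetical stand-in for the real request layer, and reading "3 retries" as three attempts on the residential tier is an assumption.

```python
from typing import Callable, Optional

TIERS = ("direct", "datacenter", "residential")
BLOCKED_STATUSES = {403, 429}
RESIDENTIAL_ATTEMPTS = 3  # assumption: "3 retries" read as three attempts


class FallbackFetcher:
    """Escalates through proxy tiers and stays on residential once used."""

    def __init__(self, fetch: Callable[[str, str], int]) -> None:
        # fetch(url, tier) -> HTTP status code (hypothetical request layer)
        self._fetch = fetch
        self.sticky_residential = False

    def get(self, url: str) -> Optional[int]:
        # After residential has been activated, skip the lower tiers.
        tiers = ("residential",) if self.sticky_residential else TIERS
        status = None
        for tier in tiers:
            attempts = 1
            if tier == "residential":
                attempts = RESIDENTIAL_ATTEMPTS
                self.sticky_residential = True
            for _ in range(attempts):
                status = self._fetch(url, tier)
                if status not in BLOCKED_STATUSES:
                    return status
        # Every tier was blocked; return the last status seen.
        return status
```

In this sketch a request that succeeds on the datacenter tier does not make that tier sticky; only residential is, matching the wording above.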

Can I export results to CSV?

Yes. Open the run’s Dataset and export to CSV, JSON, or Excel. This makes it easy to build product feeds for catalogs, analytics, or SEO.
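
If you already have the items in memory and want a flat CSV without the nested full_data column, the standard `csv` module is enough. This is a minimal sketch, assuming the summary field names listed in this document; `items_to_csv` is an illustrative helper, not a feature of the actor.

```python
import csv
import io

# Summary fields from the output table; full_data is left out on purpose.
SUMMARY_FIELDS = [
    "store_url",
    "product_url",
    "json_url",
    "product_id",
    "title",
    "vendor",
    "product_type",
    "price",
    "compare_at_price",
    "tags",
]


def items_to_csv(items: list[dict]) -> str:
    """Serialize dataset items to CSV text, ignoring fields not listed above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=SUMMARY_FIELDS, extrasaction="ignore"
    )
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buffer.getvalue()
```

Missing or null values are written as empty cells, and tags containing commas are quoted automatically by the writer.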

Is there an API to access results programmatically?

Yes. You can access the run’s Dataset and Key-Value Store (including the grouped “OUTPUT” summary) via the Apify API to integrate with internal systems or automation workflows.
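
For plain HTTP access, the Apify API exposes dataset items and key-value store records under `https://api.apify.com/v2`. The sketch below builds those URLs and fetches JSON with the standard library. The dataset and store IDs are placeholders you take from your own run, the record key "OUTPUT" comes from the description above, and the helper names are illustrative.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """URL for downloading a dataset's items in the given format."""
    query = urlencode({"format": fmt})
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/items?{query}"


def output_record_url(store_id: str, key: str = "OUTPUT") -> str:
    """URL of a record (by default the grouped OUTPUT summary) in a store."""
    return (
        f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}"
        f"/records/{quote(key, safe='')}"
    )


def fetch_json(url: str, token: str):
    """GET a URL with the API token in the Authorization header."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example usage (placeholders, requires a real token and IDs):
# items = fetch_json(dataset_items_url("<DATASET_ID>"), "<YOUR_APIFY_TOKEN>")
# summary = fetch_json(output_record_url("<STORE_ID>"), "<YOUR_APIFY_TOKEN>")
```

Changing `fmt` to "csv" gives a download URL suited to scheduled reports, though `fetch_json` itself only applies to JSON responses.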

Does it capture variant pricing and images?

Yes. Variant and image data are included within full_data from the Shopify product JSON, which supports price tracking and asset-processing workflows.

Are there any limits during a run?

The actor paginates “/products.json” where available and applies internal limits to the total number of products processed per store in a run. You can monitor progress and totals in the logs and in the total_found and successful fields.
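
To make the pagination and cap concrete, here is a minimal sketch of walking “/products.json?limit=250&page=N” until a page comes back empty or a limit is reached. The actual cap is not documented here, so `max_products` and `max_pages` are hypothetical values, and `fetch_page` is an injected stand-in for the HTTP call that returns the parsed JSON body.

```python
from typing import Any, Callable


def products_json_url(store_url: str, page: int, limit: int = 250) -> str:
    """Build the paginated products.json URL for a store."""
    return f"{store_url.rstrip('/')}/products.json?limit={limit}&page={page}"


def collect_handles(
    store_url: str,
    fetch_page: Callable[[str], dict[str, Any]],
    max_products: int = 500,  # hypothetical cap, not the actor's real value
    max_pages: int = 100,  # safety bound for this sketch
) -> list[str]:
    """Collect product handles page by page, stopping at the cap."""
    handles: list[str] = []
    for page in range(1, max_pages + 1):
        payload = fetch_page(products_json_url(store_url, page))
        products = payload.get("products") or []
        if not products:
            break  # an empty page means there is nothing more to read
        handles.extend(p["handle"] for p in products if p.get("handle"))
        if len(handles) >= max_products:
            break
    return handles[:max_products]
```

Each handle maps to a product page at `/products/<handle>`, whose .json endpoint is then fetched individually.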

Closing thoughts

Shopify Products Scraper is built for automated, reliable product data extraction from Shopify stores. It discovers product URLs, fetches each product’s .json, and preserves the complete response for downstream use.

Whether you’re a marketer, developer, data analyst, or researcher, you can export clean CSV/JSON feeds, track pricing and variants, or power analytics with the full_data payload. Developers can integrate via the Apify API and automate pipelines end to end. Start extracting product data at scale with a scraper designed to keep running through blocks.