Pricing

Pay per event

Shopify Store Scraper | Metadata & Catalog Extractor

Shopify store scraper that pulls public storefront metadata, product catalogs, collections, and vendor data directly from JSON endpoints. No browser, no auth. Returns structured tables ready for competitive catalog research.

Developer

太郎山田

Last modified

3 days ago

Shopify Store Leads & Catalog Intelligence

After this run

Turn this Actor's output into a capped paid report with Ad Landing Page Offer Intelligence & CRO Gap Report. Use it when paid media, CRO, and agency teams need to decide which public landing-page offer gaps to fix before increasing ad spend.

First report: $3 / landing_offer_report; set maxChargeUsd to $3.
Deeper report: $15 / cro_gap_report_pack; use only when the first result needs competitor or action-depth.
This is an internal Apify flow aid. It is not revenue proof until accounted paid usage appears.

Next report-style Actors

If you already have data from this Actor, these follow-on Actors turn public or user-provided inputs into decision-ready reports. They are optional, capped by maxChargeUsd, and do not make business outcome claims.

ATS Hiring Signal Report - turn target-company public hiring pages into expansion and account-priority signals.
SaaS Pricing Page Monitor - monitor competitor public pricing pages after store intelligence.
Ad Landing Page Offer Intelligence - audit public landing pages for offer, proof, CTA, and friction.
CSV Local Business List Scoring - score exported business lists before SEO cleanup.

Runtime: Node.js 20+.

Extract analyst-ready Shopify storefront intelligence from public merchant endpoints: normalized domain, store identity, currency, price range, sampled products, collections, merch rollups, endpoint warnings, and explicit pay-per-event billing fields.

This actor is built for ecommerce analysts, growth teams, marketplace operators, technical SEO teams, data engineers, and competitive intelligence workflows that need repeatable storefront facts without running a browser. It reads public Shopify surfaces such as the homepage, /meta.json, /products.json, /collections.json, and, optionally, /pages.json and /blogs.json. It is not a search engine or discovery actor: you provide the store URLs you want inspected.
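To make the list of public surfaces concrete, here is a minimal sketch that builds the endpoint URLs for one storefront. The helper name endpoint_urls and its include_content flag are hypothetical, and the normalization shown (force https, lowercase hostname, drop the path) is an assumption for illustration, not the Actor's confirmed behavior.

```python
from urllib.parse import urlsplit

# Public paths named in this README. Everything else here is illustrative.
ENDPOINT_PATHS = {
    "meta": "/meta.json",
    "products": "/products.json",
    "collections": "/collections.json",
    "pages": "/pages.json",
    "blogs": "/blogs.json",
}


def endpoint_urls(store_url: str, include_content: bool = False) -> dict[str, str]:
    """Build public endpoint URLs for one storefront (hypothetical helper)."""
    # Assumption: bare domains are treated as https URLs.
    candidate = store_url if "://" in store_url else f"https://{store_url}"
    parts = urlsplit(candidate)
    host = parts.hostname or ""
    if parts.scheme not in ("http", "https") or not host or any(c.isspace() for c in host):
        raise ValueError(f"not an HTTP(S) storefront URL: {store_url!r}")

    base = f"https://{host}"
    urls = {"homepage": base + "/"}
    for name, path in ENDPOINT_PATHS.items():
        # Pages and blogs are optional surfaces, so skip them unless requested.
        if name in ("pages", "blogs") and not include_content:
            continue
        urls[name] = base + path
    return urls
```

Passing include_content=True mirrors the includeContentMetadata input: the two content endpoints are only added when they are explicitly wanted.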

Store Quickstart

Start with dataset delivery so analysts can inspect rows before wiring automation:

Quickstart Baseline (2 Stores -> Catalog + Merch Signals): two public storefronts, low sampling limits, and the core analyst fields: status, chargedEvent, isShopify, normalizedDomain, storeName, currency, priceRange, productCount, productsSample, signals, and errors.
Recurring Baseline (Multi-Store Catalog Watch): schedule the same watchlist weekly or daily to compare catalog size, price range, vendor/tag rollups, endpoint availability, and warning counts over time.
Webhook Routed Check (Daily Store Updates): use only after dataset rows match your downstream BI, CRM, Slack, Make, n8n, or warehouse schema.
Content Expansion (Pages / Blogs When Public): enable when page/blog metadata matters for SEO, launch monitoring, policy copy checks, or content inventory.

The included store-input.example.json is the lowest-friction input for a first run from Apify Store. sample-output.example.json shows the published result contract, including one charged Shopify row and one no-charge blocked row.
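For the recurring baseline, comparing two runs of the same watchlist can be sketched as below. The function name diff_snapshots is hypothetical, the compared fields are a small subset chosen for illustration, and the rows are assumed to be plain dicts loaded from two dataset exports.

```python
def diff_snapshots(previous: list[dict], current: list[dict]) -> list[dict]:
    """Report per-domain field changes between two runs (hypothetical helper)."""
    # Index the earlier run by normalizedDomain; rows without one cannot be matched.
    before = {
        row["normalizedDomain"]: row
        for row in previous
        if row.get("normalizedDomain")
    }
    changes = []
    for row in current:
        domain = row.get("normalizedDomain")
        old = before.get(domain)
        if old is None:
            continue  # new or unnormalized domain: nothing to compare against
        changed = {
            field: (old.get(field), row.get(field))
            for field in ("status", "productCount", "priceRange")
            if old.get(field) != row.get(field)
        }
        if changed:
            changes.append({"normalizedDomain": domain, "changed": changed})
    return changes
```

Because productCount and priceRange describe the sample rather than the full catalog, a comparison like this is only meaningful when both runs use the same sampling limits.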

Analyst Workflow

Provide known Shopify or ecommerce storefront URLs. The actor does not discover stores from search terms.
Run in delivery: "dataset" with modest sampling limits.
Review status, chargedEvent, signals, warnings, and errors before routing rows downstream.
Filter charged Shopify rows with chargedEvent equal to store_enriched or store_partial for analyst review.
Keep no-charge rows such as invalid_input, blocked, and not_store as watchlist cleanup tasks.
Add webhook delivery only after analysts trust the dataset shape.
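The review and cleanup steps above can be sketched as a small triage function. The name triage and the three-bucket split are illustrative assumptions; the event and status names come from this README.

```python
# Event and status names as documented in this README.
CHARGED_SHOPIFY_EVENTS = {"store_enriched", "store_partial"}
NO_CHARGE_STATUSES = {"invalid_input", "blocked", "not_store", "timeout", "error"}


def triage(rows: list[dict]) -> tuple[list[dict], list[dict], list[dict]]:
    """Split dataset rows into analyst review, watchlist cleanup, and other."""
    review, cleanup, other = [], [], []
    for row in rows:
        event = row.get("chargedEvent")
        if event in CHARGED_SHOPIFY_EVENTS:
            review.append(row)  # charged Shopify rows go to analysts
        elif event is None and row.get("status") in NO_CHARGE_STATUSES:
            cleanup.append(row)  # no-charge diagnostics become cleanup tasks
        else:
            other.append(row)  # for example not_shopify rows
    return review, cleanup, other
```

Rows that land in the third bucket, such as non-Shopify stores, are left for a separate decision rather than being silently dropped.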

Key Features

Multi-store inspection for up to 50 storefront URLs per run.
Public Shopify signal detection from homepage, meta, products, and collections endpoints.
Analyst summary fields for domain, store name, currency, price range, product count, sampled products, status, billing event, signals, and errors.
Catalog and collection samples from public Shopify JSON endpoints.
Vendor, tag, and product-type rollups derived from sampled products.
Restriction-aware output for blocked, non-JSON, unavailable, timeout, invalid, and non-store cases.
Optional pages and blogs metadata when public endpoints expose it.
Dataset-first and webhook-after delivery modes.

Use Cases

Who	Workflow	Value
Ecommerce analysts	Track competitor catalog, price bands, and merchandising structure	Repeatable store summary rows for comparison over time
Data teams	Pipe normalized storefront fields into warehouses or dashboards	Stable row keys such as `normalizedDomain`, `storeName`, and `currency`
Technical SEO teams	Inspect public collections, products, pages, and blog metadata	Fast endpoint-level visibility without browser automation
Marketplace operators	Validate merchant storefronts and detect public Shopify evidence	Clear `isShopify`, `status`, and no-charge cleanup rows
RevOps / growth teams	Feed merchant intelligence into CRM or account scoring	Sampled products, signals, and errors ready for routing

Input

Field	Type	Default	Description
`storeUrls`	string[]	required	Known storefront URLs to inspect. Custom domains and `*.myshopify.com` domains both work. Maximum 50 stores per run.
`productSampleLimit`	integer	`25`	Maximum public products to fetch per store from `/products.json`. Keep low for quickstarts and recurring monitoring.
`collectionSampleLimit`	integer	`25`	Maximum public collections to fetch per store from `/collections.json`.
`includeContentMetadata`	boolean	`false`	When true, also attempts `/pages.json` and `/blogs.json`. Leave false until content metadata is worth the extra sampling.
`contentSampleLimit`	integer	`10`	Maximum public pages or blogs to sample when content metadata is enabled.
`timeoutMs`	integer	`15000`	Per-request timeout in milliseconds.
`delivery`	string	`"dataset"`	`dataset` writes durable rows for review. `webhook` writes dataset rows first, then sends the full payload to `webhookUrl`.
`webhookUrl`	string	empty	Required only when `delivery` is `"webhook"`.
`dryRun`	boolean	`false`	Development mode. Skips dataset writes and webhook delivery, but still writes local `output/result.json`.
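A minimal sketch of building a run input from the table above, with the documented defaults and two documented constraints (at most 50 stores, webhookUrl required for webhook delivery). The helper build_run_input is hypothetical, and the Actor may apply further validation that is not shown here.

```python
# Defaults copied from the input table in this README.
DEFAULTS = {
    "productSampleLimit": 25,
    "collectionSampleLimit": 25,
    "includeContentMetadata": False,
    "contentSampleLimit": 10,
    "timeoutMs": 15000,
    "delivery": "dataset",
    "dryRun": False,
}
MAX_STORES = 50


def build_run_input(store_urls: list[str], **overrides) -> dict:
    """Merge overrides into the documented defaults and check basic constraints."""
    unknown = set(overrides) - set(DEFAULTS) - {"webhookUrl"}
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    if not store_urls or len(store_urls) > MAX_STORES:
        raise ValueError(f"storeUrls must contain 1 to {MAX_STORES} URLs")

    run_input = {"storeUrls": list(store_urls), **DEFAULTS, **overrides}
    if run_input["delivery"] not in ("dataset", "webhook"):
        raise ValueError("delivery must be 'dataset' or 'webhook'")
    if run_input["delivery"] == "webhook" and not run_input.get("webhookUrl"):
        raise ValueError("webhookUrl is required when delivery is 'webhook'")
    return run_input
```

Rejecting unknown field names early helps catch stale inputs, such as field names copied from a different Actor, before a run is started.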

Input Example

{
  "storeUrls": [
    "https://colourpop.com",
    "https://allbirds.com"
  ],
  "productSampleLimit": 10,
  "collectionSampleLimit": 6,
  "includeContentMetadata": false,
  "contentSampleLimit": 5,
  "timeoutMs": 15000,
  "delivery": "dataset",
  "dryRun": false
}

Input Examples

Example: Single store catalog snapshot

{
  "storeUrls": [
    "https://allbirds.com"
  ],
  "productSampleLimit": 250,
  "collectionSampleLimit": 25
}

Example: Competitor catalog comparison

{
  "storeUrls": [
    "https://brand1.myshopify.com",
    "https://brand2.myshopify.com"
  ],
  "productSampleLimit": 25,
  "collectionSampleLimit": 25
}

Example: Vendor / tag rollup audit

{
  "storeUrls": [
    "https://multi-brand-store.com"
  ],
  "productSampleLimit": 500
}

Output

The Apify dataset receives one row per input storefront after normalization and deduplication. Local output/result.json wraps the same rows in { "meta": ..., "results": [...] }.

Field	Type	Analyst meaning
`status`	string	Result classification: `success`, `partial`, `not_shopify`, `blocked`, `invalid_input`, `not_store`, `timeout`, or `error`. Use this before routing rows to analysts.
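`chargedEvent`	string | null	Pay-per-event name charged for this row (`store_enriched`, `store_partial`, or `non_shopify_store_detected`); `null` for no-charge rows.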
`isShopify`	boolean	True when Shopify evidence was detected from metadata, Shopify endpoints, theme scripts, or public Shopify JSON.
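`normalizedDomain`	string | null	Normalized store domain; `null` when the input could not be normalized.
`storeName`	string | null	Store name when detected; otherwise `null`.
`currency`	string | null	Store currency code (for example `USD`) when detected; otherwise `null`.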
`priceRange`	object	Minimum and maximum prices observed in sampled public products. Null values mean no public product prices were sampled.
`productCount`	integer	Number of public products sampled in this row. This is sample count, not full catalog size.
`productsSample`	object[]	CSV/API-friendly alias for sampled public product records. Same source as `productSamples`.
`signals`	string[]	Evidence used for classification and charging, such as `shopify_detected`, `shopify_products_json`, `shopify_collections_json`, or `ecommerce_cart_or_checkout`.
`errors`	object[]	Structured endpoint or run problems important for automation. Includes type, endpoint, HTTP status, and message.
`inputUrl`	string	Original URL provided by the user.
`normalizedUrl`	string | null	Normalized form of `inputUrl`; `null` when the input could not be normalized.
`hostname`	string	Hostname from `normalizedUrl`.
`store`	object	Store profile fields such as name, canonical URL, myshopify domain, theme, locale, country, and Shopify detection.
`summary`	object	Counts, sample basis, and endpoint status map for homepage, meta, products, collections, pages, and blogs.
`collections`	object[]	Sampled public collections.
`productSamples`	object[]	Sampled public products with vendor, type, tags, variant count, availability, images, and product-level price range.
`rollups`	object	Vendor, tag, and product-type counts derived from sampled products only.
`content`	object	Optional page and blog samples when `includeContentMetadata` is enabled and endpoints are public.
`warnings`	object[]	Endpoint restrictions, non-JSON responses, unavailable endpoints, and sample truncation notices.
`error`	string | null	Top-level error message; `null` when the row has none.
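For spreadsheet or warehouse loading, the nested row can be flattened into one CSV line per store. The sketch below is an assumption about a convenient column layout, not a format the Actor emits; the names COLUMNS, flatten, and to_csv are hypothetical.

```python
import csv
import io

# Hypothetical flat column layout derived from the analyst summary fields.
COLUMNS = [
    "normalizedDomain", "status", "chargedEvent", "isShopify", "storeName",
    "currency", "priceMin", "priceMax", "productCount", "signals", "errorCount",
]


def flatten(row: dict) -> dict:
    """Reduce one dataset row to scalar columns."""
    price = row.get("priceRange") or {}
    return {
        "normalizedDomain": row.get("normalizedDomain"),
        "status": row.get("status"),
        "chargedEvent": row.get("chargedEvent"),
        "isShopify": row.get("isShopify"),
        "storeName": row.get("storeName"),
        "currency": row.get("currency"),
        "priceMin": price.get("min"),
        "priceMax": price.get("max"),
        "productCount": row.get("productCount", 0),
        "signals": ";".join(row.get("signals") or []),
        "errorCount": len(row.get("errors") or []),
    }


def to_csv(rows: list[dict]) -> str:
    """Render flattened rows as CSV text; None values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(flatten(row))
    return buffer.getvalue()
```

Nested fields such as productsSample, collections, and rollups are intentionally left out; they fit better in separate tables keyed by normalizedDomain.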

Output Example

{
  "inputUrl": "https://example-shop.com",
  "normalizedUrl": "https://example-shop.com",
  "hostname": "example-shop.com",
  "status": "success",
  "chargedEvent": "store_enriched",
  "isShopify": true,
  "normalizedDomain": "example-shop.com",
  "storeName": "Example Shop",
  "currency": "USD",
  "priceRange": { "min": 12.5, "max": 49.99 },
  "productCount": 2,
  "productsSample": [
    {
      "title": "Sample Product",
      "url": "https://example-shop.com/products/sample-product",
      "vendor": "Example Shop",
      "productType": "Accessory",
      "priceRange": { "min": 12.5, "max": 19.99 }
    }
  ],
  "signals": ["shopify_detected", "shopify_products_json", "shopify_collections_json"],
  "errors": [],
  "store": {
    "name": "Example Shop",
    "currency": "USD",
    "myshopifyDomain": "example-shop.myshopify.com",
    "canonicalUrl": "https://example-shop.com/",
    "themeName": "Dawn",
    "shopifyDetected": true
  },
  "summary": {
    "productSampleCount": 2,
    "collectionSampleCount": 1,
    "vendorCount": 1,
    "tagCount": 3,
    "endpointStatuses": {
      "homepage": "ok",
      "meta": "ok",
      "products": "ok",
      "collections": "ok",
      "pages": "skipped",
      "blogs": "skipped"
    }
  },
  "warnings": [],
  "error": null
}

No-charge diagnostic rows keep analyst queues honest:

{
  "inputUrl": "not a url",
  "normalizedUrl": null,
  "hostname": "",
  "status": "invalid_input",
  "chargedEvent": null,
  "isShopify": false,
  "normalizedDomain": null,
  "storeName": null,
  "currency": null,
  "priceRange": { "min": null, "max": null },
  "productCount": 0,
  "productsSample": [],
  "signals": [],
  "errors": [
    {
      "type": "invalid_input",
      "endpoint": null,
      "status": null,
      "message": "Unsupported protocol for store URL."
    }
  ]
}

PPE Events And No-Charge Rules

This actor uses explicit pay-per-event row charging. Production runtime passes the event name from chargedEvent when a row should be charged.

Result status	PPE event	Charged?	Meaning
`success`	`store_enriched`	Yes	Shopify evidence plus useful public catalog or store metadata were captured, and primary catalog endpoints were available.
`partial`	`store_partial`	Yes	Shopify evidence was found, but one or more important endpoints were restricted, unavailable, or incomplete.
`not_shopify` with ecommerce evidence	`non_shopify_store_detected`	Yes	The site looks like an ecommerce store but does not expose Shopify evidence. Useful for merchant classification.
`invalid_input`	`null`	No	The input could not be normalized into an HTTP(S) storefront URL.
`blocked`	`null`	No	Public endpoints were blocked, restricted, password-like, or non-JSON across primary surfaces.
`not_store`	`null`	No	The homepage loaded but no Shopify or ecommerce storefront evidence was found.
`timeout`	`null`	No	Endpoint requests timed out.
`error`	`null`	No	Unexpected run or fetch failure.

Use chargedEvent rather than status alone for billing audits. invalid_input, blocked, not_store, timeout, and error rows are no-charge diagnostics and should be retained for watchlist cleanup.
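A billing audit along these lines could count charged events and flag rows whose status and chargedEvent disagree with the table above. This is a sketch under assumptions: the helper audit_billing is hypothetical, and a not_shopify row with a null event is assumed to be valid when no ecommerce evidence was found.

```python
from collections import Counter

# Allowed chargedEvent values per status, taken from the table in this README.
# Assumption: not_shopify may also be uncharged (None) without ecommerce evidence.
EXPECTED_EVENTS = {
    "success": {"store_enriched"},
    "partial": {"store_partial"},
    "not_shopify": {"non_shopify_store_detected", None},
    "invalid_input": {None},
    "blocked": {None},
    "not_store": {None},
    "timeout": {None},
    "error": {None},
}


def audit_billing(rows: list[dict]) -> tuple[Counter, list[dict]]:
    """Count charged events and collect rows that break the documented mapping."""
    charged = Counter()
    mismatches = []
    for row in rows:
        status, event = row.get("status"), row.get("chargedEvent")
        if event is not None:
            charged[event] += 1
        # Unknown statuses have no allowed events, so they are always flagged.
        if event not in EXPECTED_EVENTS.get(status, set()):
            mismatches.append(row)
    return charged, mismatches
```

The counter is keyed by chargedEvent, not status, which matches the advice to audit billing on the event field.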

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console -> Settings -> Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~shopify-store-intelligence/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "storeUrls": ["https://colourpop.com", "https://allbirds.com"], "productSampleLimit": 10, "collectionSampleLimit": 6, "includeContentMetadata": false, "contentSampleLimit": 5, "timeoutMs": 15000, "delivery": "dataset", "dryRun": false }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("taroyamada/shopify-store-intelligence").call(run_input={
    "storeUrls": ["https://colourpop.com", "https://allbirds.com"],
    "productSampleLimit": 10,
    "collectionSampleLimit": 6,
    "includeContentMetadata": False,
    "contentSampleLimit": 5,
    "timeoutMs": 15000,
    "delivery": "dataset",
    "dryRun": False,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["normalizedDomain"], item["status"], item["chargedEvent"], item["productCount"])

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/shopify-store-intelligence').call({
  storeUrls: ['https://colourpop.com', 'https://allbirds.com'],
  productSampleLimit: 10,
  collectionSampleLimit: 6,
  includeContentMetadata: false,
  contentSampleLimit: 5,
  timeoutMs: 15000,
  delivery: 'dataset',
  dryRun: false,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items.map((row) => ({
  normalizedDomain: row.normalizedDomain,
  status: row.status,
  chargedEvent: row.chargedEvent,
  productCount: row.productCount,
})));

Tips And Limitations

This is not a search or discovery actor; provide known storefront URLs.
productCount, priceRange, rollups, and productsSample are based on sampled public products, not the full catalog.
Some Shopify stores restrict public JSON endpoints; these become partial, blocked, or no-charge diagnostic rows depending on available evidence.
not_shopify can be charged only when useful ecommerce evidence exists, because it helps classify non-Shopify merchant URLs.
Use delivery: "dataset" first. Move to webhooks only after downstream tools accept the row shape.
Use dryRun: true for local development or shape checks where dataset writes and webhooks should be skipped.
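To illustrate why rollups reflect the sample and not the full catalog, here is a sketch that recomputes vendor, tag, and product-type counts from the sampled products alone. The function name and the returned shape are assumptions; the Actor's actual rollups object may be structured differently, and tags are assumed to be a list of strings.

```python
from collections import Counter


def rollups_from_samples(product_samples: list[dict]) -> dict:
    """Count vendors, tags, and product types across sampled products only."""
    vendors, tags, product_types = Counter(), Counter(), Counter()
    for product in product_samples:
        if product.get("vendor"):
            vendors[product["vendor"]] += 1
        if product.get("productType"):
            product_types[product["productType"]] += 1
        # Assumption: tags is a list of strings on each sampled product.
        tags.update(product.get("tags") or [])
    return {
        "vendors": dict(vendors),
        "tags": dict(tags),
        "productTypes": dict(product_types),
    }
```

With a productSampleLimit of 10, these counts can never sum to more than 10 products, however large the store's catalog is.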

FAQ

Does this fetch every product and collection?

No. The actor samples public /products.json and /collections.json results up to your configured limits.

What happens on restricted stores?

The actor emits explicit warnings and errors, and uses partial, blocked, or another diagnostic status depending on whether useful Shopify evidence was still available.

Can non-Shopify stores be useful?

Yes. If the homepage contains ecommerce evidence such as cart, checkout, product structured data, or platform hints, the row is classified as not_shopify and charged as non_shopify_store_detected.

Can I route results to another tool?

Yes. Keep dataset mode for inspection, then use webhook mode for Slack, Make, n8n, BI ingestion, CRM enrichment, or internal monitoring.

Related Actors

Website Content Extractor for cleaned text from policy, FAQ, pricing, help-center, or landing pages.
Contact Details Extractor for public support, sales, or partnership contacts from the same merchant domain.
Domain Security Audit API for SSL, DMARC, expiry, and security-header checks.
AI Visibility Monitor for brand visibility checks beside storefront monitoring.

Was this helpful?

If this actor saved you time, please leave a rating on Apify Store. Bug reports and feature requests belong on the actor Issues tab.

Premium Report Pack

Use these premium report actors when a raw dataset is ready to become a buyer-facing audit, watch summary, or agency deliverable. All three keep sourceDatasetId as advanced-only; first runs should use pasted input, URLs, demo mode, and reportTier.

CSV Local Business List Scoring & SEO Gap Report - Score pasted local business CSV lists and produce agency-ready lead/SEO gap reports.
SaaS Pricing Page Monitor & Competitor Price Change Alerts - Turn public pricing pages into snapshots, competitor reports, and weekly pricing watch summaries.
Ad Landing Page Offer Intelligence & CRO Gap Report - Analyze user-provided landing pages and pasted ad copy for offer, CTA, proof, and CRO gaps.

Recommended flow from this actor: run the current extraction/check first, export the useful dataset or copy the relevant URLs, then choose entry, premium, or bundle in the report actor with maxChargeUsd as the safety cap.

Related report Actors

Use these follow-on Actors when you want a capped, decision-ready report instead of more raw rows. They use public or user-provided inputs, respect maxChargeUsd, and do not promise rankings, revenue, conversion lifts, or sales outcomes.

SaaS Pricing Page Monitor - watch public competitor pricing and packaging pages after store research.
Ad Landing Page Offer Intelligence - audit public landing pages for offer, proof, CTA, and friction.
CSV Local Business List Scoring - score exported or user-provided lead lists before cleanup or outreach.

Related paid report workflows

If this Actor gave you raw rows or source context, these follow-on report Actors are designed for a small capped paid run. They help make a decision, not just collect more data.

SaaS Pricing Page Monitor & Competitor Price Change Alerts - decide whether a public competitor pricing page changed in a way that affects packaging or sales messaging. Entry $3 / pricing_snapshot_report; premium $15 / competitor_pricing_report.
Ad Landing Page Offer Intelligence & CRO Gap Report - decide which public landing-page offer gaps to fix before increasing ad spend. Entry $3 / landing_offer_report; premium $15 / cro_gap_report_pack.
CSV Local Business List Scoring & SEO Gap Report - prioritize which businesses in a list deserve outreach, cleanup, or SEO follow-up. Entry $3 / lead_scoring_report; premium $15 / agency_lead_gap_report.

Keep maxChargeUsd equal to the selected tier. Internal links are traffic aids only; real proof requires accounted paid usage.
