Pricing

from $4.00 / 1,000 results

Universal Store Catalog Scraper (Shopify/Woo/Generic)

Extract clean, normalized product-catalog data from any Shopify, WooCommerce, or generic e-commerce storefront. AI-agent / MCP ready: every store comes back in one identical schema.

Developer

Maninder Pal Singh

Last modified

24 days ago

Universal Store Catalog Scraper 🛍️ → 🤖

Turn any Shopify, WooCommerce, or generic e-commerce storefront into a clean, normalized product feed — the same schema for every store, every time. Built MCP-first so an AI agent can consume it as a structured catalog tool.

This is a storefront-platform scraper, not a marketplace scraper. It targets independent merchant stores (each its own site) using public, non-personal product data only. The core value is cross-platform schema consistency: a Shopify store, a WooCommerce store, and a plain JSON-LD store all come back in one identical shape.

✨ Why this Actor

One schema, every platform. Agents don't have to special-case Shopify vs. Woo vs. generic — the output is always the same record.
Tiered, cheap-first extraction. Uses fast public JSON endpoints where they exist and only spins up a browser as a last resort, so runs stay fast and cheap.
Reliable & graceful. A malformed or blocked store degrades to fewer fields or a skip record — it never fails the whole run.
Polite by default. Respects robots.txt, rate-limits per domain, honest User-Agent, public data only.

🧱 How it extracts (tiered engine)

Platform	Primary path	Fallback
Shopify	Public `/products.json` (paginated, `?limit=250&page=N`)	Generic JSON-LD
WooCommerce	Store API `/wp-json/wc/store/v1/products`	Generic JSON-LD / HTML
Generic	schema.org `Product` JSON-LD	Open Graph → microdata → Playwright render
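
The Shopify primary path in the table can be sketched as a page-walking loop. This is a minimal sketch, not the Actor's own code: `fetchAllShopifyProducts` and the injectable `fetchJson` parameter are hypothetical names, and it assumes the endpoint returns a `{ products: [...] }` object and that a short or empty page marks the end of the catalog.

```javascript
// Hypothetical helper: walk /products.json page by page until a page comes back
// short or the cap is reached. `fetchJson` is injectable so the loop can run offline.
async function fetchAllShopifyProducts(storeUrl, maxProducts, fetchJson = defaultFetchJson) {
  const pageSize = 250; // page size from the table above
  const products = [];
  for (let page = 1; products.length < maxProducts; page += 1) {
    // An absolute path resolves against the store origin, so collection URLs work too.
    const url = new URL(`/products.json?limit=${pageSize}&page=${page}`, storeUrl).href;
    const body = await fetchJson(url);
    const batch = Array.isArray(body?.products) ? body.products : [];
    products.push(...batch);
    if (batch.length < pageSize) break; // short (or empty) page: nothing left to fetch
  }
  return products.slice(0, maxProducts);
}

// Default fetcher built on the global fetch available in Node 18+.
async function defaultFetchJson(url) {
  const res = await fetch(url, { headers: { accept: 'application/json' } });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}
```

Stopping on the first short page avoids one extra request per store in the common case; a store whose catalog is an exact multiple of 250 costs one additional empty page.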

Detection is a single lightweight request (asset/header/global fingerprints, with a cheap endpoint probe when ambiguous) and is overridable via forcePlatform.
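
The fingerprint check described above might look roughly like the following. It is only a sketch under stated assumptions: `detectPlatform`, the regular expressions, and the header names are illustrative choices, not the Actor's actual fingerprints, and the cheap endpoint probe for ambiguous cases is left out. It also assumes `forcePlatform` accepts the same values as the output `platform` field plus `auto`.

```javascript
// Hypothetical fingerprints; a real detector would likely combine more signals.
const FINGERPRINTS = {
  shopify: [/cdn\.shopify\.com/i, /\bShopify\.theme\b/],
  woocommerce: [/wp-content\/plugins\/woocommerce/i, /\bwoocommerce-page\b/i],
};

function detectPlatform(html, headers = {}, forcePlatform = 'auto') {
  if (forcePlatform !== 'auto') return forcePlatform; // explicit override wins

  // Assumed header fingerprint: Shopify storefronts tend to send x-shopify-* / x-shopid headers.
  const headerNames = Object.keys(headers).map((name) => name.toLowerCase());
  if (headerNames.some((name) => name.startsWith('x-shopify-') || name === 'x-shopid')) {
    return 'shopify';
  }

  // Asset / global fingerprints in the page source.
  for (const [platform, patterns] of Object.entries(FINGERPRINTS)) {
    if (patterns.some((re) => re.test(html))) return platform;
  }
  return 'generic'; // nothing matched: use the JSON-LD / Open Graph path
}
```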

📥 Input

Two modes — use either or combine them.

Mode 1 — Direct URLs (default, safest)

{
  "storeUrls": ["https://www.allbirds.com", "https://some-woostore.com/shop"],
  "maxProductsPerStore": 200,
  "includeVariants": true,
  "includeImages": true
}

Mode 2 — Discovery (opt-in, capped)

{
  "discovery": {
    "enabled": true,
    "keywords": ["handmade ceramic mugs", "merino wool socks"],
    "platformHint": "any",
    "maxStores": 10
  },
  "maxProductsPerStore": 100
}

Field	Type	Default	Notes
`storeUrls`	string[]	`[]`	Store home / collection / product URLs
`discovery.enabled`	bool	`false`	Opt-in store discovery
`discovery.keywords`	string[]	`[]`	Niche/product terms
`discovery.platformHint`	enum	`any`	`shopify` | `woocommerce` | `any`
`discovery.maxStores`	int	`10`	Hard max 50
`maxProductsPerStore`	int	`200`	Hard max 5000 — cost ceiling
`includeVariants`	bool	`true`	Per-variant array
`includeImages`	bool	`true`	Image URLs (primary first)
`includeDescriptionHtml`	bool	`false`	Raw HTML descriptions (bulky)
`forcePlatform`	enum	`auto`	Override detection
`respectRobotsTxt`	bool	`true`	Honor robots.txt
`maxConcurrency`	int	`5`	Stores in parallel
`proxyConfiguration`	object	Apify auto	—
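
To make the defaults and hard caps in the table concrete, here is a small sketch of how an input object could be resolved before a run. `resolveInput` and `clampInt` are hypothetical helpers; the lower bound of 1 and the error thrown for an input with no stores and no discovery are assumptions, since the table only states defaults and maximums.

```javascript
const DEFAULTS = {
  storeUrls: [],
  maxProductsPerStore: 200,
  includeVariants: true,
  includeImages: true,
  includeDescriptionHtml: false,
  forcePlatform: 'auto',
  respectRobotsTxt: true,
  maxConcurrency: 5,
};
const DISCOVERY_DEFAULTS = { enabled: false, keywords: [], platformHint: 'any', maxStores: 10 };

// Clamp a numeric value into [min, max]; use the fallback when it is not a finite number.
function clampInt(value, fallback, min, max) {
  const n = Number.isFinite(value) ? Math.trunc(value) : fallback;
  return Math.min(max, Math.max(min, n));
}

function resolveInput(input = {}) {
  const discovery = { ...DISCOVERY_DEFAULTS, ...(input.discovery ?? {}) };
  discovery.maxStores = clampInt(discovery.maxStores, 10, 1, 50); // hard max 50

  const resolved = { ...DEFAULTS, ...input, discovery };
  resolved.maxProductsPerStore = clampInt(resolved.maxProductsPerStore, 200, 1, 5000); // hard max 5000

  // Assumption: a run needs at least one source of stores.
  if (resolved.storeUrls.length === 0 && !discovery.enabled) {
    throw new Error('Provide storeUrls, enable discovery, or both.');
  }
  return resolved;
}
```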

📤 Output — one normalized record per product

{
  "storeUrl": "https://www.allbirds.com",
  "storeName": "Allbirds",
  "platform": "shopify",
  "productId": "6592289505353",
  "productUrl": "https://www.allbirds.com/products/mens-wool-runners",
  "title": "Men's Wool Runners",
  "descriptionText": "Our iconic everyday sneaker made from merino wool...",
  "brand": "Allbirds",
  "productType": "Shoes",
  "categories": ["Shoes"],
  "price": 110.0,
  "compareAtPrice": null,
  "currency": "USD",
  "onSale": false,
  "availability": "in_stock",
  "sku": "WR-NAT-GRY-8",
  "variants": [
    { "variantId": "39472394814025", "title": "Natural Grey / 8", "price": 110.0,
      "compareAtPrice": null, "sku": "WR-NAT-GRY-8", "availability": "in_stock",
      "options": { "Color": "Natural Grey", "Size": "8" } }
  ],
  "images": ["https://cdn.shopify.com/.../wool-runner-1.jpg"],
  "tags": ["wool", "runners"],
  "ratingValue": null,
  "ratingCount": null,
  "scrapedAt": "2026-06-30T10:15:42.118Z",
  "extractionMethod": "shopify:products.json"
}

Field guarantees that matter for agents:

price / compareAtPrice are numbers whenever they are present (never strings such as "$19.99"); compareAtPrice is null when a product has no compare-at price. currency is a separate ISO 4217 code.
availability is one of in_stock | out_of_stock | preorder | unknown.
platform is one of shopify | woocommerce | generic.
extractionMethod tells you exactly which path produced the record.
Stores that yield nothing produce a { storeUrl, skipped, error } row in the separate, non-billed skipped-stores dataset instead of disappearing.
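
A consumer that wants to verify these guarantees before trusting a record could run a check along these lines. `checkRecord` is a hypothetical helper, not something the Actor ships; it treats null as acceptable for price fields and currency, which is an assumption based on the sample record and the Limitations section. It is meant for product records only, not skip rows.

```javascript
const AVAILABILITY = new Set(['in_stock', 'out_of_stock', 'preorder', 'unknown']);
const PLATFORMS = new Set(['shopify', 'woocommerce', 'generic']);

// Returns a list of human-readable violations; an empty list means the record passes.
function checkRecord(record) {
  const problems = [];
  const isNumberOrNull = (v) => v === null || (typeof v === 'number' && Number.isFinite(v));

  for (const field of ['price', 'compareAtPrice']) {
    if (!isNumberOrNull(record[field])) problems.push(`${field} must be a number or null`);
  }
  if (record.currency !== null && !/^[A-Z]{3}$/.test(record.currency ?? '')) {
    problems.push('currency must be a three-letter ISO 4217 code or null');
  }
  if (!AVAILABILITY.has(record.availability)) problems.push('availability has an unexpected value');
  if (!PLATFORMS.has(record.platform)) problems.push('platform has an unexpected value');
  if (typeof record.extractionMethod !== 'string' || record.extractionMethod === '') {
    problems.push('extractionMethod must be a non-empty string');
  }
  return problems;
}
```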

See ./samples for a full input + output example per platform.

🤖 MCP / AI-agent usage

Apify exposes every Actor as an MCP tool, so an agent can call this scraper directly. Because the input schema is small and the output has the same shape on every platform, the agent needs no store-specific handling.

Connect (Apify MCP server): https://mcp.apify.com — add YOUR_USERNAME/store-catalog-scraper as an available tool.

An agent calls it like this (conceptual tool call):

{
  "tool": "YOUR_USERNAME/store-catalog-scraper",
  "input": {
    "storeUrls": ["https://www.allbirds.com"],
    "maxProductsPerStore": 50,
    "includeVariants": true
  }
}

The agent then reads the run's dataset — every item is one normalized product it can reason over (compare prices across stores, check stock, build a catalog) without per-store parsing logic.
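
As an illustration of reasoning over the normalized items, the sketch below picks the cheapest in-stock offer per product across stores once the dataset items have been loaded into an array. `cheapestOffers` is a hypothetical helper, and matching products by lower-cased title is a deliberately naive assumption; a real agent would probably match on SKU or another stable key passed in through `keyOf`.

```javascript
// Group in-stock items by a caller-supplied key and keep the cheapest offer per currency.
function cheapestOffers(items, keyOf = (item) => item.title.trim().toLowerCase()) {
  const best = new Map();
  for (const item of items) {
    // Ignore skip rows and anything without a usable title, price, or stock status.
    if (item.skipped || typeof item.title !== 'string') continue;
    if (item.availability !== 'in_stock' || typeof item.price !== 'number') continue;

    // Prices in different currencies are not comparable, so currency is part of the key.
    const key = `${keyOf(item)}|${item.currency ?? 'unknown'}`;
    const current = best.get(key);
    if (!current || item.price < current.price) best.set(key, item);
  }
  return [...best.values()].map(({ title, storeUrl, productUrl, price, currency }) => ({
    title, storeUrl, productUrl, price, currency,
  }));
}
```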

Tip for agent builders: keep maxProductsPerStore modest for interactive use, and rely on platform + extractionMethod to gauge data confidence.

💸 Pricing (pay-per-event)

Event	Rate (USD)	When
Result (product scraped)	$0.004 ($4 / 1,000)	Per normalized product written to the dataset — the main charge
Actor start	$0.02	Once per run

Platform compute and proxy costs are included (absorbed by the Actor), so you pay only for the events above. The cost of a run depends only on how many products it writes, not on how slow or heavy a store is to scrape.

Worked cost example — scrape 10 stores × 200 products = 2,000 products:

Actor start:        1 × $0.02  = $0.02
Products scraped: 2000 × $0.004 = $8.00
-------------------------------------------
Total                            = $8.02   (~$0.004 per product)

You are only charged for real products: stores that fail or return nothing are written to a separate, non-billed dataset (skipped-stores).
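
The same arithmetic can be wrapped in a small estimator for budgeting runs. This is a sketch using the two rates from the pricing table; `estimateCostUsd` and `maxProductsForBudget` are hypothetical names, and amounts are kept in tenths of a cent so the sums stay in integers.

```javascript
// Rates from the pricing table, in tenths of a cent (1 USD = 1000 units).
const RESULT_UNITS = 4; // $0.004 per product written to the dataset
const START_UNITS = 20; // $0.02 per Actor start

// Estimated charge in USD for a number of billed products across a number of runs.
function estimateCostUsd(products, runs = 1) {
  return (products * RESULT_UNITS + runs * START_UNITS) / 1000;
}

// Largest number of products that fits a USD budget after paying for the Actor starts.
function maxProductsForBudget(budgetUsd, runs = 1) {
  const available = Math.round(budgetUsd * 1000) - runs * START_UNITS;
  return Math.max(0, Math.floor(available / RESULT_UNITS));
}

console.log(estimateCostUsd(2000)); // 8.02, matching the worked example
```

Skipped stores are not billed, so the product count passed in should cover only products actually written to the dataset.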

🔒 Responsible use

Public, non-personal product data only. This Actor does not log into stores, collect merchant/customer personal data, or access gated content. If a store requires authentication to view products, it is skipped.
You own compliance. You are responsible for complying with each target store's Terms of Service and applicable law in your jurisdiction.
Discovery is best-effort and user-directed. Mode 2 finds candidate public storefronts from your keywords and verifies them before crawling; you remain responsible for what you choose to crawl. It is hard-capped at 50 stores.
Politeness is built in: robots.txt is respected by default, requests are rate-limited per domain with jitter, and an honest User-Agent is sent.
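
Per-domain rate limiting with jitter can be sketched as a small scheduler that tracks when each host may next be requested. This is an illustration rather than the Actor's implementation: `DomainRateLimiter`, the 1000 ms base delay, and the 500 ms jitter window are assumed values, since the README does not state the real ones.

```javascript
// Hypothetical per-domain scheduler: remembers the earliest time each host may be hit again.
class DomainRateLimiter {
  constructor({ minDelayMs = 1000, jitterMs = 500, random = Math.random } = {}) {
    this.minDelayMs = minDelayMs;
    this.jitterMs = jitterMs;
    this.random = random; // injectable so the jitter can be made deterministic
    this.nextAllowedAt = new Map(); // hostname -> timestamp (ms) of the next free slot
  }

  // Returns how many ms the caller should wait before requesting `url`, and reserves that slot.
  reserve(url, now = Date.now()) {
    const host = new URL(url).hostname;
    const startAt = Math.max(now, this.nextAllowedAt.get(host) ?? 0);
    const gap = this.minDelayMs + Math.floor(this.random() * this.jitterMs);
    this.nextAllowedAt.set(host, startAt + gap);
    return startAt - now;
  }
}
```

A caller would sleep for the returned number of milliseconds before sending the request. Because slots are tracked per hostname, requests to different stores do not slow each other down.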

⚠️ Limitations

WooCommerce per-variant prices aren't exposed by the public Store API without extra calls, so Woo variants surface attribute combinations with price: null (product-level price is always populated).
Generic JSON-LD stores vary widely; fields present depend on what the store emits. Missing fields are null/[], never invented.
Shopify currency is read from the storefront; if a store hides it, currency may be null.
The Playwright fallback only triggers when cheap paths return nothing and the page looks JS-rendered; escalations are capped per run for cost.
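
For the WooCommerce variant limitation, a consumer that needs some price on every variant could fall back to the product-level price and flag that it did so. This is a consumer-side sketch only: `effectiveVariantPrices` and the `priceIsFallback` flag are hypothetical, and the fallback value is an approximation because a variant's real price may differ from the product-level one.

```javascript
// Fill missing variant prices from the product-level price, marking which ones were filled.
function effectiveVariantPrices(product) {
  return (product.variants ?? []).map((variant) => {
    const hasOwnPrice = typeof variant.price === 'number';
    return {
      variantId: variant.variantId,
      price: hasOwnPrice ? variant.price : product.price ?? null,
      priceIsFallback: !hasOwnPrice && typeof product.price === 'number',
    };
  });
}
```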

🧩 Extending

Platform modules are cleanly separated (src/shopify.js, src/woocommerce.js, src/generic.js) and all funnel through src/normalize.js. Adding BigCommerce or Squarespace is a drop-in: add a detector fingerprint + an extractor that emits into the shared normalizer.
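
A new platform module might take roughly the following shape. Everything in this sketch is hypothetical: the real module contract and the interface of src/normalize.js are not shown in this README, the raw product shape (`id`, `name`, `price`, `url`) is invented for illustration, and the CDN pattern is an assumed fingerprint. The `normalize` function here is a simplified stand-in for the shared normalizer, and its price parsing assumes a dot as the decimal separator.

```javascript
// Hypothetical platform module: one detector fingerprint plus one extractor.
const bigcommerce = {
  name: 'bigcommerce',
  // Assumed fingerprint: assets served from a cdnNN.bigcommerce.com host.
  detect: (html) => /cdn\d*\.bigcommerce\.com/i.test(html),
  // Maps an (invented) raw product shape onto the fields the normalizer expects.
  extract: (raw, storeUrl) => ({
    productId: raw.id,
    title: raw.name,
    price: raw.price,
    productUrl: new URL(raw.url, storeUrl).href,
    extractionMethod: 'bigcommerce:example',
  }),
};

// Simplified stand-in for the shared normalizer: coerces types, fills gaps with null / [].
function normalize(partial, storeUrl, platform) {
  const toNumber = (value) => {
    if (typeof value === 'number') return Number.isFinite(value) ? value : null;
    if (value == null) return null;
    const cleaned = String(value).replace(/[^0-9.-]/g, ''); // drop currency symbols and commas
    if (cleaned === '') return null;
    const n = Number(cleaned);
    return Number.isFinite(n) ? n : null;
  };
  return {
    storeUrl,
    platform,
    productId: partial.productId != null ? String(partial.productId) : null,
    productUrl: partial.productUrl ?? null,
    title: partial.title ?? null,
    price: toNumber(partial.price),
    compareAtPrice: toNumber(partial.compareAtPrice),
    currency: partial.currency ?? null,
    images: partial.images ?? [],
    extractionMethod: partial.extractionMethod ?? `${platform}:unknown`,
  };
}
```

Keeping type coercion in the normalizer rather than in each extractor is what lets every platform module stay small: an extractor only maps field names, and the output guarantees are enforced in one place.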
