Lean Shopify Scraper
Pricing
Pay per usage
Modular Shopify scraper — pay only for the fields you need. Price / Catalog / Full modes with transparent SKU merging and visible error handling.
Developer: Per Schondell (maintained by Community)
The Shopify scraper that bills you for what you actually need.
If you only need daily price tracking, you shouldn't be paying for review scraping every run. This actor splits Shopify storefront extraction into three explicit modes with separate billing — pick the tier that matches your job.
Why use this actor?
Most Shopify actors bundle everything together: product data, variants, images, review aggregation, sales estimates. A dropshipper running a daily price-watch on 1,000 products pays for review scraping they don't need — every run, forever.
Lean Shopify Scraper has three modes:
| Mode | Price | Returns | Best for |
|---|---|---|---|
| Price | $1.50 / 1K products | current price, compare-at price, availability, sold-out variant ratio | Daily price-watch, dynamic repricing, discount detection |
| Catalog | $4 / 1K products | + variants, images, tags, description, vendor, SKU, barcode | Product research, dropshipping, catalog import |
| Full | $8 / 1K products | + review-app aggregation (Judge.me / Yotpo / Loox / Okendo / Stamped), sales estimate | Competitor intelligence, full audits |
v0.1 ships with Price and Catalog modes. Full mode is on the roadmap below.
How it stands apart
- 100% parse success rate on every reachable Shopify store in our 20-store benchmark (see `test-results/real-stores-report.json`). Run `npm run test:real` to reproduce.
- Correct SKU merging: consolidated product pages return one row per product, not one per SKU. (Some popular alternatives don't do this.)
- No silent failures: HTTP 430 security rejections, 429 rate limits, malformed JSON, and blocked endpoints all surface as named errors in the run log with full per-store context. Crashed runs tell you exactly which store failed and why.
- Built-in retry-with-backoff: 429 and 5xx errors retry automatically with exponential backoff, honouring `Retry-After`. 430 and other non-429 4xx errors do not retry (they won't fix themselves).
- Cents-based price math: no float drift on `19.99 + 0.01`. Output uses integer minor units; divide by 100 in your downstream code.
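The retry policy described in the list above can be sketched as two small pure functions. This is a minimal illustration, not the actor's actual code: the names `shouldRetry` and `retryDelayMs`, the 1-second base delay, and the 60-second cap are all assumptions made for the example.

```typescript
// Hypothetical helper: decide whether a status is worth retrying.
// 429 (rate limit) and 5xx are retried; 430 and other 4xx are treated as permanent.
function shouldRetry(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Hypothetical helper: delay before the next attempt, in milliseconds.
// A parseable Retry-After header (delta-seconds or HTTP-date) takes priority
// over the computed exponential backoff. baseMs and maxMs are assumed defaults.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  baseMs: number = 1000,
  maxMs: number = 60000,
  nowMs: number = Date.now(),
): number {
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return seconds * 1000; // Retry-After given as delta-seconds
    }
    const dateMs = Date.parse(retryAfter);
    if (!Number.isNaN(dateMs)) {
      return Math.max(0, dateMs - nowMs); // Retry-After given as an HTTP-date
    }
  }
  // Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}
```

A caller would check `shouldRetry(response.status)` after each failed request and sleep for `retryDelayMs(attempt, response.headers.get("retry-after"))` before trying again. Production code would usually add random jitter to the computed delay; it is left out here to keep the example deterministic.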
Quick start
Minimal input
{
  "mode": "price",
  "storeUrls": ["https://allbirds.com"]
}
Full input schema
{
  "mode": "price",
  "storeUrls": [
    "https://allbirds.com",
    "https://gymshark.com"
  ],
  "maxProductsPerStore": 500,
  "delayMs": 1500
}
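If you build the input programmatically, a small pre-flight check can catch mistakes before a run is billed. The sketch below is an assumption-heavy illustration: the four field names come from the example above, but the `"catalog"` and `"full"` mode strings, the validation rules, and the function name `validateInput` are guesses rather than the actor's published schema.

```typescript
// Assumed mode strings; only "price" appears in the examples above.
const KNOWN_MODES: readonly string[] = ["price", "catalog", "full"];

// Hypothetical validator: returns a list of problems, empty when the input looks usable.
function validateInput(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) {
    return ["input must be a JSON object"];
  }
  const input = raw as Record<string, unknown>;
  const problems: string[] = [];

  const mode = input["mode"];
  if (typeof mode !== "string" || KNOWN_MODES.indexOf(mode) === -1) {
    problems.push("mode must be one of: " + KNOWN_MODES.join(", "));
  }

  const urls = input["storeUrls"];
  if (!Array.isArray(urls) || urls.length === 0) {
    problems.push("storeUrls must be a non-empty array");
  } else {
    for (const url of urls) {
      if (typeof url !== "string" || !/^https?:\/\/\S+$/.test(url)) {
        problems.push("not an http(s) URL: " + String(url));
      }
    }
  }

  // Optional numeric fields: if present, expect a non-negative integer.
  for (const key of ["maxProductsPerStore", "delayMs"]) {
    const value = input[key];
    if (value !== undefined &&
        (typeof value !== "number" || !Number.isInteger(value) || value < 0)) {
      problems.push(key + " must be a non-negative integer");
    }
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a caller report every issue in one pass.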
Sample output (Price mode)
{
  "storeUrl": "https://allbirds.com",
  "productId": 6616124981328,
  "handle": "trino-cozy-crew-heathered-onyx",
  "title": "Trino® Cozy Crew - Heathered Onyx",
  "vendor": "Allbirds",
  "productType": "Socks",
  "minPriceCents": 2400,
  "maxPriceCents": 2400,
  "compareAtMinCents": null,
  "compareAtMaxCents": null,
  "hasDiscount": false,
  "variantCount": 4,
  "soldOutCount": 3,
  "soldOutRatio": 0.75,
  "scrapedAt": "2026-05-27T15:53:23.620Z"
}
A soldOutRatio near 1.0 with the product still listed is one of the strongest demand signals in Shopify catalog data — buyers want this product enough to clear most variants, and the store is slow to restock.
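To make the Price-mode fields concrete, here is a rough sketch of how one product's variants could be collapsed into a single row with integer-cent prices and a sold-out ratio. It is not the actor's source code. It assumes the variant shape commonly seen in a storefront's public `/products.json` (string `price`, nullable string `compare_at_price`, boolean `available`), and the helper names `toCents` and `summarizeVariants` and the `hasDiscount` rule are invented for the example.

```typescript
// Assumed subset of a variant as it appears in /products.json.
interface ShopifyVariant {
  price: string;                    // e.g. "24.00"
  compare_at_price: string | null;  // e.g. "30.00" or null
  available: boolean;
}

// Parse a decimal price string into integer cents without going through floats.
function toCents(price: string): number {
  const match = /^(\d+)(?:\.(\d{1,2}))?$/.exec(price.trim());
  if (!match) {
    throw new Error("Unparseable price: " + price);
  }
  const whole = Number(match[1]);
  const fracDigits = match[2] === undefined ? "" : match[2];
  // "5" means 50 cents, "05" means 5 cents, "" means 0 cents.
  const frac = fracDigits.length === 1 ? Number(fracDigits) * 10 : Number(fracDigits || "0");
  return whole * 100 + frac;
}

// Collapse all variants of one product into a single summary row.
function summarizeVariants(variants: ShopifyVariant[]) {
  if (variants.length === 0) {
    throw new Error("product has no variants");
  }
  const prices = variants.map((v) => toCents(v.price));
  const soldOutCount = variants.filter((v) => !v.available).length;
  // Assumed rule: a discount exists when any compare-at price exceeds the current price.
  const hasDiscount = variants.some(
    (v) => v.compare_at_price !== null && toCents(v.compare_at_price) > toCents(v.price),
  );
  return {
    minPriceCents: Math.min(...prices),
    maxPriceCents: Math.max(...prices),
    hasDiscount,
    variantCount: variants.length,
    soldOutCount,
    soldOutRatio: soldOutCount / variants.length,
  };
}
```

Four variants priced at "24.00" with three unavailable yield `minPriceCents: 2400`, `soldOutCount: 3`, and `soldOutRatio: 0.75`, matching the sample row above. Because prices stay in integer cents, `toCents("19.99") + toCents("0.01")` is exactly 2000.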
Pricing — concrete examples
| Scenario | Mode | Run cost |
|---|---|---|
| Daily price check on 1,000 products | Price | $1.50 / day → ~$45/mo |
| Full catalog ingest of 5,000 products (one-time) | Catalog | $20 |
| Competitor audit with reviews + sales estimate on 500 products | Full | $4 |
Compare with bundled-billing actors that charge $4–$7 per 1K regardless of fields requested.
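The run costs in the table follow from products ÷ 1,000 × the mode's rate. A small estimator, working in integer cents, might look like the sketch below. The per-1K rates come from the mode table; the function names and the round-up behaviour for partial cents are assumptions, since the exact billing rounding is not documented here.

```typescript
// Rates from the mode table, expressed in cents per 1,000 products.
const RATE_CENTS_PER_1K = { price: 150, catalog: 400, full: 800 } as const;
type BillingMode = keyof typeof RATE_CENTS_PER_1K;

// Hypothetical estimator: cost of one run in integer cents.
// Rounding up fractional cents is an assumption, not documented billing behaviour.
function estimateRunCostCents(mode: BillingMode, productCount: number): number {
  return Math.ceil((productCount * RATE_CENTS_PER_1K[mode]) / 1000);
}

// Format integer cents as a dollar string for display.
function formatUsd(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}
```

For example, `estimateRunCostCents("price", 1000)` gives 150 cents per run, and 30 daily runs come to `formatUsd(150 * 30)`, or "$45.00", in line with the first row of the table.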
Comparison vs current alternatives (Apify Store, May 2026)
| Feature | Lean Shopify Scraper | autofacts/shopify | webdatalabs/shopify-product-scraper |
|---|---|---|---|
| Modular billing (Price / Catalog / Full) | ✅ | ❌ | ❌ |
| Proper SKU merging on product pages | ✅ | ❌ (documented limitation) | ✅ |
| Explicit 430 / 429 / parse error logging | ✅ | partial | partial |
| Retry-with-backoff + Retry-After honored | ✅ | partial | partial |
| Cents-based price math (no float drift) | ✅ | ✅ | ❓ |
| Review-app aggregation (Judge.me / Yotpo / Loox / Okendo) | Planned (Full mode, see Roadmap) | ❌ | ✅ |
| Published parse success rate | 100% on 20/20 reachable stores | not published | not published |
Roadmap
- v0.2 — Sitemap-based pagination for catalogs >5K products; Web Bot Auth signed requests (significantly fewer 429s)
- v0.3 — Full mode (review-app aggregation + algorithmic sales estimate)
- v0.4 — Diff / webhook mode for price-change alerts on a watched set
- v0.5 — DACH-specific support (CHF / EUR rounding, VAT-inclusive prices, German review platforms)
Limitations
- Currently scrapes the public `/products.json` endpoint only. Stores with >5,000 products will lose tail items until sitemap pagination ships in v0.2.
- "Sales estimate" in Full mode is algorithmic (based on review volume + recency). It is not actual transaction data; treat it as directional.
- Stores that have moved off Shopify return a `NOT_SHOPIFY` error (HTML body instead of JSON). The actor classifies this clearly rather than silently returning empty.
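The `NOT_SHOPIFY` case in the list above comes down to telling an HTML page apart from a JSON product feed. A minimal sketch of that classification follows. It is an illustration only: `NOT_SHOPIFY` is the error name used in this README, while `MALFORMED_JSON`, the function name `classifyProductsBody`, and the "starts with `<`" heuristic are assumptions.

```typescript
type ProductsResult =
  | { kind: "ok"; products: unknown[] }
  | { kind: "error"; code: "NOT_SHOPIFY" | "MALFORMED_JSON"; detail: string };

// Hypothetical classifier for the raw body returned by a /products.json request.
function classifyProductsBody(body: string): ProductsResult {
  const trimmed = body.trim();
  // Heuristic (assumed): an HTML document starts with "<", a JSON feed does not.
  if (trimmed.startsWith("<")) {
    return { kind: "error", code: "NOT_SHOPIFY", detail: "received HTML instead of JSON" };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch (err) {
    return { kind: "error", code: "MALFORMED_JSON", detail: String(err) };
  }
  // A valid feed is expected to be an object with a "products" array.
  if (typeof parsed === "object" && parsed !== null) {
    const products = (parsed as Record<string, unknown>)["products"];
    if (Array.isArray(products)) {
      return { kind: "ok", products };
    }
  }
  return { kind: "error", code: "MALFORMED_JSON", detail: "no products array in response" };
}
```

Returning a tagged result instead of an empty list keeps the failure visible to the caller, which is the behaviour the limitation describes.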
Local development
npm install
npm test            # unit tests (vitest)
npm run test:real   # integration test against ~33 real Shopify stores
npm run build       # compile TypeScript
npm run start:dev   # run locally with storage/key_value_stores/default/INPUT.json
Status
v0.1 — Price and Catalog modes in production. 38 unit tests + 33-store integration suite passing.