Shopify Store Leads Scraper
Turn a domain list into qualified Shopify B2B leads. Detects Shopify stores via /products.json, response headers, and HTML markers — enriches with product count, contact email, social links, and currency. Works on stores that block the products endpoint.
Developer: DevilScrapes
🎯 What this scrapes
Shopify powers over 4 million stores worldwide. This Actor takes a list of website domains and runs multi-signal detection across each one: it tries /products.json, reads HTTP headers (x-shopify-stage, x-sorting-hat-shopid, powered-by), and scans the homepage HTML for cdn.shopify.com, Shopify.theme, and *.myshopify.com references. Whichever signal fires first wins. Confirmed Shopify stores are then enriched with product count, a sample of product titles, the *.myshopify.com subdomain, any contact email found on the homepage or /contact page, social profile links, and the storefront currency.
Run it once to qualify a prospect list, or schedule it weekly to monitor an entire vertical.
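The detection order described above can be sketched as a pure function. This is an illustration only, not the Actor's actual code: the function name `detect_shopify`, its parameters, and the exact matching rules are assumptions, while the header names, HTML markers, and method labels are the ones listed in this README.

```python
from typing import Mapping, Optional

# Header names and HTML markers come from the description above;
# the matching rules themselves are assumptions made for illustration.
SHOPIFY_HEADERS = ("x-shopify-stage", "x-sorting-hat-shopid")
HTML_MARKERS = ("cdn.shopify.com", "Shopify.theme", ".myshopify.com")


def detect_shopify(
    products_payload: Optional[dict],
    headers: Mapping[str, str],
    homepage_html: str,
) -> Optional[str]:
    """Return the label of the first signal that fires, or None.

    products_payload is the parsed JSON body of /products.json, or None
    when the endpoint was blocked or did not return JSON.
    """
    # Signal 1: /products.json returned an object with a "products" list.
    if isinstance(products_payload, dict) and isinstance(
        products_payload.get("products"), list
    ):
        return "products_json"

    # Signal 2: a Shopify-specific response header (names are case-insensitive).
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    if any(name in lowered for name in SHOPIFY_HEADERS):
        return "headers"
    if "shopify" in lowered.get("powered-by", ""):
        return "headers"

    # Signal 3: Shopify markers in the homepage HTML.
    if any(marker in homepage_html for marker in HTML_MARKERS):
        return "html_markers"

    return None
```

Checking the cheapest and most informative signal first means a store that answers `/products.json` never needs its homepage parsed, while a store that blocks the endpoint still falls through to the header and HTML checks.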
🔥 What we handle for you
- 🛡️ Multi-signal detection: three independent detection layers, so stores that block `/products.json` (like Gymshark) are still correctly identified via headers and HTML markers.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every request so your detection run doesn't get flagged as a crawler.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per domain, `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when a CDN or WAF pushes back, we slow down and rotate before retrying, not after failing.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, JSON / CSV / Excel export from the Apify Console.
- 💰 Pay-Per-Event pricing: you pay only for confirmed results that hit your dataset. Zero rows, zero charge (beyond the small start event).
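As a rough illustration of the retry policy in the list above, the delay before each retry could be computed like this. The base delay, the 60-second cap, and the names `is_retryable` and `backoff_delay` are assumptions; only the retryable status codes, the 5-attempt limit, and the `Retry-After` behaviour come from this README.

```python
from typing import Optional

RETRYABLE_STATUSES = {408, 429}
MAX_ATTEMPTS = 5  # "up to 5 attempts per domain"


def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx response are worth retrying."""
    return status in RETRYABLE_STATUSES or 500 <= status <= 599


def backoff_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,   # assumed base delay in seconds
    cap: float = 60.0,   # assumed upper bound for the computed delay
) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    # A numeric Retry-After header takes priority over the computed delay.
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form is not handled in this sketch
    # Exponential growth: base, 2*base, 4*base, ... limited by the cap.
    return min(cap, base * 2 ** (attempt - 1))
```

A production version would usually add random jitter to the computed delay so that parallel requests do not retry in lockstep.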
💡 Use cases
- Shopify app vendor prospecting — feed a domain list from a niche directory, get back every Shopify store with email and product count, ready to import into your CRM.
- DTC agency outreach — qualify inbound leads: "does this prospect actually run Shopify?" answered programmatically, no manual checks.
- Competitive intelligence — track which new entrants in a product category have spun up Shopify stores over the last 30 days.
- Platform migration research — identify the Shopify footprint in a specific industry vertical before pitching a replatforming project.
- Market-sizing for app developers — count confirmed Shopify stores in a niche to validate an app idea before building.
⚙️ How to use it
- Click Try for free at the top of the page.
- Paste your domain list into the Domains to check field — one domain per line, bare domains or full URLs both work.
- Adjust Max products to sample, Only Shopify stores, and the proxy settings as needed.
- Click Start. Rows stream into the dataset as each domain is resolved.
- Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the Apify API.
For no-code workflows: install the Apify node for n8n or the Apify module for Make, connect your token, and point it at DevilScrapes/shopify-store-leads-scraper.
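For code-driven workflows, a run can also be started from Python. The sketch below assumes the third-party `apify-client` package is installed and that you pass your own API token; `build_run_input` and `run_actor` are hypothetical helper names, and the input fields are the ones documented in the Input section.

```python
def build_run_input(domains: list[str], only_shopify: bool = True) -> dict:
    """Assemble the Actor input using the documented field names."""
    return {
        "domains": domains,
        "includeProducts": True,
        "maxProductsSample": 10,
        "onlyShopify": only_shopify,
        "maxResults": 50,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }


def run_actor(token: str, domains: list[str]) -> list[dict]:
    """Start a run, wait for it to finish, and return the dataset rows."""
    # Imported here so build_run_input() works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor("DevilScrapes/shopify-store-leads-scraper").call(
        run_input=build_run_input(domains)
    )
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_actor("<YOUR_APIFY_TOKEN>", ["allbirds.com", "gymshark.com"])` would block until the run finishes and then return one dictionary per dataset row.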
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `domains` | array | yes | `["allbirds.com", "gymshark.com"]` | Domains or URLs to probe. Scheme and path are stripped automatically. |
| `includeProducts` | boolean | no | `true` | Fetch product count + sample titles from `/products.json` or sitemap. |
| `maxProductsSample` | integer | no | `10` | Max product titles per domain (1–250). |
| `onlyShopify` | boolean | no | `true` | Drop non-Shopify domains from the dataset. Set `false` for a full audit. |
| `maxResults` | integer | no | `50` | Hard cap on rows emitted. `0` = unlimited. |
| `proxyConfiguration` | object | no | Apify residential proxy | Proxy routing. Residential recommended to avoid datacenter blocks. |
Example input
    {
      "domains": ["allbirds.com", "gymshark.com", "kylie.com"],
      "includeProducts": true,
      "maxProductsSample": 10,
      "onlyShopify": true,
      "maxResults": 50,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      }
    }
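The `domains` field accepts bare domains or full URLs, with scheme and path stripped automatically. One way such normalisation could work is sketched below; `normalise_domain` is a hypothetical helper, and details such as lower-casing, dropping the port, and keeping a leading `www.` are assumptions rather than documented behaviour.

```python
from urllib.parse import urlsplit


def normalise_domain(raw: str) -> str:
    """Reduce a bare domain or full URL to its host name."""
    value = raw.strip().lower()
    # urlsplit only fills in the host when a scheme or a leading "//" is present.
    if "://" not in value:
        value = "//" + value
    # .hostname drops the scheme, path, query string, credentials and port.
    return urlsplit(value).hostname or ""
```

With this sketch, `allbirds.com` and `https://allbirds.com/collections/mens` both reduce to `allbirds.com`, so duplicates in a messy prospect list can be removed before the run.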
📤 Output
One dataset row per probed domain (or per confirmed Shopify store when onlyShopify=true).
| Field | Type | Notes |
|---|---|---|
| `domain` | string | Normalised domain that was probed. |
| `is_shopify` | boolean | `true` when any detection signal confirmed Shopify. |
| `detection_method` | `string \| null` | Which signal fired: `products_json`, `headers`, `html_markers`, or `none`. |
| `myshopify_domain` | `string \| null` | `*.myshopify.com` subdomain when discoverable. |
| `product_count` | `integer \| null` | Total products reported by `/products.json`. Null when endpoint blocked. |
| `sample_product_titles` | array[string] | Up to `maxProductsSample` product titles. |
| `currency` | `string \| null` | ISO-4217 currency detected on the storefront (e.g. USD, GBP). |
| `email` | `string \| null` | Contact email found via `mailto:` links on the homepage or `/contact`. |
| `social_links` | array[string] | Social profile URLs found on the homepage. |
| `homepage_title` | `string \| null` | `<title>` text of the homepage. |
| `scraped_at` | string | ISO-8601 UTC timestamp. |
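The `email` field is populated from `mailto:` links only. A minimal sketch of that kind of extraction, using only the Python standard library, might look like the following; `MailtoCollector` and `first_mailto` are hypothetical names and the Actor's real parser may differ.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import unquote


class MailtoCollector(HTMLParser):
    """Collect addresses from <a href="mailto:..."> links, in page order."""

    def __init__(self) -> None:
        super().__init__()
        self.emails: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... part, then decode %-escapes.
            address = unquote(href[len("mailto:"):].split("?", 1)[0]).strip()
            if address and address not in self.emails:
                self.emails.append(address)


def first_mailto(html: str) -> Optional[str]:
    """Return the first mailto address in the markup, or None."""
    collector = MailtoCollector()
    collector.feed(html)
    return collector.emails[0] if collector.emails else None
```

Because only static markup is inspected, addresses injected by JavaScript or hidden behind a contact form are not found, which matches the null `email` values seen on many stores.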
Example output
    {
      "domain": "allbirds.com",
      "is_shopify": true,
      "detection_method": "products_json",
      "myshopify_domain": "allbirds.myshopify.com",
      "product_count": 87,
      "sample_product_titles": [
        "Men's Tree Runners",
        "Women's Wool Runners",
        "Tree Dasher 2"
      ],
      "currency": "USD",
      "email": null,
      "social_links": [
        "https://www.instagram.com/allbirds/",
        "https://twitter.com/allbirds",
        "https://www.facebook.com/allbirds"
      ],
      "homepage_title": "Allbirds | Sustainable, Comfortable Shoes Made With Natural Materials",
      "scraped_at": "2026-06-07T10:00:00Z"
    }
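When consuming exported rows downstream, it can help to load them into a typed structure. The sketch below uses a standard-library dataclass as a stand-in for whatever Pydantic model the Actor uses internally; `StoreLead` and `parse_row` are hypothetical names, and the field list mirrors the output table.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class StoreLead:
    domain: str
    is_shopify: bool
    scraped_at: datetime
    detection_method: Optional[str] = None
    myshopify_domain: Optional[str] = None
    product_count: Optional[int] = None
    sample_product_titles: list = field(default_factory=list)
    currency: Optional[str] = None
    email: Optional[str] = None
    social_links: list = field(default_factory=list)
    homepage_title: Optional[str] = None


def parse_row(raw: str) -> StoreLead:
    """Parse one JSON dataset row into a StoreLead."""
    data = json.loads(raw)
    # Older Python versions reject a trailing "Z" in fromisoformat().
    data["scraped_at"] = datetime.fromisoformat(
        data["scraped_at"].replace("Z", "+00:00")
    )
    # Treat a missing or null list as empty so callers can always iterate.
    for key in ("sample_product_titles", "social_links"):
        data[key] = data.get(key) or []
    return StoreLead(**data)
```

Rows carrying fields outside this list would raise a `TypeError` here, which is a deliberate choice for the sketch: it surfaces schema changes instead of silently dropping data.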
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
actor-start | $0.005 | One-off warm-up charge per run |
result-row | $0.0025 | Per dataset row written |
Example: 1 000 confirmed Shopify leads at the rates above cost 1 000 × $0.0025 = $2.50 in row charges, plus the $0.005 start event. No subscription, no minimum, no card to start: Apify gives every new account $5 free credit, good for roughly 2 000 enriched leads before you owe anything.
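The arithmetic behind these figures can be reproduced with a few lines of Python. The event prices are taken from the table above; the helper names `run_cost` and `rows_within_budget` are illustrative, and the calculation assumes everything happens in a single run so the start event is charged once.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.005")   # actor-start event, charged once per run
RESULT_ROW = Decimal("0.0025")   # result-row event, charged per dataset row


def run_cost(rows: int) -> Decimal:
    """Total charge in USD for one run that writes `rows` dataset rows."""
    return ACTOR_START + RESULT_ROW * rows


def rows_within_budget(budget: Decimal) -> int:
    """Largest number of rows a single run can write within `budget` USD."""
    if budget < ACTOR_START:
        return 0
    return int((budget - ACTOR_START) // RESULT_ROW)
```

For 1 000 rows, `run_cost(1000)` comes to $2.505. With a $5 budget, `rows_within_budget(Decimal("5"))` gives 1 998 rows, slightly under 2 000 because the start event is deducted first.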
🚧 Limitations
- Some stores block `/products.json`: stores like Gymshark return 403. We fall back to header and HTML-marker detection, which confirms Shopify but can't retrieve product lists, so `product_count` and `sample_product_titles` will be null for those.
- Email is best-effort: many Shopify stores hide contact email behind forms or Zendesk widgets. We surface `mailto:` links only; JavaScript-rendered contact forms are out of scope for the HTTP-tier implementation.
- Social links are homepage-scraped: some brands link social accounts only in the footer of inner pages. If they're not on the homepage, they won't appear.
- Rate limiting on large lists: running thousands of domains in a single Actor run is possible but slow. Batch in groups of 500–1 000 for predictable run times. Use `maxResults` to cap during testing.
- Shopify Markets / headless: some enterprise Shopify deployments use custom domains + headless frontends that mask Shopify signals entirely. Detection rates on headless stores are lower.
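The batching advice above can be applied before submitting runs. A minimal sketch follows, with `batched` as a hypothetical helper name and 500 as the default batch size, taken from the lower end of the suggested range.

```python
from typing import Iterator, Sequence


def batched(domains: Sequence[str], size: int = 500) -> Iterator[list]:
    """Yield consecutive slices of `domains`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(domains), size):
        yield list(domains[start:start + size])
```

Each yielded list can then be passed as the `domains` input of its own run, which keeps individual run times predictable and limits how much work is repeated if one run fails.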
❓ FAQ
Does this work on stores that block /products.json?
Yes. The Actor tries /products.json first, but it doesn't stop there. It also reads HTTP response headers (x-shopify-stage, x-sorting-hat-shopid) and scans the homepage HTML for Shopify CDN references and theme globals. A store that 403s /products.json will still be detected if any other signal is present — we just won't be able to populate product_count for it.
What domains should I feed it?
Any list of website domains works — e-commerce databases, trade-show exhibitor lists, Google Shopping scrape results, brand registries, or your own prospect pipeline. The Actor normalises URLs to bare domains automatically.
How accurate is the detection?
The three-signal approach catches the vast majority of Shopify stores. False positives are rare (we require at least one confirmed Shopify marker, not just heuristics). False negatives occur mainly on headless Shopify deployments that strip all Shopify headers and serve no Shopify CDN assets from the homepage.
Can I get more than 10 product titles per store?
Yes — raise maxProductsSample up to 250. Keep in mind that /products.json?limit=250 is a single larger request, which may tip rate-limit thresholds on stores that monitor request size.
Can I run this on a schedule?
Yes — wire it to an Apify Schedule and pass a static domain list. Useful for monitoring when a specific set of competitor domains migrates onto Shopify.
I need the full product catalogue, not just a sample — can you do that?
That's a different use case (deep product scraping) and outside the scope of this lead-gen Actor. If you need full catalogue extraction, contact us via the Issues tab.
💬 Your feedback
Spotted a false negative, hit a detection miss, or need an extra enrichment field? Open an issue on the Actor's Issues tab in Apify Console — we ship fixes weekly and every report gets read.