
Shopify Store Leads Scraper

Pricing

Pay per usage


Turn a domain list into qualified Shopify B2B leads. Detects Shopify stores via /products.json, response headers, and HTML markers, then enriches each confirmed store with product count, contact email, social links, and currency. Detection still works on stores that block the products endpoint.


Developer: DevilScrapes (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 9 days ago



🎯 What this scrapes

Shopify powers over 4 million stores worldwide. This Actor takes a list of website domains and runs multi-signal detection across each one: it tries /products.json, reads HTTP headers (x-shopify-stage, x-sorting-hat-shopid, powered-by), and scans the homepage HTML for cdn.shopify.com, Shopify.theme, and *.myshopify.com references. Whichever signal fires first wins. Confirmed Shopify stores are then enriched with product count, a sample of product titles, the *.myshopify.com subdomain, any contact email found on the homepage or /contact page, social profile links, and the storefront currency.
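The detection order described above can be sketched as a pure function over responses that have already been fetched. This is an illustrative sketch, not the Actor's source: the function name, its parameters, and the rule that a powered-by header "contains shopify" are assumptions; the marker strings are the ones named in the paragraph above.

```python
from typing import Mapping, Optional

# Marker lists taken from the signals named above; a real implementation may check more.
HEADER_MARKERS = ("x-shopify-stage", "x-sorting-hat-shopid")
HTML_MARKERS = ("cdn.shopify.com", "Shopify.theme", ".myshopify.com")


def detect_shopify(
    products_status: Optional[int],
    products_body: Optional[dict],
    headers: Mapping[str, str],
    html: str,
) -> str:
    """Return the first signal that fires: products_json, headers, html_markers, or none."""
    # Signal 1: /products.json answered 200 with a JSON "products" list.
    if (
        products_status == 200
        and isinstance(products_body, dict)
        and isinstance(products_body.get("products"), list)
    ):
        return "products_json"

    # Signal 2: Shopify response headers (HTTP header names are case-insensitive).
    lowered = {name.lower(): value for name, value in headers.items()}
    if any(name in lowered for name in HEADER_MARKERS):
        return "headers"
    if "shopify" in lowered.get("powered-by", "").lower():
        return "headers"

    # Signal 3: Shopify markers anywhere in the homepage HTML.
    if any(marker in html for marker in HTML_MARKERS):
        return "html_markers"
    return "none"
```

Keeping network I/O out of the function makes the "first signal wins" ordering easy to test; in a real run the three inputs would come from requests to /products.json and the homepage.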

Run it once to qualify a prospect list, or schedule it weekly to monitor an entire vertical.

🔥 What we handle for you

  • 🛡️ Multi-signal detection: three independent detection layers, so stores that block /products.json (like Gymshark) are still identified via headers and HTML markers.
  • 🌐 Residential proxy rotation via Apify Proxy: a fresh session and exit IP on every request, so your detection run is less likely to be flagged as a crawler.
  • 🔁 Retries with exponential backoff on 408 / 429 / 5xx: up to 5 attempts per domain, with Retry-After honoured.
  • 🧱 Rate-limit-aware pacing: when a CDN or WAF pushes back, we slow down and rotate the proxy session before retrying, instead of retrying blindly until the request fails.
  • 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, JSON / CSV / Excel export from the Apify Console.
  • 💰 Pay-Per-Event pricing: you pay only for rows written to your dataset (with Only Shopify stores enabled, that means confirmed Shopify stores only). Zero rows, zero charge beyond the small start event.
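The retry policy listed above (retry on 408, 429 and 5xx, up to 5 attempts, Retry-After honoured) could look roughly like the following. This is not the Actor's code: the 1-second base delay, 30-second cap, small jitter, and the injectable fetch/sleep callables are assumptions chosen so the example runs without network access. Only the numeric (delta-seconds) form of Retry-After is handled.

```python
import random
from typing import Callable, Optional, Tuple

RETRYABLE = {408, 429}
MAX_ATTEMPTS = 5


def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx are treated as transient."""
    return status in RETRYABLE or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait after a failed attempt (attempt is 1-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # server-specified wait wins
        except ValueError:
            pass  # HTTP-date form of Retry-After is not handled in this sketch
    # Exponential growth (1s, 2s, 4s, 8s ...) capped, plus up to 10% jitter.
    return min(cap, base * 2 ** (attempt - 1)) + random.uniform(0, 0.1 * base)


def fetch_with_retries(
    fetch: Callable[[], Tuple[int, Optional[str]]],
    sleep: Callable[[float], None],
) -> int:
    """Call fetch() until it returns a non-retryable status or attempts run out.

    fetch returns (status_code, retry_after_header_or_None).
    """
    status = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, retry_after = fetch()
        if not is_retryable(status):
            return status
        if attempt < MAX_ATTEMPTS:  # no pointless wait after the final attempt
            sleep(backoff_delay(attempt, retry_after))
    return status
```

In production, sleep would be time.sleep (or an async equivalent) and fetch would wrap one HTTP request through a fresh proxy session.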

💡 Use cases

  • Shopify app vendor prospecting — feed a domain list from a niche directory, get back every Shopify store with email and product count, ready to import into your CRM.
  • DTC agency outreach — qualify inbound leads: "does this prospect actually run Shopify?" answered programmatically, no manual checks.
  • Competitive intelligence — track which new entrants in a product category have spun up Shopify stores over the last 30 days.
  • Platform migration research — identify the Shopify footprint in a specific industry vertical before pitching a replatforming project.
  • Market-sizing for app developers — count confirmed Shopify stores in a niche to validate an app idea before building.

⚙️ How to use it

  1. Click Try for free at the top of the page.
  2. Paste your domain list into the Domains to check field, one domain per line. Bare domains and full URLs both work.
  3. Adjust Max products to sample, Only Shopify stores, and the proxy settings as needed.
  4. Click Start. Rows stream into the dataset as each domain is resolved.
  5. Export from Storage → Dataset as JSON, CSV, or Excel, or fetch the rows via the Apify API.
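For step 5, one way to fetch results programmatically is Apify API v2's run-sync-get-dataset-items endpoint, which starts a run and returns its dataset items in one call. The sketch below uses only the Python standard library; the helper names are hypothetical, and in REST URLs the Actor ID is written with "~" instead of "/".

```python
import json
import urllib.request

# REST URLs write the Actor ID with "~" instead of "/".
ACTOR_ID = "DevilScrapes~shopify-store-leads-scraper"
ENDPOINT = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """POST request that starts a run and returns its dataset items as JSON."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # keeps the token out of the URL
        },
        method="POST",
    )


def run_actor(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Run the Actor synchronously and return the dataset rows (a list of dicts)."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.load(resp)
```

Usage, assuming your token is stored in an APIFY_TOKEN environment variable: rows = run_actor(os.environ["APIFY_TOKEN"], {"domains": ["allbirds.com", "gymshark.com"], "maxResults": 10}). The synchronous endpoint waits only a limited time for the run to finish, so for long domain lists it is safer to start the run asynchronously and read the dataset once it completes.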

For no-code workflows: install the Apify node for n8n or the Apify module for Make, connect your token, and point it at DevilScrapes/shopify-store-leads-scraper.

📥 Input

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| domains | array | yes | ["allbirds.com", "gymshark.com"] | Domains or URLs to probe. Scheme and path are stripped automatically. |
| includeProducts | boolean | no | true | Fetch product count + sample titles from /products.json or sitemap. |
| maxProductsSample | integer | no | 10 | Max product titles per domain (1–250). |
| onlyShopify | boolean | no | true | Drop non-Shopify domains from the dataset. Set false for a full audit. |
| maxResults | integer | no | 50 | Hard cap on rows emitted. 0 = unlimited. |
| proxyConfiguration | object | no | Apify residential proxy | Proxy routing. Residential recommended to avoid datacenter blocks. |
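The domains field accepts bare domains or full URLs. A minimal normalisation sketch follows; the page only says that scheme and path are stripped, so dropping ports, query strings and a trailing dot, and lowercasing the host, are assumptions. It keeps a leading "www." because the page doesn't say whether the Actor removes it.

```python
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> str:
    """Reduce a bare domain or full URL to a lowercase host name."""
    value = raw.strip()
    # urlsplit only recognises the host after "//", so add it for bare domains.
    if "//" not in value:
        value = "//" + value
    # .hostname drops scheme, port, path and query, and lowercases the host.
    host = urlsplit(value).hostname or ""
    return host.rstrip(".")
```

Running your list through a function like this before a run also makes duplicates such as "Allbirds.com" and "https://allbirds.com/" easy to spot.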

Example input

{
  "domains": ["allbirds.com", "gymshark.com", "kylie.com"],
  "includeProducts": true,
  "maxProductsSample": 10,
  "onlyShopify": true,
  "maxResults": 50,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

📤 Output

One dataset row per probed domain (or per confirmed Shopify store when onlyShopify=true).

| Field | Type | Notes |
|---|---|---|
| domain | string | Normalised domain that was probed. |
| is_shopify | boolean | true when any detection signal confirmed Shopify. |
| detection_method | string or null | Which signal fired: products_json, headers, html_markers, or none. |
| myshopify_domain | string or null | *.myshopify.com subdomain when discoverable. |
| product_count | integer or null | Total products reported by /products.json. Null when the endpoint is blocked. |
| sample_product_titles | array[string] | Up to maxProductsSample product titles. |
| currency | string or null | ISO-4217 currency detected on the storefront (e.g. USD, GBP). |
| email | string or null | Contact email found via mailto: links on the homepage or /contact. |
| social_links | array[string] | Social profile URLs found on the homepage. |
| homepage_title | string or null | <title> text of the homepage. |
| scraped_at | string | ISO-8601 UTC timestamp. |

Example output

{
  "domain": "allbirds.com",
  "is_shopify": true,
  "detection_method": "products_json",
  "myshopify_domain": "allbirds.myshopify.com",
  "product_count": 87,
  "sample_product_titles": [
    "Men's Tree Runners",
    "Women's Wool Runners",
    "Tree Dasher 2"
  ],
  "currency": "USD",
  "email": null,
  "social_links": [
    "https://www.instagram.com/allbirds/",
    "https://twitter.com/allbirds",
    "https://www.facebook.com/allbirds"
  ],
  "homepage_title": "Allbirds | Sustainable, Comfortable Shoes Made With Natural Materials",
  "scraped_at": "2026-06-07T10:00:00Z"
}
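For the CRM-import use case, rows shaped like the example above can be filtered and flattened on your side. This is a consumer-side sketch, not part of the Actor; the choice of columns and the "largest catalogue first" ordering are assumptions. Note that the example row above would be skipped, because its email is null.

```python
import csv
import io
from typing import Iterable, List

CRM_FIELDS = ["domain", "email", "product_count", "currency"]


def crm_rows(rows: Iterable[dict]) -> List[dict]:
    """Keep confirmed Shopify stores that have an email, largest catalogues first."""
    leads = [r for r in rows if r.get("is_shopify") and r.get("email")]
    # product_count can be null (blocked /products.json); sort those rows last.
    leads.sort(key=lambda r: (r.get("product_count") is None, -(r.get("product_count") or 0)))
    return [{field: r.get(field) for field in CRM_FIELDS} for r in leads]


def to_csv(leads: List[dict]) -> str:
    """Serialise leads as CSV text; None values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CRM_FIELDS)
    writer.writeheader()
    writer.writerows(leads)
    return buf.getvalue()
```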

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

| Event | USD | What it is |
|---|---|---|
| actor-start | $0.005 | One-off warm-up charge per run |
| result-row | $0.0025 | Per dataset row written |

Example: 1 000 confirmed Shopify leads in one run cost 1 000 × $0.0025 = $2.50 plus the $0.005 start event, or $2.505 in total. No subscription, no minimum, no card to start: Apify gives every new account $5 free credit, enough for roughly 2 000 enriched leads before you owe anything.
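The arithmetic above can be checked with a small estimator built from the two event rates in the table. It is a sketch that counts only these two events; the helper names are illustrative. Decimal is used so sub-cent rates don't pick up float rounding.

```python
from decimal import Decimal

# Rates from the pricing table above, in USD.
ACTOR_START = Decimal("0.005")
RESULT_ROW = Decimal("0.0025")


def estimate_cost(rows: int, runs: int = 1) -> Decimal:
    """One start event per run plus one charge per dataset row written."""
    return runs * ACTOR_START + rows * RESULT_ROW


def rows_for_budget(budget_usd: str, runs: int = 1) -> int:
    """How many rows a budget covers after paying the start events."""
    remaining = Decimal(budget_usd) - runs * ACTOR_START
    return max(0, int(remaining // RESULT_ROW))
```

For instance, estimate_cost(1000) gives $2.505, and rows_for_budget("5") gives 1 998 rows for a single run, which is where "roughly 2 000" comes from. Splitting the same rows across more runs adds $0.005 per extra run.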

🚧 Limitations

  • Some stores block /products.json: stores like Gymshark return 403. We fall back to header and HTML-marker detection, which confirms Shopify but can't retrieve product lists, so product_count will be null and no sample product titles are returned for those stores.
  • Email is best-effort: many Shopify stores hide contact email behind forms or Zendesk widgets. We surface mailto: links only; JavaScript-rendered contact forms are out of scope for the HTTP-tier implementation.
  • Social links are homepage-scraped: some brands link social accounts only in the footer of inner pages. If they're not on the homepage, they won't appear.
  • Rate limiting on large lists: running thousands of domains in a single Actor run is possible but slow. Batch in groups of 500–1 000 for predictable run times. Use maxResults to cap during testing.
  • Shopify Markets / headless: some enterprise Shopify deployments pair custom domains with headless frontends that mask Shopify signals entirely. Detection rates on headless stores are lower.
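Following the batching advice above, a large list can be split into chunks before starting one run per chunk. This is a client-side sketch; the function name is hypothetical, and de-duplicating (case-insensitively) before splitting is an assumption that simply avoids paying twice for the same domain. Each batch would go into the domains field of its own run, and each run incurs the $0.005 start event.

```python
from typing import Iterator, List, Sequence


def batches(domains: Sequence[str], size: int = 500) -> Iterator[List[str]]:
    """Yield de-duplicated, lowercased domains in consecutive chunks of at most `size`.

    Being a generator, it raises ValueError for a bad size on first iteration.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    # dict.fromkeys keeps first-seen order while dropping duplicates and blanks.
    unique = list(dict.fromkeys(d.strip().lower() for d in domains if d.strip()))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]
```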

❓ FAQ

Does this work on stores that block /products.json?

Yes. The Actor tries /products.json first, but it doesn't stop there. It also reads HTTP response headers (x-shopify-stage, x-sorting-hat-shopid) and scans the homepage HTML for Shopify CDN references and theme globals. A store that returns 403 on /products.json is still detected if any other signal is present; we just can't populate product_count for it.

What domains should I feed it?

Any list of website domains works — e-commerce databases, trade-show exhibitor lists, Google Shopping scrape results, brand registries, or your own prospect pipeline. The Actor normalises URLs to bare domains automatically.

How accurate is the detection?

The three-signal approach catches the vast majority of Shopify stores. False positives are rare (we require at least one confirmed Shopify marker, not just heuristics). False negatives occur mainly on headless Shopify deployments that strip all Shopify headers and serve no Shopify CDN assets from the homepage.

Can I get more than 10 product titles per store?

Yes: raise maxProductsSample up to 250. Keep in mind that /products.json?limit=250 is a single request with a much larger response, which may trip rate-limit thresholds on stores that monitor traffic volume.
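The page doesn't document the exact request the Actor sends, so this sketch simply assumes one /products.json page whose limit parameter matches the requested sample, clamped to 250 (the value the answer above uses as the ceiling). The helper names are hypothetical.

```python
from urllib.parse import urlencode

PAGE_LIMIT = 250  # largest page size mentioned above for /products.json


def products_url(domain: str, sample: int) -> str:
    """Single-page /products.json URL sized to the requested sample (1 to 250)."""
    limit = max(1, min(sample, PAGE_LIMIT))
    return f"https://{domain}/products.json?" + urlencode({"limit": limit})


def sample_titles(payload: dict, sample: int) -> list:
    """Up to `sample` non-empty product titles from a /products.json response body."""
    products = payload.get("products") or []
    return [p["title"] for p in products if p.get("title")][:sample]
```

Requesting only as many products as you need keeps the response small, which is the trade-off the answer above describes.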

Can I run this on a schedule?

Yes. Wire it to an Apify Schedule and pass a static domain list. This is useful for spotting when a specific set of competitor domains migrates onto Shopify.
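To spot migrations between two scheduled runs, you can compare their datasets. How you load each run's rows (Console export, API, or an integration) is up to you. This sketch assumes rows shaped like the example output. With onlyShopify=true, a domain missing from the previous run is treated as not Shopify, so domains that were simply missed last time (for example after transient errors) also show up as new.

```python
from typing import Dict, Iterable, List


def newly_shopify(previous: Iterable[dict], current: Iterable[dict]) -> List[str]:
    """Domains confirmed as Shopify now that were not confirmed in the previous run."""
    before: Dict[str, bool] = {r["domain"]: bool(r.get("is_shopify")) for r in previous}
    return sorted(
        r["domain"]
        for r in current
        if r.get("is_shopify") and not before.get(r["domain"], False)
    )
```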

I need the full product catalogue, not just a sample — can you do that?

That's a different use case (deep product scraping) and outside the scope of this lead-gen Actor. If you need full catalogue extraction, contact us via the Issues tab.

💬 Your feedback

Spotted a detection miss or need an extra enrichment field? Open an issue on the Actor's Issues tab in Apify Console. We ship fixes weekly, and every report gets read.