Email Extractor Pro — Bulk Website Emails, No Hunter.io Cap
Pricing
Pay per usage
Email lists in 2 min: emails + page title + URL + role hint as CSV. No Hunter.io cap. 201 runs · 14 users · 6 users in the last 30 days · 98% success. From the author of trustpilot-review-scraper (972 runs). B2B prospecting + sales. blog.spinov.online · dev.to/0012303
Rating: 0.0 (0)
Developer: Alex
Maintained by Community
Actor stats: 1 bookmarked · 15 total users · 7 monthly active users · last modified 3 days ago
Extract emails, phone numbers, and social media links from any list of websites. The crawler walks each site with Crawlee's same-domain link strategy, deduplicates emails, filters junk, and returns a flat dataset ready for outreach. There is no 25-results-per-month cap and no per-lookup pricing: you pay per Apify compute unit and can run as many domains as you need.
107 lifetime production runs on this exact actor as of 2026-04-30.
What this actor returns (verified against src/main.js)
When at least one email is found, one record is emitted per email:

    {
      "email": "contact@example.com",
      "source": "https://example.com/contact",
      "domain": "example.com",
      "fromMailto": true,
      "phones": ["+1 (555) 123-4567", "+44 20 7946 0958"],
      "socialLinks": [
        { "platform": "linkedin", "url": "https://linkedin.com/company/example", "foundOn": "https://example.com/contact" }
      ],
      "scrapedAt": "2026-04-29T12:00:00.000Z"
    }

When zero emails are found, a single summary record is emitted (still includes phones + socials if any):

    {
      "email": null,
      "message": "No emails found on the provided URLs",
      "phones": ["..."],
      "socialLinks": [
        { "platform": "...", "url": "...", "foundOn": "..." }
      ],
      "urlsScanned": 5,
      "scrapedAt": "2026-04-29T12:00:00.000Z"
    }

If no emails, phones, or social links are found, the record contains only email: null, message, urlsScanned, scrapedAt.
fromMailto: true flags emails extracted from <a href="mailto:..."> links; these are the highest-confidence captures. Regex-extracted emails do not carry fromMailto: false; the field is simply absent. In downstream Python code, read it as record.get('fromMailto', False).
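A minimal sketch of how a consumer might handle both record shapes described above. The helper name normalize_record and the output keys are illustrative assumptions, not part of the actor; only the input field names come from the sample records.

```python
from typing import Any, Optional


def normalize_record(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Flatten one dataset item; return None for the zero-email summary record.

    Hypothetical helper: the input keys follow the sample records above,
    the output keys are made up for this example.
    """
    if record.get("email") is None:
        # Summary record: "email" is null and "message" explains why.
        return None
    return {
        "email": record["email"],
        "domain": record.get("domain"),
        "source": record.get("source"),
        # The field is absent (not False) on regex-extracted emails.
        "from_mailto": bool(record.get("fromMailto", False)),
        "phones": list(record.get("phones", [])),
        "social_urls": [link["url"] for link in record.get("socialLinks", [])],
    }


items = [
    {"email": "contact@example.com", "domain": "example.com",
     "source": "https://example.com/contact", "fromMailto": True},
    {"email": "press@example.com", "domain": "example.com",
     "source": "https://example.com/about"},
    {"email": None, "message": "No emails found on the provided URLs", "urlsScanned": 5},
]

rows = [row for row in map(normalize_record, items) if row is not None]
```

Returning None for the summary record keeps it out of outreach lists while still letting the caller log its message if needed.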
Features (verified against src/main.js)
- Deep crawling: follows internal links up to maxDepth (configurable 0–5 in input_schema).
- mailto: and tel: priority: <a href="mailto:..."> and <a href="tel:..."> are extracted directly via DOM selector first, then a text regex sweeps the rest.
- 7 social platforms: LinkedIn, Twitter / X, Facebook, Instagram, YouTube (@, /channel/, /c/), GitHub, TikTok (@handle form only).
- Junk filter: drops noreply@, no-reply@, donotreply@, *@example.com, *@test.com, *@localhost, sentry.io, wixpress.com, wordpress.com, and image-suffix false positives (*.png, *.jpg, *.gif, *.svg, *@2x.*, *@3x.*). Also drops anything > 100 chars.
- Email dedup: emails are lowercased and Map-based dedup runs across the whole crawl regardless of the deduplicateEmails flag (see Honest limitations).
- Phone regex: requires either a +CC international prefix, a (NNN) group, or NNN-NNN-NNNN separators, which avoids matching plain numeric IDs / SKU strings.
- Same-domain enforcement: strategy: 'same-domain' plus a transformRequestFunction that re-checks linkDomain === targetDomain. Won't wander into ad networks or analytics CDNs.
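The junk filter and dedup rules listed above can be sketched in Python as follows. This is an illustrative re-implementation, not the code in src/main.js: the function names are hypothetical, and treating subdomains of the blocked domains as junk is an assumption.

```python
JUNK_LOCAL_PREFIXES = ("noreply@", "no-reply@", "donotreply@")
JUNK_DOMAINS = ("example.com", "test.com", "localhost",
                "sentry.io", "wixpress.com", "wordpress.com")
IMAGE_SUFFIXES = (".png", ".jpg", ".gif", ".svg")
MAX_EMAIL_LENGTH = 100


def is_junk(email: str) -> bool:
    """Return True when an address matches one of the listed junk patterns."""
    email = email.lower()
    if len(email) > MAX_EMAIL_LENGTH:
        return True
    if email.startswith(JUNK_LOCAL_PREFIXES):
        return True
    domain = email.rpartition("@")[2]
    # Assumption: subdomains of a blocked domain are blocked as well.
    if any(domain == d or domain.endswith("." + d) for d in JUNK_DOMAINS):
        return True
    # Retina asset names such as "logo@2x.png" look like emails to a naive regex.
    if domain.startswith(("2x.", "3x.")) or email.endswith(IMAGE_SUFFIXES):
        return True
    return False


def dedupe_emails(candidates: list[str]) -> list[str]:
    """Lowercase, drop junk, and keep the first occurrence of each address."""
    seen: dict[str, None] = {}
    for candidate in candidates:
        key = candidate.strip().lower()
        if key and not is_junk(key):
            seen.setdefault(key, None)
    return list(seen)


cleaned = dedupe_emails([
    "Sales@Acme.io", "sales@acme.io", "noreply@acme.io",
    "logo@2x.png", "bob@test.com", "x@mail.wixpress.com",
])
```

A dict keyed by the lowercased address mirrors the Map-based dedup described above and preserves first-seen order.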
Use cases
- Lead generation — build email lists from category-targeted seed company sites
- Sales prospecting — get decision-maker contacts across hundreds of target companies in one run
- PR / link building — collect journalist + blogger emails for outreach
- Recruitment — extract recruiter contacts from agency / freelancer sites
- Data enrichment — append email + phone + social to an existing CRM dump
Input parameters (verified against .actor/input_schema.json)
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| urls | Array | [] (required) | n/a | Website URLs to scan, with or without https:// prefix |
| maxPagesPerDomain | Integer | 20 | 1–100 | Pages per domain (see budget caveat below) |
| maxDepth | Integer | 2 | 0–5 | Link depth (0 = provided URLs only) |
| includePhones | Boolean | true | n/a | Extract phone numbers (text + tel: href) |
| includeSocialLinks | Boolean | true | n/a | Extract social profile URLs |
| deduplicateEmails | Boolean | true | n/a | Currently a no-op (see Honest limitations) |
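As a sketch, the ranges in the table can be checked client-side before starting a run. The helper build_run_input is hypothetical; the key names, defaults, and ranges are the ones listed above, and deduplicateEmails is left out because it is documented as a no-op.

```python
from typing import Any


def build_run_input(
    urls: list[str],
    max_pages_per_domain: int = 20,
    max_depth: int = 2,
    include_phones: bool = True,
    include_social_links: bool = True,
) -> dict[str, Any]:
    """Validate values against the documented ranges and build the actor input."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    if not 1 <= max_pages_per_domain <= 100:
        raise ValueError("maxPagesPerDomain must be between 1 and 100")
    if not 0 <= max_depth <= 5:
        raise ValueError("maxDepth must be between 0 and 5")
    return {
        "urls": list(urls),
        "maxPagesPerDomain": max_pages_per_domain,
        "maxDepth": max_depth,
        "includePhones": include_phones,
        "includeSocialLinks": include_social_links,
    }


run_input = build_run_input(["example.com"], max_depth=1)
```

Failing fast locally avoids spending a run on an input the schema would reject.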
Honest limitations
- deduplicateEmails is a dead input. It is destructured from input (line 11 of src/main.js) but never referenced again. Email dedup via the in-memory Map keyed by lowercase email always runs regardless of this flag's value. Setting it to false does not produce duplicate rows. The schema field is kept for backwards compatibility; treat it as informational only.
- The "priority routing" for /contact, /about, /team, /imprint, /impressum, /privacy, /legal paths is cosmetic. The transformRequestFunction sets request.userData.priority = 1 on those URLs, but Crawlee's RequestQueue does not consult userData.priority for queue ordering, so pages are processed FIFO. To genuinely front-load contact pages, request a custom build using forefront: true on addRequests or a dedicated priority queue.
- maxRequestsPerCrawl = urls.length × maxPagesPerDomain. This is a shared budget pool, not per-domain. If domain #1 has many internal links, it can consume far more than maxPagesPerDomain requests and starve domains #2-#N. To get a hard per-domain cap, run domains in separate Apify runs.
- maxConcurrency: 10 is global, not per-domain. Five domains share the same 10-request concurrency budget. Larger fan-outs need a custom build.
- Per-email socialLinks are filtered by foundOn === sourceURL. A social link discovered on /about while the email lives on /contact will not appear in that email's record (it does still appear in the global crawl state but is not joined to that email).
- Phone numbers are pooled globally per run, not per email. Every email record gets the same phones: [...allPhones] array. There is no email→phone proximity join.
- No proxy by default. CheerioCrawler is constructed without proxyConfiguration. Targets that block datacenter IPs (Cloudflare-protected, anti-bot-walled sites) will return 403/empty. For proxy-routed runs, request a custom build.
- Static HTML only. No headless browser → JavaScript-rendered email/contact pages return no data. About 30 % of modern marketing sites move contact info into client-rendered React; this actor cannot extract those.
- Phone regex over-fires on long invoice numbers / part numbers that match NNN-NNN-NNNN shapes. The output is filtered by digit count (7–15), but borderline values still slip through.
- Junk filter is strict. Real emails on wordpress.com or wixpress.com mailbox subdomains are filtered out (false negatives), a tradeoff for cleaner output.
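The shared-budget caveat is plain arithmetic, and the suggested workaround (one run per domain) can be prepared client-side. The sketch below assumes grouping by hostname is close enough to "per domain"; both function names are hypothetical, and www.example.com and example.com would land in separate groups.

```python
from typing import Any
from urllib.parse import urlparse


def shared_request_budget(urls: list[str], max_pages_per_domain: int) -> int:
    """Total requests for the whole run: urls.length x maxPagesPerDomain."""
    return len(urls) * max_pages_per_domain


def split_into_single_host_inputs(
    urls: list[str], max_pages_per_domain: int = 20, max_depth: int = 2
) -> list[dict[str, Any]]:
    """Build one run input per hostname so no site can starve another."""
    by_host: dict[str, list[str]] = {}
    for url in urls:
        # urlparse only fills netloc when a scheme is present, so add one if missing.
        parsed = urlparse(url if "://" in url else "https://" + url)
        by_host.setdefault(parsed.netloc.lower(), []).append(url)
    return [
        {"urls": group, "maxPagesPerDomain": max_pages_per_domain, "maxDepth": max_depth}
        for group in by_host.values()
    ]


budget = shared_request_budget(["a.com", "b.com", "c.com"], 20)
inputs = split_into_single_host_inputs(["https://a.com", "b.com/contact", "https://a.com/about"])
```

Each resulting input still gets a budget of its own URL count times maxPagesPerDomain, so a host listed twice receives twice the per-domain allowance.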
Cost
Apify charges per compute unit and per result, not per email. Concrete cost depends on page weight, redirects, and maxDepth. As of 2026-04-29, a 50-domain run with maxPagesPerDomain: 20, maxDepth: 2 typically completes inside Apify's $5 free-tier credit; heavier runs scale linearly with crawled pages. Run a small test (1–2 domains) and check Console → Run Cost before scaling.
Quick start
- Open the actor → Try for free.
- Paste target URLs:

    {
      "urls": ["https://competitor1.com", "https://competitor2.com"],
      "maxPagesPerDomain": 20,
      "maxDepth": 2
    }

- Click Start. Results stream into the dataset (JSON / CSV / Excel export).
Pulling results from your code
Python (apify-client):

    from apify_client import ApifyClient

    client = ApifyClient("YOUR_TOKEN")
    run = client.actor("knotless_cadence/email-extractor-pro").call(
        run_input={
            "urls": ["https://example.com"],
            "maxDepth": 2,
        }
    )
    # The zero-email summary record has "email" set to None.
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item["email"], "-", item.get("domain"), "-", item.get("fromMailto", False))

JavaScript (fetch):

    const res = await fetch(`https://api.apify.com/v2/acts/knotless_cadence~email-extractor-pro/runs/last/dataset/items?token=YOUR_TOKEN`);
    if (!res.ok) throw new Error(`Apify API returned ${res.status}`);
    const contacts = await res.json();

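Once items are pulled, a common next step is one row per domain. The sketch below groups email records by domain and unions their social links, which partly works around the per-page socialLinks join. The function names and CSV columns are illustrative assumptions. Phones are left out on purpose, since they are pooled per run rather than per domain.

```python
import csv
import io
from typing import Any


def merge_by_domain(items: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Group email records by domain, skipping the zero-email summary record."""
    merged: dict[str, dict[str, Any]] = {}
    for item in items:
        if item.get("email") is None:
            continue
        entry = merged.setdefault(item.get("domain", ""), {"emails": [], "social": set()})
        entry["emails"].append(item["email"])
        entry["social"].update(link["url"] for link in item.get("socialLinks", []))
    return merged


def to_csv(merged: dict[str, dict[str, Any]]) -> str:
    """Render one CSV row per domain; multi-valued cells are joined with ';'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["domain", "emails", "social_urls"])
    for domain in sorted(merged):
        entry = merged[domain]
        writer.writerow([domain, ";".join(entry["emails"]), ";".join(sorted(entry["social"]))])
    return buffer.getvalue()


sample_items = [
    {"email": "a@example.com", "domain": "example.com",
     "socialLinks": [{"platform": "linkedin",
                      "url": "https://linkedin.com/company/example",
                      "foundOn": "https://example.com/contact"}]},
    {"email": "b@example.com", "domain": "example.com", "socialLinks": []},
    {"email": None, "message": "No emails found on the provided URLs"},
]
report = to_csv(merge_by_domain(sample_items))
```

The same functions accept the list built from iterate_items() in the Python example above.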
How it works
- Seed: the HTML of each provided URL is fetched.
- Link discovery: internal links are extracted via Crawlee enqueueLinks({ strategy: 'same-domain' }). The transformRequestFunction filters cross-domain links and tags priority (cosmetic).
- Email pass: regex over the fetched HTML + body.text() + mailto: href extraction.
- Phone pass: tel: href + strict text regex (separator-or-prefix required).
- Social pass: 7-platform regex over raw HTML (catches links inside footer markup, JSON-LD, attribute strings).
- Filter + dedup: junk patterns rejected, lowercase normalization, Map-based dedup.
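The email pass above can be illustrated with the Python standard library. This is a sketch of the mailto-first, regex-second idea, not the actor's Cheerio code: the regex, class name, and function name are assumptions chosen for the example.

```python
import re
from html.parser import HTMLParser
from typing import Any
from urllib.parse import unquote

# Deliberately simple pattern for illustration; the actor's real regex may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class MailtoCollector(HTMLParser):
    """Collect addresses from <a href="mailto:..."> links."""

    def __init__(self) -> None:
        super().__init__()
        self.mailto: list[str] = []

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop "?subject=..." style parameters and decode percent-escapes.
            address = unquote(href[len("mailto:"):].split("?", 1)[0]).strip()
            if address:
                self.mailto.append(address.lower())


def extract_emails(html: str, source: str) -> list[dict[str, Any]]:
    """mailto: links first (flagged fromMailto), then a regex sweep of the HTML."""
    collector = MailtoCollector()
    collector.feed(html)
    records: dict[str, dict[str, Any]] = {}
    for email in collector.mailto:
        records.setdefault(email, {"email": email, "source": source, "fromMailto": True})
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        # Regex-only hits carry no fromMailto key, matching the output format.
        records.setdefault(email, {"email": email, "source": source})
    return list(records.values())


page = '<a href="mailto:Sales@Example.org?subject=Hi">Email us</a><p>or press@example.org</p>'
found = extract_emails(page, "https://example.org/contact")
```

Processing mailto links before the regex sweep means an address found both ways keeps its fromMailto flag.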
Combine with other actors in this portfolio
- Google Maps Scraper Pro — find businesses by category + location.
- Email Extractor Pro (this one) — pull their contact emails.
- Email Validator — verify deliverability before outreach.
- Website Tech Stack Detector — qualify leads by their stack.
Proof of delivery: 31 published Apify actors (78 total in portfolio). The flagship Trustpilot scraper has 951 lifetime production runs; this Email Extractor has 107+ runs. One paid 3-article series shipped in March 2026 ($150, proxy industry). Pilot pricing locked through May 2026.
Sample request? Email the word "sample" to spinov001@gmail.com and we'll send 2 published case-study articles within 24 hours.
Need a custom build instead of self-serve?
| Tier | Price | Includes |
|---|---|---|
| Pilot | $97 | 1 actor or modification, 7-day support |
| Standard | $297 | Custom actor + Slack/email alerts on results, 30-day support |
| Premium | $797 | Custom actor + dashboard + 90-day support + 1 modification round |
Drop specs, schema, or target URLs in an email — quote back same day.
Email: spinov001@gmail.com
Proof of work: 31 published Apify actors / 78 total in portfolio — 951 lifetime runs on the Trustpilot scraper, paid 3-article series delivered for a client in the proxy industry ($150).
Blog (case studies + writeups): blog.spinov.online
Telegram channel (scraping & data engineering tips): t.me/scraping_ai
Honest disclosure
- 107 lifetime production runs on this specific actor as of 2026-04-30 — well past prototype, but smaller-volume than the Trustpilot flagship.
- No private data scraped. No login bypass. robots.txt is not explicitly checked (Crawlee default). For strict compliance with robots.txt, request a custom build that wires it in.
- Independent project, not affiliated with Hunter.io, Apollo, or any other lead-gen vendor.
- This actor is maintained by the same author who runs apify.com/knotless_cadence (78 actors, 31 public).