Email Extractor Pro — Bulk Website Emails, No Hunter.io Cap
Pricing
Pay per usage
Outreach-ready email lists in 2 min: emails + source URL + domain + phones + social links, exportable as CSV. No Hunter.io cap, no Apollo seat fee. 109 lifetime runs · 10 paying users. For B2B prospecting, sales outreach, and recruiter sourcing. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai
Developer: Alex · Actor stats: 10 total users, 5 monthly active users, 0 bookmarks
Extract emails, phone numbers, and social media links from any list of websites. The crawler walks each domain with Crawlee's same-domain strategy, deduplicates, filters junk, and returns a flat dataset ready for outreach. No 25-results-per-month cap, no per-lookup pricing: pay per Apify compute unit and run as many domains as you need.
107 lifetime production runs on this exact actor as of 2026-04-30.
What this actor returns (verified against src/main.js)
When at least one email is found, one record is emitted per email:
{
  "email": "contact@example.com",
  "source": "https://example.com/contact",
  "domain": "example.com",
  "fromMailto": true,
  "phones": ["+1 (555) 123-4567", "+44 20 7946 0958"],
  "socialLinks": [
    { "platform": "linkedin", "url": "https://linkedin.com/company/example", "foundOn": "https://example.com/contact" }
  ],
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}
When zero emails are found, a single summary record is emitted (still includes phones + socials if any):
{
  "email": null,
  "message": "No emails found on the provided URLs",
  "phones": ["..."],
  "socialLinks": [{ "platform": "...", "url": "...", "foundOn": "..." }],
  "urlsScanned": 5,
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}
If neither emails, phones, nor social links are found, the record contains only email: null, message, urlsScanned, scrapedAt.
fromMailto: true flags emails extracted from <a href="mailto:..."> links; these are the highest-confidence captures. Regex-extracted emails do not carry fromMailto: false; the field is simply absent. In downstream Python code, read it as record.get('fromMailto', False).
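A minimal downstream sketch, assuming the dataset items have already been loaded as Python dicts (for example via iterate_items() or a JSON export). It separates zero-email summary records from email records and splits mailto captures from regex captures using the absent-field rule above. The helper name split_records and the sample records are hypothetical.

```python
from typing import Any


def split_records(items: list[dict[str, Any]]) -> tuple[list[dict], list[dict], list[dict]]:
    """Split dataset items into (mailto emails, regex emails, summary records)."""
    mailto, regex_hits, summaries = [], [], []
    for record in items:
        if record.get("email") is None:
            # Zero-email summary record: carries message/urlsScanned instead of an email.
            summaries.append(record)
        elif record.get("fromMailto", False):
            # Only mailto: captures carry fromMailto: true.
            mailto.append(record)
        else:
            # Regex captures: the fromMailto key is simply absent.
            regex_hits.append(record)
    return mailto, regex_hits, summaries


items = [
    {"email": "sales@acme.io", "domain": "acme.io", "fromMailto": True},
    {"email": "info@acme.io", "domain": "acme.io"},
    {"email": None, "message": "No emails found on the provided URLs", "urlsScanned": 5},
]
mailto, regex_hits, summaries = split_records(items)
print(len(mailto), len(regex_hits), len(summaries))  # 1 1 1
```

Sending the mailto list first is one way to act on the confidence difference; regex captures may deserve a validation pass before outreach.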
Features (verified against src/main.js)
- Deep crawling: follows internal links up to maxDepth (configurable 0–5 in input_schema).
- mailto: and tel: priority: <a href="mailto:..."> and <a href="tel:..."> are extracted directly via DOM selector first, then a text regex sweeps the rest.
- 7 social platforms: LinkedIn, Twitter / X, Facebook, Instagram, YouTube (@, /channel/, /c/), GitHub, TikTok (@handle form only).
- Junk filter: drops noreply@, no-reply@, donotreply@, *@example.com, *@test.com, *@localhost, sentry.io, wixpress.com, wordpress.com, and image-suffix false positives (*.png, *.jpg, *.gif, *.svg, *@2x.*, *@3x.*). Also drops anything longer than 100 characters.
- Email dedup: emails are lowercased and a Map-based dedup runs across the whole crawl regardless of the deduplicateEmails flag (see Honest limitations).
- Phone regex: requires either a +CC international prefix, a (NNN) group, or NNN-NNN-NNNN separators, which avoids matching plain numeric IDs / SKU strings.
- Same-domain enforcement: strategy: 'same-domain' plus a transformRequestFunction that re-checks linkDomain === targetDomain. Won't wander into ad networks or analytics CDNs.
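The junk filter and lowercase dedup above can be approximated as follows. This is an illustrative Python re-implementation, not the actor's src/main.js: the exact patterns, the substring rule for sentry.io / wixpress.com / wordpress.com, and the helper names is_junk and dedupe_emails are assumptions. A dict keyed by the lowercased email plays the role of the JS Map.

```python
import re

# Assumed approximation of the documented junk rules; not copied from src/main.js.
JUNK_PREFIXES = ("noreply@", "no-reply@", "donotreply@")
JUNK_DOMAINS = ("@example.com", "@test.com", "@localhost")
JUNK_SUBSTRINGS = ("sentry.io", "wixpress.com", "wordpress.com")
IMAGE_SUFFIX = re.compile(r"\.(png|jpg|gif|svg)$")
RETINA_MARKER = re.compile(r"@[23]x\.")  # e.g. logo@2x.png


def is_junk(email: str) -> bool:
    e = email.lower()
    return (
        len(e) > 100
        or e.startswith(JUNK_PREFIXES)
        or e.endswith(JUNK_DOMAINS)
        or any(s in e for s in JUNK_SUBSTRINGS)
        or IMAGE_SUFFIX.search(e) is not None
        or RETINA_MARKER.search(e) is not None
    )


def dedupe_emails(candidates: list[tuple[str, str]]) -> dict[str, str]:
    """Map lowercase email -> first source URL it was seen on, skipping junk."""
    seen: dict[str, str] = {}
    for email, source in candidates:
        key = email.lower()
        if not is_junk(key) and key not in seen:
            seen[key] = source
    return seen


found = dedupe_emails([
    ("Sales@Acme.io", "https://acme.io/contact"),
    ("sales@acme.io", "https://acme.io/about"),
    ("noreply@acme.io", "https://acme.io/"),
    ("logo@2x.png", "https://acme.io/"),
])
print(found)  # {'sales@acme.io': 'https://acme.io/contact'}
```

Because the key is lowercased before the lookup, case variants collapse to the first-seen record, which matches the always-on dedup behavior described above.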
Use cases
- Lead generation — build email lists from category-targeted seed company sites
- Sales prospecting — get decision-maker contacts across hundreds of target companies in one run
- PR / link building — collect journalist + blogger emails for outreach
- Recruitment — extract recruiter contacts from agency / freelancer sites
- Data enrichment — append email + phone + social to an existing CRM dump
Input parameters (verified against .actor/input_schema.json)
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| urls | Array | [] (required) | n/a | Website URLs to scan, with or without the https:// prefix |
| maxPagesPerDomain | Integer | 20 | 1–100 | Pages per domain (see the budget caveat in Honest limitations) |
| maxDepth | Integer | 2 | 0–5 | Link depth (0 = provided URLs only) |
| includePhones | Boolean | true | n/a | Extract phone numbers (text + tel: href) |
| includeSocialLinks | Boolean | true | n/a | Extract social profile URLs |
| deduplicateEmails | Boolean | true | n/a | Currently a no-op; see Honest limitations |
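A client-side check of run_input against the ranges in the table can catch typos before a run starts. This is a sketch with a hypothetical helper, build_run_input; it omits deduplicateEmails because that flag is a no-op. It also computes the shared request budget, which the limitations below describe as urls.length × maxPagesPerDomain.

```python
from typing import Any

# Ranges and defaults mirror the parameter table; the checker itself is hypothetical.
RANGES = {"maxPagesPerDomain": (1, 100), "maxDepth": (0, 5)}
DEFAULTS = {"maxPagesPerDomain": 20, "maxDepth": 2}


def build_run_input(urls: list[str], **overrides: Any) -> dict[str, Any]:
    if not urls:
        raise ValueError("urls is required and must not be empty")
    run_input: dict[str, Any] = {"urls": urls, **DEFAULTS, **overrides}
    for name, (low, high) in RANGES.items():
        value = run_input[name]
        if not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{name}={value!r} is outside {low}-{high}")
    return run_input


def global_request_budget(run_input: dict[str, Any]) -> int:
    # One pool shared by all domains: urls.length x maxPagesPerDomain.
    return len(run_input["urls"]) * run_input["maxPagesPerDomain"]


cfg = build_run_input(["https://a.example", "https://b.example"], maxDepth=1)
print(global_request_budget(cfg))  # 40
```

Extra flags such as includePhones=False pass through the overrides unchanged.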
Honest limitations
- deduplicateEmails is a dead input. It is destructured from input (line 11 of src/main.js) but never referenced again. Email dedup via the in-memory Map keyed by lowercase email always runs regardless of this flag's value, so setting it to false does not produce duplicate rows. The schema field is kept for backwards compatibility; treat it as informational only.
- The "priority routing" for /contact, /about, /team, /imprint, /impressum, /privacy, and /legal paths is cosmetic. The transformRequestFunction sets request.userData.priority = 1 on those URLs, but Crawlee's RequestQueue does not consult userData.priority for queue ordering, so pages are processed FIFO. To genuinely front-load contact pages, request a custom build that uses forefront: true on addRequests or a dedicated priority queue.
- maxRequestsPerCrawl = urls.length × maxPagesPerDomain. This is a shared budget pool, not a per-domain cap. If domain #1 has many internal links, it can consume far more than maxPagesPerDomain requests and starve domains #2–#N. For a hard per-domain cap, run each domain in a separate Apify run.
- maxConcurrency: 10 is global, not per-domain: five domains share the same 10-request concurrency budget. Larger fan-outs need a custom build.
- Per-email socialLinks are filtered by foundOn === source URL. A social link discovered on /about while the email lives on /contact will not appear in that email's record (it is still held in the global crawl state but is not joined to that email).
- Phone numbers are pooled globally per run, not per email. Every email record gets the same phones: [...allPhones] array; there is no email-to-phone proximity join.
- No proxy by default. CheerioCrawler is constructed without proxyConfiguration. Targets that block datacenter IPs (Cloudflare-protected or anti-bot-walled sites) will return 403 or empty pages. For proxy-routed runs, request a custom build.
- Static HTML only. There is no headless browser, so JavaScript-rendered email/contact pages return no data. About 30% of modern marketing sites move contact info into client-rendered React; this actor cannot extract those.
- The phone regex over-fires on long invoice numbers / part numbers that match NNN-NNN-NNNN shapes. Matches are filtered by digit count (7–15 digits), but borderline values still slip through.
- The junk filter is conservative. Real emails on wordpress.com or wixpress.com mailbox subdomains are filtered out (false negatives), a tradeoff for cleaner output.
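The phone behavior described above (three accepted shapes, then a 7–15 digit filter) can be sketched like this. The regex is an assumed approximation, not the pattern in src/main.js, and extract_phones is a hypothetical name. The second call reproduces the documented over-firing on invoice-style numbers.

```python
import re

# Assumed pattern: one of the three documented shapes, not the actor's exact regex.
PHONE_RE = re.compile(
    r"\+\d{1,3}[\s.-]?\(?\d{1,4}\)?(?:[\s.-]?\d{2,4}){2,4}"  # +CC international prefix
    r"|\(\d{3}\)\s?\d{3}[\s.-]\d{4}"                       # (NNN) area-code group
    r"|\b\d{3}[-.]\d{3}[-.]\d{4}\b"                        # NNN-NNN-NNNN separators
)


def extract_phones(text: str) -> list[str]:
    phones = []
    for match in PHONE_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 7 <= len(digits) <= 15:  # documented digit-count filter
            phones.append(match.group())
    return phones


print(extract_phones("Call +1 (555) 123-4567, office 212-555-0199, SKU 123456789."))
# ['+1 (555) 123-4567', '212-555-0199']
print(extract_phones("Invoice no. 800-123-4567"))  # over-fires: ['800-123-4567']
```

Plain digit runs such as the SKU never match because every alternative demands a prefix, parentheses, or separators; the digit-count filter only trims outliers and cannot tell an invoice number from a phone number with the same shape.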
Cost
Apify charges per compute unit and per result, not per email. Concrete cost depends on page weight, redirects, and maxDepth. As of 2026-04-29, a 50-domain run with maxPagesPerDomain: 20, maxDepth: 2 typically completes inside Apify's $5 free-tier credit; heavier runs scale linearly with crawled pages. Run a small test (1–2 domains) and check Console → Run Cost before scaling.
Quick start
- Open the actor → Try for free.
- Paste target URLs:
{
  "urls": ["https://competitor1.com", "https://competitor2.com"],
  "maxPagesPerDomain": 20,
  "maxDepth": 2
}
- Click Start. Results stream into the dataset (JSON / CSV / Excel export).
Pulling results from your code
Python (apify-client):
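from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")

# Start a run and block until it finishes.
run = client.actor("knotless_cadence/email-extractor-pro").call(run_input={
    "urls": ["https://example.com"],
    "maxDepth": 2,
})

# Stream every record from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # email is None on summary records; fromMailto is absent on regex captures.
    print(item["email"], "|", item.get("domain"), "|", item.get("fromMailto", False))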
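Honest limitations recommends separate Apify runs when you need a hard per-domain cap. A minimal sketch, assuming the same apify-client calls used above (actor().call() and dataset().iterate_items()); run_per_domain is a hypothetical helper, and runs execute one after another.

```python
from typing import Any

ACTOR_ID = "knotless_cadence/email-extractor-pro"


def run_per_domain(client: Any, urls: list[str], max_pages: int = 20,
                   max_depth: int = 2) -> dict[str, list[dict]]:
    """One actor run per seed URL, so each run's shared budget equals max_pages."""
    results: dict[str, list[dict]] = {}
    for url in urls:
        run = client.actor(ACTOR_ID).call(run_input={
            "urls": [url],
            "maxPagesPerDomain": max_pages,
            "maxDepth": max_depth,
        })
        if run is None:
            # Assumed handling: record an empty result if no run info came back.
            results[url] = []
            continue
        results[url] = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    return results
```

With a valid token, call it as run_per_domain(ApifyClient("YOUR_TOKEN"), ["https://competitor1.com", "https://competitor2.com"]). Each run likely pays its own start-up overhead, so this trades some compute for predictable per-domain coverage.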
JavaScript (fetch):
// Fetch the dataset items of the actor's most recent run.
const res = await fetch(
  `https://api.apify.com/v2/acts/knotless_cadence~email-extractor-pro/runs/last/dataset/items?token=YOUR_TOKEN`
);
const contacts = await res.json();
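The built-in CSV export keeps phones and socialLinks as nested values. If you want one flat row per email for a mail-merge tool, a sketch like the one below works on the records from either snippet above. The column choice, the "; " joiner, and the helper name records_to_csv are assumptions, not the actor's export format.

```python
import csv
import io
from typing import Any, Iterable

FIELDS = ["email", "domain", "source", "fromMailto", "phones", "socialLinks", "scrapedAt"]


def records_to_csv(items: Iterable[dict[str, Any]]) -> str:
    """Flatten email records to CSV text; list fields are joined with '; '."""
    buf = io.StringIO()
    # extrasaction="ignore" drops keys outside FIELDS (e.g. message, urlsScanned).
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in items:
        if record.get("email") is None:
            continue  # skip zero-email summary records
        row = dict(record)
        row["fromMailto"] = record.get("fromMailto", False)
        row["phones"] = "; ".join(record.get("phones", []))
        row["socialLinks"] = "; ".join(
            f'{link["platform"]}:{link["url"]}' for link in record.get("socialLinks", [])
        )
        writer.writerow(row)
    return buf.getvalue()


sample = [
    {"email": "sales@acme.io", "domain": "acme.io", "source": "https://acme.io/contact",
     "fromMailto": True, "phones": ["+1 (555) 123-4567"],
     "socialLinks": [{"platform": "github", "url": "https://github.com/acme",
                      "foundOn": "https://acme.io/contact"}],
     "scrapedAt": "2026-04-29T12:00:00.000Z"},
    {"email": None, "message": "No emails found on the provided URLs", "urlsScanned": 5},
]
print(records_to_csv(sample))
```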
How it works
- Seed: each provided URL (typically a homepage) is fetched as static HTML.
- Link discovery: internal links are extracted via Crawlee enqueueLinks({ strategy: 'same-domain' }). The transformRequestFunction filters cross-domain links and tags priority (cosmetic; see Honest limitations).
- Email pass: regex over the fetched HTML and body.text(), plus mailto: href extraction.
- Phone pass: tel: href plus a strict text regex (a separator or prefix is required).
- Social pass: 7-platform regex over the raw HTML (catches links inside footer markup, JSON-LD, and attribute strings).
- Filter + dedup: junk patterns are rejected, emails are lowercased, and a Map-based dedup removes repeats.
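The social pass can be pictured as a set of per-platform regexes swept over raw HTML, which is why links in attributes and JSON-LD are caught. The patterns, the "twitter" platform key for Twitter / X, and the helper find_social_links are assumptions for illustration; the actor's own regexes may differ.

```python
import re

# Approximate patterns for the 7 documented platforms; assumptions, not src/main.js.
SOCIAL_PATTERNS = {
    "linkedin": r"https?://(?:[a-z]{2,3}\.)?linkedin\.com/(?:company|in)/[\w.-]+",
    "twitter": r"https?://(?:www\.)?(?:twitter|x)\.com/\w{1,15}",
    "facebook": r"https?://(?:www\.)?facebook\.com/[\w.-]+",
    "instagram": r"https?://(?:www\.)?instagram\.com/[\w.]+",
    "youtube": r"https?://(?:www\.)?youtube\.com/(?:@[\w.-]+|channel/[\w-]+|c/[\w.-]+)",
    "github": r"https?://(?:www\.)?github\.com/[\w-]+",
    "tiktok": r"https?://(?:www\.)?tiktok\.com/@[\w.]+",  # @handle form only
}
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in SOCIAL_PATTERNS.items()}


def find_social_links(html: str, page_url: str) -> list[dict[str, str]]:
    """Regex sweep over raw HTML; each hit is tagged with the page it was found on."""
    links, seen = [], set()
    for platform, pattern in COMPILED.items():
        for url in pattern.findall(html):
            if url not in seen:
                seen.add(url)
                links.append({"platform": platform, "url": url, "foundOn": page_url})
    return links


sample_html = (
    '<footer><a href="https://www.linkedin.com/company/acme">in</a>'
    '<a href="https://github.com/acme">gh</a></footer>'
    '<script type="application/ld+json">{"sameAs": ["https://x.com/acme"]}</script>'
)
links = find_social_links(sample_html, "https://acme.io/about")
print([link["platform"] for link in links])  # ['linkedin', 'twitter', 'github']
```

The foundOn tag is what the per-email join described in Honest limitations would compare against each email's source URL.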
Combine with other actors in this portfolio
- Google Maps Scraper Pro — find businesses by category + location.
- Email Extractor Pro (this one) — pull their contact emails.
- Email Validator — verify deliverability before outreach.
- Website Tech Stack Detector — qualify leads by their stack.
Proof of delivery: 31 published Apify actors (78 total in portfolio). The flagship Trustpilot scraper has 951 lifetime production runs; this Email Extractor has 107+ runs. One paid 3-article series shipped in March 2026 ($150, proxy industry). Pilot pricing locked through May 2026.
Want a sample? Email "sample" to spinov001@gmail.com and we'll send 2 published case-study articles within 24 hours.
Need a custom build instead of self-serve?
| Tier | Price | Includes |
|---|---|---|
| Pilot | $97 | 1 actor or modification, 7-day support |
| Standard | $297 | Custom actor + Slack/email alerts on results, 30-day support |
| Premium | $797 | Custom actor + dashboard + 90-day support + 1 modification round |
Drop specs, schema, or target URLs in an email — quote back same day.
Email: spinov001@gmail.com
Proof of work: 31 published Apify actors / 78 total in portfolio — 951 lifetime runs on the Trustpilot scraper, paid 3-article series delivered for a client in the proxy industry ($150).
Blog (case studies + writeups): blog.spinov.online
Telegram channel (scraping & data engineering tips): t.me/scraping_ai
Honest disclosure
- 107 lifetime production runs on this specific actor as of 2026-04-30 — well past prototype, but smaller-volume than the Trustpilot flagship.
- No private data scraped. No login bypass.
- robots.txt is not explicitly checked (Crawlee default). For strict robots.txt compliance, request a custom build that wires it in.
- Independent project: not affiliated with Hunter.io, Apollo, or any other lead-gen vendor.
- This actor is maintained by the same author who runs apify.com/knotless_cadence (78 actors, 31 public).
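Until a robots.txt-aware build exists, one partial workaround is to pre-filter your own seed URLs with Python's standard urllib.robotparser before starting a run. This is a sketch under stated assumptions: allowed_seeds is a hypothetical helper, it takes robots.txt text you have already downloaded, and it only screens the seeds. It does not stop the actor from following internal links into disallowed paths.

```python
from urllib.robotparser import RobotFileParser


def allowed_seeds(robots_txt: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Keep only the seed URLs that the given robots.txt permits for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_seeds(robots, ["https://acme.io/contact", "https://acme.io/private/team"]))
# ['https://acme.io/contact']
```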