Email Extractor Pro — Bulk Website Emails, No Hunter.io Cap

Pricing: Pay per usage

Outreach-ready email lists in 2 min — emails + page title + URL + role hint as CSV. No Hunter.io cap, no Apollo seat fee. 109 lifetime runs · 10 paying users. For B2B prospecting + sales outreach + recruiter sourcing. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai

Rating: 0.0 (0 reviews)

Developer: Alex (Maintained by Community)

Actor stats: 0 bookmarks · 10 total users · 5 monthly active users · last modified 28 minutes ago

Extract emails, phone numbers, and social media links from any list of websites. The crawler walks each domain with a same-host strategy, deduplicates results, filters out junk, and returns a flat dataset ready for outreach. No 25-results-per-month cap, no per-lookup pricing — pay per Apify compute unit and run as many domains as you need.

107 lifetime production runs on this exact actor as of 2026-04-30.


What this actor returns (verified against src/main.js)

When at least one email is found, one record is emitted per email:

```json
{
  "email": "contact@example.com",
  "source": "https://example.com/contact",
  "domain": "example.com",
  "fromMailto": true,
  "phones": ["+1 (555) 123-4567", "+44 20 7946 0958"],
  "socialLinks": [
    { "platform": "linkedin", "url": "https://linkedin.com/company/example", "foundOn": "https://example.com/contact" }
  ],
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}
```

When zero emails are found, a single summary record is emitted (still includes phones + socials if any):

```json
{
  "email": null,
  "message": "No emails found on the provided URLs",
  "phones": ["..."],
  "socialLinks": [{ "platform": "...", "url": "...", "foundOn": "..." }],
  "urlsScanned": 5,
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}
```

If neither emails, phones, nor social links are found, the record contains only `email: null`, `message`, `urlsScanned`, and `scrapedAt`.

fromMailto: true flags emails extracted from <a href="mailto:..."> links — these are the highest-confidence captures. Regex-extracted emails do not carry fromMailto: false; the field is simply absent, so read it defensively, e.g. record.get('fromMailto', False) in Python.
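A minimal Python sketch of that defensive read (record shapes follow the examples above; the helper name is mine):

```python
records = [
    {"email": "contact@example.com", "fromMailto": True},  # mailto: capture
    {"email": "info@example.com"},  # regex capture: the key is simply absent
]

def is_mailto_capture(record: dict) -> bool:
    """True only for emails lifted from <a href="mailto:..."> links."""
    return record.get("fromMailto", False)

high_confidence = [r["email"] for r in records if is_mailto_capture(r)]
# high_confidence == ["contact@example.com"]
```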


Features (verified against src/main.js)

  • Deep crawling — follows internal links up to maxDepth (configurable 0–5 in input_schema).
  • mailto: and tel: priority — <a href="mailto:..."> and <a href="tel:..."> links are extracted directly via a DOM selector first, then a text regex sweeps the rest.
  • 7 social platforms — LinkedIn, Twitter / X, Facebook, Instagram, YouTube (@, /channel/, /c/), GitHub, TikTok (@handle form only).
  • Junk filter — drops noreply@, no-reply@, donotreply@, *@example.com, *@test.com, *@localhost, sentry.io, wixpress.com, wordpress.com, and image-suffix false positives (*.png, *.jpg, *.gif, *.svg, *@2x.*, *@3x.*). Also drops anything > 100 chars.
  • Email dedup — emails are lowercased + Map-based dedup runs across the whole crawl regardless of the deduplicateEmails flag (see Honest Limitations).
  • Phone regex — requires either a +CC international prefix, a (NNN) group, or NNN-NNN-NNNN separators — avoids matching plain numeric IDs / SKU strings.
  • Same-domain enforcement — strategy: 'same-domain' plus a transformRequestFunction that re-checks linkDomain === targetDomain. Won't wander into ad networks or analytics CDNs.
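The junk filter above can be approximated in Python (an illustrative sketch, not the production JavaScript in src/main.js; treating the listed providers as exact-domain matches is my assumption):

```python
import re

JUNK_LOCALPARTS = {"noreply", "no-reply", "donotreply"}
JUNK_DOMAINS = {"example.com", "test.com", "localhost",
                "sentry.io", "wixpress.com", "wordpress.com"}
IMAGE_SUFFIX = re.compile(r"\.(png|jpe?g|gif|svg)$", re.IGNORECASE)
RETINA_MARKER = re.compile(r"@[23]x\.", re.IGNORECASE)  # logo@2x.png etc.

def is_junk(email: str) -> bool:
    if len(email) > 100:                 # oversized matches are never real
        return True
    local, _, domain = email.lower().partition("@")
    if local in JUNK_LOCALPARTS or domain in JUNK_DOMAINS:
        return True
    # image filenames regex-match as "emails"; drop them
    return bool(IMAGE_SUFFIX.search(email) or RETINA_MARKER.search(email))
```

For example, `is_junk("logo@2x.png")` is true while a normal business address passes through.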

Use cases

  • Lead generation — build email lists from category-targeted seed company sites
  • Sales prospecting — get decision-maker contacts across hundreds of target companies in one run
  • PR / link building — collect journalist + blogger emails for outreach
  • Recruitment — extract recruiter contacts from agency / freelancer sites
  • Data enrichment — append email + phone + social to an existing CRM dump

Input parameters (verified against .actor/input_schema.json)

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| urls | Array | [] (required) | | Website URLs to scan, with or without https:// prefix |
| maxPagesPerDomain | Integer | 20 | 1–100 | Pages per domain (see budget caveat below) |
| maxDepth | Integer | 2 | 0–5 | Link depth (0 = provided URLs only) |
| includePhones | Boolean | true | | Extract phone numbers (text + tel: href) |
| includeSocialLinks | Boolean | true | | Extract social profile URLs |
| deduplicateEmails | Boolean | true | | Currently a no-op — see Honest Limitations |

Honest limitations

  • deduplicateEmails is a dead input. It is destructured from input (line 11 of src/main.js) but never referenced again. Email dedup via the in-memory Map keyed by lowercase email always runs regardless of this flag's value. Setting it to false does not produce duplicate rows. The schema field is kept for backwards compatibility; treat it as informational only.
  • The "priority routing" for /contact / /about / /team / /imprint / /impressum / /privacy / /legal paths is cosmetic. The transformRequestFunction sets request.userData.priority = 1 on those URLs, but Crawlee's RequestQueue does not consult userData.priority for queue ordering — pages are processed FIFO. To genuinely front-load contact pages, request a custom build using forefront: true on addRequests or a dedicated priority queue.
  • maxRequestsPerCrawl = urls.length × maxPagesPerDomain. This is a shared budget pool, not per-domain. If domain #1 has many internal links, it can consume far more than maxPagesPerDomain requests and starve domains #2-#N. To get a hard per-domain cap, run domains in separate Apify runs.
  • maxConcurrency: 10 is global, not per-domain. Five domains share the same 10-request concurrency budget. Larger fan-outs need a custom build.
  • Per-email socialLinks are filtered by foundOn === source URL. A social link discovered on /about while the email lives on /contact will not appear in that email's record (it does still appear in the global crawl state but is not joined to that email).
  • Phone numbers are pooled globally per run, not per email. Every email record gets the same phones: [...allPhones] array. There is no email→phone proximity join.
  • No proxy by default. CheerioCrawler is constructed without proxyConfiguration. Targets that block datacenter IPs (Cloudflare-protected, anti-bot-walled sites) will return 403/empty. For proxy-routed runs, request a custom build.
  • Static HTML only. No headless browser → JavaScript-rendered email/contact pages return no data. About 30 % of modern marketing sites move contact info into client-rendered React; this actor cannot extract those.
  • Phone regex over-fires on long invoice numbers / part numbers that match NNN-NNN-NNNN shapes. The output is additionally filtered by digit count (7–15), but borderline values still slip through.
  • Junk filter is conservative. Real emails on wordpress.com or wixpress.com mailbox subdomains are filtered out (false negatives) — a tradeoff for cleaner output.
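The shared-budget limitation above can be worked around today by fanning each domain into its own run. A hedged sketch (the helper name is mine; each generated input is then passed to apify-client's `.call()` as shown in the Python example further down):

```python
def per_domain_inputs(urls, max_pages=20, max_depth=2):
    """Build one run input per domain so maxPagesPerDomain becomes a
    hard per-domain cap instead of a shared pool across all domains."""
    return [
        {"urls": [u], "maxPagesPerDomain": max_pages, "maxDepth": max_depth}
        for u in urls
    ]

inputs = per_domain_inputs(["https://a.com", "https://b.com"])
# two inputs, each run getting its own 20-page budget
```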

Cost

Apify charges per compute unit and per result, not per email. Concrete cost depends on page weight, redirects, and maxDepth. As of 2026-04-29, a 50-domain run with maxPagesPerDomain: 20, maxDepth: 2 typically completes inside Apify's $5 free-tier credit; heavier runs scale linearly with crawled pages. Run a small test (1–2 domains) and check Console → Run Cost before scaling.
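A quick sanity check on the request budget for that example run (arithmetic only; actual compute-unit cost also depends on page weight and redirects):

```python
domains = 50
max_pages_per_domain = 20

# maxRequestsPerCrawl = urls.length * maxPagesPerDomain (a shared pool)
max_requests = domains * max_pages_per_domain
# upper bound of 1000 pages fetched across the whole run
```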


Quick start

  1. Open the actor → Try for free.
  2. Paste target URLs:

```json
{
  "urls": ["https://competitor1.com", "https://competitor2.com"],
  "maxPagesPerDomain": 20,
  "maxDepth": 2
}
```

  3. Click Start. Results stream into the dataset (JSON / CSV / Excel export).

Pulling results from your code

Python (apify-client):

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")
run = client.actor("knotless_cadence/email-extractor-pro").call(run_input={
    "urls": ["https://example.com"],
    "maxDepth": 2,
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["email"], "—", item.get("domain"), "—", item.get("fromMailto", False))
```

JavaScript (fetch):

```javascript
const res = await fetch(
  `https://api.apify.com/v2/acts/knotless_cadence~email-extractor-pro/runs/last/dataset/items?token=YOUR_TOKEN`
);
const contacts = await res.json();
```

How it works

  1. Seed — homepage HTML is fetched.
  2. Link discovery — internal links extracted via Crawlee enqueueLinks({ strategy: 'same-domain' }). The transformRequestFunction filters cross-domain links and tags priority (cosmetic).
  3. Email pass — regex over rendered HTML + body.text() + mailto: href extraction.
  4. Phone pass — tel: href + strict text regex (separator-or-prefix required).
  5. Social pass — 7-platform regex over raw HTML (catches links inside footer markup, JSON-LD, attribute strings).
  6. Filter + dedup — junk patterns rejected, lowercase normalization, Map-based dedup.
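Steps 3, 4, and 6 can be sketched together in Python (illustrative only; the actor itself uses Cheerio in JavaScript, and these regexes are my assumptions — the sketch also omits the 7–15 digit-count guard and the junk filter):

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MAILTO_RE = re.compile(r'href=["\']mailto:([^"\'?]+)', re.IGNORECASE)
PHONE_RE = re.compile(
    r"\+\d{1,3}[\d\s().-]{6,14}"            # +CC international prefix
    r"|\(\d{3}\)[\s.-]?\d{3}[\s.-]?\d{4}"   # (NNN) NNN-NNNN
    r"|\d{3}-\d{3}-\d{4}"                   # NNN-NNN-NNNN separators
)

def extract_contacts(html: str) -> dict:
    """Lowercase-keyed email dict mimics the Map-based dedup; the value
    records whether the hit came from a mailto: link."""
    emails = {}
    for m in MAILTO_RE.finditer(html):       # mailto: pass runs first
        emails.setdefault(m.group(1).lower(), True)
    for m in EMAIL_RE.finditer(html):        # then the text regex sweep
        emails.setdefault(m.group().lower(), False)
    phones = [m.group().strip() for m in PHONE_RE.finditer(html)]
    return {"emails": emails, "phones": phones}

html = '<a href="mailto:Sales@Acme.com">mail</a> or call (555) 123-4567'
result = extract_contacts(html)
# result["emails"] == {"sales@acme.com": True}
# result["phones"] == ["(555) 123-4567"]
```

Note how a bare digit run like `invoice 123456789` is rejected because it carries none of the three phone signals.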

Combine with other actors in this portfolio

  1. Google Maps Scraper Pro — find businesses by category + location.
  2. Email Extractor Pro (this one) — pull their contact emails.
  3. Email Validator — verify deliverability before outreach.
  4. Website Tech Stack Detector — qualify leads by their stack.

Proof of delivery: 31 published Apify actors (78 total in portfolio). The flagship Trustpilot scraper has 951 lifetime production runs; this Email Extractor has 107+ runs. One paid 3-article series shipped in March 2026 ($150, proxy industry). Pilot pricing locked through May 2026.

Sample request? Reply sample to spinov001@gmail.com and we'll send 2 published case-study articles within 24 hours.


Need a custom build instead of self-serve?

| Tier | Price | Includes |
|---|---|---|
| Pilot | $97 | 1 actor or modification, 7-day support |
| Standard | $297 | Custom actor + Slack/email alerts on results, 30-day support |
| Premium | $797 | Custom actor + dashboard + 90-day support + 1 modification round |

Drop specs, schema, or target URLs in an email — quote back same day.

Email: spinov001@gmail.com

Proof of work: 31 published Apify actors / 78 total in portfolio — 951 lifetime runs on the Trustpilot scraper, paid 3-article series delivered for a client in the proxy industry ($150).

Blog (case studies + writeups): blog.spinov.online

Telegram channel (scraping & data engineering tips): t.me/scraping_ai


Honest disclosure

  • 107 lifetime production runs on this specific actor as of 2026-04-30 — well past prototype, but smaller-volume than the Trustpilot flagship.
  • No private data scraped. No login bypass. robots.txt is not explicitly checked (Crawlee default). For strict compliance with robots.txt, request a custom build that wires it in.
  • Independent project — not affiliated with Hunter.io, Apollo, or any other lead-gen vendor.
  • This actor is maintained by the same author who runs apify.com/knotless_cadence (78 actors, 31 public).