Email & Contact Extractor — Emails, Phones, Socials
Pricing
from $1.20 / 1,000 results
Give a URL or list of domains — get back emails, phones and social profiles (LinkedIn, Twitter/X, Facebook, Instagram, YouTube, GitHub, TikTok, Pinterest) from the homepage and canonical contact pages. Decodes Cloudflare-obfuscated and text-obfuscated emails.
Developer
Haketa
Maintained by Community
2 days ago
Last modified
Email & Contact Extractor | Emails, Phones, Socials from Any Website
Paste a list of company URLs or bare domains. Get back a clean dataset with every email address, phone number and social-profile link the actor can find on the homepage and the canonical contact pages — /contact, /about, /team, /impressum, /contatti, and the rest of the universal contact-page conventions.
Cloudflare-obfuscated emails (the most common protection on company sites) are decoded automatically. So are text obfuscations such as name [at] domain [dot] com and HTML-entity tricks such as `&#64;`. The output is normalised, deduplicated and ready to ship straight to your CRM, lead-gen workflow, recruiting pipeline or AI agent.
TL;DR — One Apify run, one row per website, all the contact data the site itself publishes. No login. No browser. No spreadsheet wrangling.
What you get
For each website you paste in, the actor returns one row with:
| Field | What |
|---|---|
| `rootDomain` | Normalised domain (no `www.`, no protocol) |
| `title` | Homepage / contact-page title, useful for filtering |
| `emails` | Array of every email found, deduplicated and lower-cased |
| `phones` | Array of every phone number found, normalised |
| `linkedinUrl` | Company LinkedIn page |
| `twitterUrl` | Company Twitter / X handle |
| `facebookUrl` | Company Facebook page |
| `instagramUrl` | Company Instagram handle |
| `youtubeUrl` | Company YouTube channel |
| `githubUrl` | Company GitHub organisation |
| `tiktokUrl` | Company TikTok |
| `pinterestUrl` | Company Pinterest |
| `emailCount`, `phoneCount`, `socialCount` | Counts for quick filtering |
| `pagesScraped` | Which pages on the domain contributed data |
| `fetchedPagesCount` | How many pages were successfully fetched |
| `errors` | Errors per page, if any (`null` when everything worked) |
| `inputUrl` | The URL you gave us, for round-tripping |
| `scrapedAt` | ISO timestamp |
inputUrl | The URL you gave us, for round-tripping |
scrapedAt | ISO timestamp |
The complete schema is in dataset_schema.json.
How the matching works
Emails
- `mailto:` links: anything inside `<a href="mailto:…">` is captured.
- Cloudflare email protection: every site that ticks the "Email Address Obfuscation" box in its Cloudflare dashboard puts emails behind a `data-cfemail="…"` span. The actor decodes the hex / XOR cipher automatically, so `[email protected]` becomes the real address.
- Plain-text regex: a strict pattern that demands a real TLD (`.com`, `.io`, etc., not `.png`).
- Obfuscation patterns: `name [at] domain [dot] com`, `(at)`, `&#64;`, `_at_` are all normalised before regex extraction.
- False-positive filter: known decoys (`example.com`, `yourdomain.com`, `sentry.io` token hashes), image-extension tails (`@2x.png`) and version-string-looking numerics are all dropped.
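The email steps above can be sketched roughly as follows. This is a minimal illustration, not the actor's source: the function names, the regex, the decoy list and the asset-extension list are all assumptions. The `data-cfemail` decoding follows the commonly documented Cloudflare scheme, in which the first hex byte is an XOR key for the remaining bytes.

```javascript
// Decode a Cloudflare data-cfemail hex string: byte 0 is the XOR key.
function decodeCfEmail(hex) {
  const key = parseInt(hex.slice(0, 2), 16);
  let out = "";
  for (let i = 2; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16) ^ key);
  }
  return out;
}

// Rewrite common text obfuscations into a plain "@" and ".".
function normaliseObfuscation(text) {
  return text
    .replace(/&#64;|&#x40;/gi, "@")
    .replace(/\s*(?:\[at\]|\(at\)|_at_)\s*/gi, "@")
    .replace(/\s*(?:\[dot\]|\(dot\))\s*/gi, ".");
}

// Assumed pattern: the last label must be alphabetic, at least 2 characters.
const EMAIL_RE = /[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}/gi;
// Assumed decoy list; dropping all of sentry.io is a simplification.
const DECOY_DOMAINS = new Set(["example.com", "yourdomain.com", "sentry.io"]);
const ASSET_TAIL = /\.(?:png|jpe?g|gif|svg|webp)$/i;

function extractEmails(html) {
  const candidates = [];
  for (const m of html.matchAll(/data-cfemail="([0-9a-f]+)"/gi)) {
    candidates.push(decodeCfEmail(m[1]));
  }
  for (const m of normaliseObfuscation(html).matchAll(EMAIL_RE)) {
    candidates.push(m[0]);
  }
  const kept = new Set();
  for (const raw of candidates) {
    const email = raw.toLowerCase();
    const domain = email.split("@")[1];
    if (!domain || DECOY_DOMAINS.has(domain) || ASSET_TAIL.test(email)) continue;
    kept.add(email);
  }
  return [...kept].sort(); // deduplicated and lower-cased
}
```

Filtering after collection means decoded Cloudflare addresses pass through the same decoy checks as plain-text matches.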
Phone numbers
- `tel:` links: `<a href="tel:+1-415-…">` is extracted and normalised.
- International / explicit patterns: `+CC`-prefixed digits in body text.
- Noise control: bare 4-digit-only numbers are ignored, and results are capped at 20 phones per page to keep the column clean.
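A rough sketch of those phone rules, for illustration only. The regexes, the 7 to 15 digit window and the digits-only output format are assumptions; the actor's own normalisation may differ (its sample output keeps spaces).

```javascript
const MAX_PHONES_PER_PAGE = 20; // cap taken from the description above

function extractPhones(html) {
  const found = new Set();
  const add = (raw) => {
    const digits = raw.replace(/\D/g, "");
    // Assumed length window; it also rejects bare 4-digit numbers.
    if (digits.length < 7 || digits.length > 15) return;
    found.add(raw.trim().startsWith("+") ? `+${digits}` : digits);
  };

  // 1) tel: links
  for (const m of html.matchAll(/href="tel:([^"]+)"/gi)) add(m[1]);

  // 2) "+CC ..." patterns in the visible text (tags stripped naively)
  const bodyText = html.replace(/<[^>]*>/g, " ");
  for (const m of bodyText.matchAll(/\+\d[\d\s().-]{6,18}\d/g)) add(m[0]);

  return [...found].slice(0, MAX_PHONES_PER_PAGE);
}
```

Numbers without a leading `+` in body text are deliberately not matched, which mirrors the conservative behaviour described above.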
Socials
Per-platform regexes match the canonical profile URL shape, skipping share / intent / login links. For LinkedIn we accept /company/, /in/ and /school/; for Twitter we skip /intent/, /share?, /i/; for Facebook we skip /sharer/, /dialog/, /tr?; etc.
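One way to express "match the profile shape, skip share links" is a negative lookahead per platform. The patterns below are assumptions written for this illustration (three platforms only), not the actor's actual regexes.

```javascript
// Hypothetical patterns: each one rejects the share / intent paths listed above.
const SOCIAL_PATTERNS = {
  linkedinUrl: /https?:\/\/(?:[a-z]{2,3}\.)?linkedin\.com\/(?:company|in|school)\/[\w%-]+/i,
  twitterUrl: /https?:\/\/(?:www\.)?(?:twitter|x)\.com\/(?!intent\/|share\?|i\/)\w{1,15}/i,
  facebookUrl: /https?:\/\/(?:www\.)?facebook\.com\/(?!sharer\/|dialog\/|tr\?)[\w.-]+/i,
};

// Return the first matching profile URL per platform, or null.
function extractSocials(html) {
  const result = {};
  for (const [field, pattern] of Object.entries(SOCIAL_PATTERNS)) {
    const match = html.match(pattern);
    result[field] = match ? match[0] : null;
  }
  return result;
}
```

Because the patterns run against raw HTML rather than parsed anchors, they also pick up URLs that appear outside `<a>` tags.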
Multi-page coverage
Every input URL becomes a small queue of up to maxPagesPerDomain URLs: the input itself, then the canonical contact paths (/contact, /contact-us, /about, /about-us, /team, /company, /impressum, /imprint, /contatti, /contacto, /kontakt). Pages 404 silently and don't fail the row — we just move on and aggregate whatever we find.
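The queue construction can be sketched like this. The helper names are hypothetical, and the assumption that contact paths are tried in the listed order until the cap is reached is mine; the path list itself is the one given above.

```javascript
const DEFAULT_CONTACT_PATHS = [
  "/contact", "/contact-us", "/about", "/about-us", "/team", "/company",
  "/impressum", "/imprint", "/contatti", "/contacto", "/kontakt",
];

// Accept both "stripe.com" and "https://stripe.com/about".
function toUrl(input) {
  return new URL(/^https?:\/\//i.test(input) ? input : `https://${input}`);
}

// Normalised domain: no protocol, no leading "www.".
function rootDomain(input) {
  return toUrl(input).hostname.replace(/^www\./, "");
}

// The input URL first, then contact paths, capped at maxPagesPerDomain.
function buildPageQueue(input, maxPagesPerDomain = 6, contactPaths = DEFAULT_CONTACT_PATHS) {
  const start = toUrl(input);
  const queue = [start.href];
  for (const path of contactPaths) {
    if (queue.length >= maxPagesPerDomain) break;
    const candidate = new URL(path, start.origin).href;
    if (!queue.includes(candidate)) queue.push(candidate); // skip duplicates
  }
  return queue;
}
```

With the default cap of 6, only the first five distinct contact paths are queued in this sketch, so raising `maxPagesPerDomain` is what brings `/impressum` and later paths into play.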
Use cases
B2B sales prospecting
You bought a list of 5,000 SaaS companies from Apollo / Crunchbase but only have website URLs. Run the list through this actor, get emails + LinkedIn + phones for every domain, then push to your outbound tool.
Recruiter pipeline
Have a list of target companies for a hiring campaign? Run them through, get HR / talent / hiring-manager emails (anything published on /team or /about) plus the company LinkedIn for InMail follow-up.
Brand monitoring & PR
Maintain a database of journalists' / agencies' contact emails. Run their domains periodically to track changes — new staff appear on /team, contact emails change with company restructures.
Investor / VC sourcing
Have a list of portfolio companies' websites? Pull their published emails, social profiles and the emailCount / phoneCount / socialCount counters in one pass.
Marketing partnerships / affiliate programs
Find the right inbox at thousands of potential partners (partnerships@, bizdev@, affiliates@). The actor surfaces every email on the page; you filter for the role-keywords you care about.
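The role-keyword filter is a one-liner once the dataset is exported. A small sketch, assuming rows shaped like the actor's output; the keyword list and the function name are hypothetical.

```javascript
// Hypothetical keyword list; adjust to the roles you care about.
const ROLE_KEYWORDS = ["partnerships", "bizdev", "affiliates"];

// Keep emails whose local part (before "@") contains one of the keywords.
function pickRoleEmails(row, keywords = ROLE_KEYWORDS) {
  return (row.emails ?? []).filter((email) => {
    const localPart = email.split("@")[0].toLowerCase();
    return keywords.some((kw) => localPart.includes(kw));
  });
}
```

Substring matching is used so that variants like `eu-affiliates@` are kept too; switch to an exact comparison if that is too loose for your list.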
Lead-list enrichment
Already have an existing lead list? Paste the URL column and you get an enriched table back with emails, phones and social handles per row. Pair it with the dataset's CSV export and you're 30 seconds away from a refreshed Sheets tab.
AI agents & workflows
This actor is a perfect input for downstream LLM-driven enrichment: deduplicate companies by domain, get their public contact surface, then let your agent decide which mailbox to send the outbound to.
Compliance / GDPR audit
Map a company's entire externally-visible contact surface in one run. Compare against your own internal records to find stale or orphaned mailboxes.
Inputs (full list)
The canonical definitions live in input_schema.json; here's the human summary.
- `startUrls` (array): URLs or bare domains. Both `stripe.com` and `https://stripe.com/about` are accepted. Each entry becomes one output row.
- `maxPagesPerDomain` (integer): How many pages on the website to fetch. Page 1 is always the URL you gave us; the rest come from `contactPaths`. Default `6`, minimum `1` (cheapest, just the input page), maximum `30`.
- `contactPaths` (array): Which sub-paths to probe on each domain. Defaults cover EN + DE + IT + ES for international portfolios.
- `includeSocials` (boolean): Toggle social-link extraction. Default `true`.
- `includePhones` (boolean): Toggle phone extraction. Default `true`.
- `decodeObfuscatedEmails` (boolean): Toggle Cloudflare + text obfuscation decoding. Default `true`.
- `maxConcurrency` (integer): How many websites to crawl in parallel. Default `5`, max `20`.
- `requestDelay` (integer): Milliseconds between page fetches inside one website. Default `800`.
- `proxyConfiguration` (proxy): Apify Proxy. Defaults to Apify Datacenter (rotated per request).
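If you generate inputs programmatically, it can help to apply the documented defaults and limits before submitting a run. The sketch below is a client-side convenience only: `resolveInput` is a hypothetical helper, the lower bound of 1 for `maxConcurrency` is an assumption, and whether the actor itself clamps or rejects out-of-range values is not stated here.

```javascript
// Defaults as listed above (contactPaths and proxyConfiguration omitted).
const DEFAULTS = {
  maxPagesPerDomain: 6,
  includeSocials: true,
  includePhones: true,
  decodeObfuscatedEmails: true,
  maxConcurrency: 5,
  requestDelay: 800,
};

const clamp = (value, lo, hi) => Math.min(hi, Math.max(lo, value));

// Merge user input over the defaults and keep numbers inside documented bounds.
function resolveInput(input) {
  const merged = { ...DEFAULTS, ...input };
  merged.maxPagesPerDomain = clamp(merged.maxPagesPerDomain, 1, 30);
  merged.maxConcurrency = clamp(merged.maxConcurrency, 1, 20); // min 1 is assumed
  return merged;
}
```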
Example inputs
1. Enrich a list of SaaS companies with default settings (five shown)
{"startUrls": [{ "url": "https://stripe.com" },{ "url": "https://ramp.com" },{ "url": "https://airtable.com" },{ "url": "https://notion.so" },{ "url": "https://figma.com" }]}
2. Bare domains, just the homepage (cheapest mode)
{"startUrls": [{ "url": "stripe.com" },{ "url": "ramp.com" }],"maxPagesPerDomain": 1}
3. Emails only, no socials / phones
{"startUrls": [{ "url": "https://anthropic.com" }],"includeSocials": false,"includePhones": false}
4. Deep contact-page coverage for a tricky DACH site
{"startUrls": [{ "url": "https://www.bosch.com" }],"maxPagesPerDomain": 12,"contactPaths": ["/de/contact","/en/contact","/impressum","/de/impressum","/en/legal-notice","/de/karriere","/en/careers","/about-bosch","/de/unternehmen","/contact-us"],"proxyConfiguration": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
5. Big list, high concurrency
{"startUrls": [{ "url": "https://stripe.com" },{ "url": "https://ramp.com" },{ "url": "https://airtable.com" }],"maxConcurrency": 10,"requestDelay": 500}
Output sample
{"inputUrl": "https://stripe.com","rootDomain": "stripe.com","title": "Stripe | Financial Infrastructure to Grow Your Revenue","emails": ["press@stripe.com", "support@stripe.com"],"phones": ["+1 888 926 2289"],"linkedinUrl": "https://www.linkedin.com/company/stripe","twitterUrl": "https://twitter.com/stripe","facebookUrl": "https://www.facebook.com/StripeHQ","instagramUrl": null,"youtubeUrl": "https://www.youtube.com/@StripeDevs","githubUrl": "https://github.com/stripe","tiktokUrl": null,"pinterestUrl": null,"emailCount": 2,"phoneCount": 1,"socialCount": 5,"pagesScraped": ["https://stripe.com","https://stripe.com/contact","https://stripe.com/about"],"fetchedPagesCount": 3,"errors": null,"scrapedAt": "2026-05-31T18:14:02.000Z"}
Cost & throughput
This actor uses Apify's pay-per-event pricing. The exact tier is set on the Apify Store listing.
Throughput on the default config (maxPagesPerDomain: 6, maxConcurrency: 5, requestDelay: 800):
- ~5–8 websites per minute, or roughly 300–500 per hour.
- Drop `maxPagesPerDomain` to `1` and raise `maxConcurrency` to `10` for ~30 sites/minute.
On the default config each input URL touches at most 6 pages (the input URL plus up to 5 contact paths), so a 1,000-URL run makes at most 6,000 requests. That is light by HTTP standards, which is why the actor stays cheap.
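The arithmetic above is easy to wrap in a planning helper. This is a back-of-the-envelope sketch: `estimateRun` is a hypothetical function, the rates are the approximate figures quoted in this section, and the cost line assumes one billed result per input URL at the listing's "from $1.20 / 1,000 results" price, which may not match your actual tier.

```javascript
function estimateRun({
  urlCount,
  maxPagesPerDomain = 6,
  sitesPerMinute = 5,       // low end of the ~5-8 range on the default config
  usdPerThousandResults = 1.2, // assumed "from" price; check the Store listing
}) {
  return {
    maxRequests: urlCount * maxPagesPerDomain,       // worst case: every page fetched
    minutes: Math.ceil(urlCount / sitesPerMinute),   // rough wall-clock estimate
    fromCostUsd: (urlCount / 1000) * usdPerThousandResults,
  };
}

const plan = estimateRun({ urlCount: 1000 });
console.log(plan.maxRequests, plan.minutes);
```

For 1,000 URLs on the defaults this gives 6,000 requests at most and about 200 minutes at 5 sites per minute.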
How the technique stacks up
There are dozens of "email finder" tools out there. Most are closed-API SaaS with monthly subscriptions and per-credit pricing. Here's where this Apify actor positions itself:
- No subscription: pay only for the runs you trigger.
- Cloudflare email-protection decoder built in: most generic scrapers miss obfuscated emails; we decode every `data-cfemail` span automatically.
- Multi-language contact-page paths: DE / IT / ES / EN by default, so EU runs aren't crippled by the English-only assumption.
- Per-request session rotation via Apify Proxy: surviving long lists without IP rate-limit drama is the difference between 95% success and 25%.
- 0-row runs exit cleanly, not as FAILED: your scheduled enrichment job stays green when a few of the input domains 404.
- All eight socials in one pass: LinkedIn, Twitter / X, Facebook, Instagram, YouTube, GitHub, TikTok and Pinterest, so no second tool is needed for that column.
Tips & troubleshooting
Q: Half my rows have emailCount: 0. What's wrong?
A: Three usual suspects:
- The domain's `/contact` page lives at a custom path (e.g. `/contact-us-en`, `/about/team`). Extend `contactPaths` with the candidate you saw in your browser.
- The site uses an aggressive bot wall (Cloudflare full challenge, PerimeterX). Switch the proxy group to `RESIDENTIAL`; the row's `errors` column will tell you which page got blocked.
- The site simply doesn't publish emails on its public pages. Some enterprise / private-equity sites do this on purpose; no scraper will fix that.
Q: I'm seeing too many "junk" emails (hash@sentry.io, you@example.com, …).
A: We filter the worst offenders, but new ones appear every week. Filter the emails column downstream by domain: keep only entries whose root-domain (@stripe.com) matches the row's rootDomain.
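A minimal sketch of that downstream filter, assuming rows shaped like the actor's output. The function name is hypothetical, and accepting subdomains such as `mail.stripe.com` is a design choice of this sketch rather than something the actor prescribes.

```javascript
// Keep only emails hosted on the row's own domain (or one of its subdomains).
function keepOwnDomainEmails(row) {
  const root = row.rootDomain.toLowerCase();
  return row.emails.filter((email) => {
    const domain = email.slice(email.lastIndexOf("@") + 1).toLowerCase();
    return domain === root || domain.endsWith(`.${root}`);
  });
}
```

The leading dot in the `endsWith` check matters: without it, `notstripe.com` would pass for `stripe.com`.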
Q: Phones are noisy / wrong.
A: Phone extraction is intentionally conservative — we only pick tel: links plus clearly-international +CC … patterns. If your inputs are US-only sites you can disable includePhones and rely on a US-specific phone scraper downstream.
Q: I want one row per email, not per website.
A: Use Apify's "Transform dataset" integration with flat-ish JS — for (const e of item.emails) yield { ...item, email: e }. The aggregate shape is easier to reason about as the default but trivial to explode.
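If you would rather do the explode step locally on exported items, a plain generator is enough. This sketch assumes rows shaped like the actor's output and does not rely on any particular Apify integration; `explodeByEmail` is a hypothetical name.

```javascript
// Yield one flat row per email; websites without emails yield nothing.
function* explodeByEmail(items) {
  for (const item of items) {
    for (const email of item.emails ?? []) {
      yield { ...item, email };
    }
  }
}

const rows = [
  { rootDomain: "stripe.com", emails: ["press@stripe.com", "support@stripe.com"] },
  { rootDomain: "example.org", emails: [] },
];
const flat = [...explodeByEmail(rows)];
console.log(flat.length); // 2
```

Each exploded row still carries the original `emails` array; delete it in the spread if your target table should hold only the single `email` column.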
Q: How do I deduplicate when I run the same list weekly?
A: Treat rootDomain as the unique key. Schedule the run with Apify Schedules, then merge each new dataset into your store with a small transform: group by rootDomain and keep the row with the latest scrapedAt.
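A group-by-`rootDomain` transform can be as small as the sketch below. It assumes you have concatenated the rows from several runs into one array, and that "keep the most recent `scrapedAt`" is the merge rule you want; `latestPerDomain` is a hypothetical helper.

```javascript
// Keep one row per rootDomain: the one with the most recent scrapedAt.
function latestPerDomain(rows) {
  const byDomain = new Map();
  for (const row of rows) {
    const seen = byDomain.get(row.rootDomain);
    if (!seen || Date.parse(row.scrapedAt) > Date.parse(seen.scrapedAt)) {
      byDomain.set(row.rootDomain, row);
    }
  }
  return [...byDomain.values()];
}
```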
Q: How fresh is the data?
A: Real-time. Every page is fetched live; there's no cache layer in the actor.
Q: Can I scrape a list of 100,000 URLs in one go?
A: Yes, but break it into 5,000-URL batches per run for predictable cost and to keep the dataset payload reasonable. Schedule them daily via Apify Schedules.
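Splitting a large list into per-run inputs takes only a few lines. The helper names below are hypothetical; the `startUrls` shape follows the example inputs in this README.

```javascript
// Split a list into consecutive batches of at most `size` items.
function chunk(items, size = 5000) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Wrap a batch of URLs in the actor's input shape.
const toActorInput = (urls) => ({ startUrls: urls.map((url) => ({ url })) });
```

A 100,000-URL list yields 20 inputs of 5,000 URLs each, one per scheduled run.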
Q: My proxy errors are persistent.
A: Two levers — (1) drop maxConcurrency to 2-3, (2) switch the proxy group to RESIDENTIAL. The actor automatically rotates the proxy session per request, but very heavy load on cheap datacenter IPs will eventually get throttled by Cloudflare.
Legal & ethical use
This actor reads public, indexable HTML — the same pages Google sees. Use the output responsibly:
- GDPR / CAN-SPAM apply to your usage, not to the act of scraping. If you cold-email EU contacts, you need a lawful basis (legitimate interest works for B2B if the email is professional, the topic is relevant, and you respect opt-outs). The actor does not bypass any login, paywall or robots.txt-disallowed path.
- Role addresses vs personal: `info@`, `support@`, `press@` are explicitly company contact points and safe to use. `firstname.lastname@` are individuals; treat them per the rules above.
- Don't spam. Don't scrape behind paywalls. Don't use this for credential stuffing or harassment. Apify will deactivate any account that does.
How this compares to SaaS email finders
| | This actor | Hunter.io | Snov.io | Apollo.io |
|---|---|---|---|---|
| Pricing | Pay-per-run, no subscription | $49–$499/mo | $39–$249/mo | $59–$149/mo |
| Per-result cost | Pennies | $0.10–$1.00/credit | $0.10–$0.30/credit | Bundled |
| Cloudflare email-protection decoder | ✅ Built in | ❌ | ❌ | ❌ |
| Multi-page crawl per domain | ✅ Up to 30 | Single page | Single page | Single page |
| Eight social profiles in one pass | ✅ | LinkedIn only | LinkedIn only | LinkedIn focus |
| DE / IT / ES / EN contact paths | ✅ Built in | English | English | English |
| Self-host / data ownership | ✅ Your Apify account | ❌ | ❌ | ❌ |
| Roll into your own pipeline | ✅ REST / webhook / SDK | API | API | API |
The trade-off: SaaS finders try to predict the email of a specific person (first.last@domain). This actor extracts every email the website actually publishes. If you need predicted emails for individuals, pair this with a verification tool. If you need real published mailboxes (PR, support, sales, partnerships) — this is what you want.
Industry-specific playbooks
B2B SaaS sales (outbound)
Run your TAM list through the actor, then keep only emails whose domain matches the row's rootDomain (this drops noise like support@cloudflare.com from sites using Cloudflare). Feed linkedinUrl into your sequencer for a combined email + LinkedIn touch.
Staffing agencies / executive search
Map a target list of companies to their linkedinUrl (for InMail) plus public emails (hr@, talent@, careers@). Then dedup against your existing reach-out CRM so you don't double-touch.
PR & media outreach
Pull press@, media@, pr@ mailboxes off a list of brand websites. The title column doubles as a quick brand-positioning hint before you draft the pitch.
Real-estate / property investors
Agency websites typically expose info@, sales@ plus a phone number. Run a postcode-filtered list of agent sites through this actor and you've built a regional outbound list in one batch.
Venture capital / corp dev
For every portfolio / target company, pull socials + emails + phones. The githubUrl is gold for technical-due-diligence — it surfaces public open-source activity that paid databases often miss.
Local SEO agencies
Pair this actor with a Google Maps scraper: feed agency website URLs from the Maps results into this actor, get back the emails and socials you couldn't see in the Maps card.
Common patterns we've seen
A few patterns that crop up frequently and how to handle them:
- `info@`, `hello@`, `contact@` dominate the results. These are the universal role mailboxes: perfectly usable for B2B outreach, but they tend to be triaged slowly. Pair them with the LinkedIn URL for a faster route to a human.
- Agencies hide their team behind "Book a call" forms. When a row has `emailCount: 0` but `socialCount: 5+` and the page title looks polished, you've hit one of these. The LinkedIn URL is still the best outbound entry point.
- DACH (DE / AT / CH) sites concentrate everything on `/impressum`, a page required by German law. Our default `contactPaths` already includes `/impressum` and `/imprint`.
- Cloudflare-obfuscated emails appear as `[email protected]` in raw HTML. Without the decoder you'd miss them entirely; with it on (the default), they're transparently captured.
- Newer sites publish socials via `<link rel>` tags in `<head>`. We scan raw HTML, so these are captured even when they're missing from anchor links.
Changelog
- 1.0: Initial release. Email regex + mailto + Cloudflare decoder + text-obfuscation patterns. Phones via `tel:` and international patterns. Eight social profiles. Multi-page-per-domain crawl, concurrent workers, Apify Proxy.
Roadmap & feature requests
We read every Apify Store review and comment. High-priority candidates for v1.1+:
- Per-email row mode (one row per discovered email).
- Name extraction near emails (`John Smith — john@…`).
- Email pattern detection (`{first}.{last}@domain.com`) for known-staff inference.
- Configurable phone region (US-only / UK-only filter).
- Optional sitemap.xml expansion when the input domain has a sitemap.
- DNS-based deliverability check (MX / SMTP probe).
Drop a comment on the Store page if any of these would unblock you.