Email & Contact Extractor — Emails, Phones, Socials

Pricing: from $1.20 / 1,000 results

Give a URL or list of domains — get back emails, phones and social profiles (LinkedIn, Twitter/X, Facebook, Instagram, YouTube, GitHub, TikTok, Pinterest) from the homepage and canonical contact pages. Decodes Cloudflare-obfuscated and text-obfuscated emails.

Rating: 0.0 (0 reviews)

Developer: Haketa (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 2 days ago

Email & Contact Extractor | Emails, Phones, Socials from Any Website

Paste a list of company URLs or bare domains. Get back a clean dataset with every email address, phone number and social-profile link the actor can find on the homepage and the canonical contact pages: /contact, /about, /team, /impressum, /contatti and the other common contact-page conventions.

Cloudflare-obfuscated emails (one of the most common protections on company sites) are decoded automatically. So are text tricks like name [at] domain [dot] com and the &#64; HTML entity. The output is normalised, deduplicated and ready to ship straight to your CRM, lead-gen workflow, recruiting pipeline or AI agent.

TL;DR — One Apify run, one row per website, all the contact data the site itself publishes. No login. No browser. No spreadsheet wrangling.


What you get

For each website you paste in, the actor returns one row with:

Field | What it contains
rootDomain | Normalised domain (no www., no protocol)
title | Homepage / contact-page title, useful for filtering
emails | Array of every email found, deduplicated and lower-cased
phones | Array of every phone number found, normalised
linkedinUrl | Company LinkedIn page
twitterUrl | Company Twitter / X profile URL
facebookUrl | Company Facebook page
instagramUrl | Company Instagram profile URL
youtubeUrl | Company YouTube channel
githubUrl | Company GitHub organisation
tiktokUrl | Company TikTok profile
pinterestUrl | Company Pinterest profile
emailCount, phoneCount, socialCount | Counts for quick filtering
pagesScraped | Which pages on the domain contributed data
fetchedPagesCount | How many pages were successfully fetched
errors | Per-page errors, if any (null when everything worked)
inputUrl | The URL you gave us, for round-tripping
scrapedAt | ISO 8601 timestamp

The complete schema is in dataset_schema.json.


How the matching works

Emails

  • mailto links: anything inside <a href="mailto:…"> is captured.
  • Cloudflare email protection: every site that ticks the "Email Address Obfuscation" box in its Cloudflare dashboard puts emails behind a data-cfemail="…" span. The actor decodes the hex / XOR cipher automatically, so [email&#160;protected] becomes the real address.
  • Plain-text regex: a strict pattern that demands a real TLD (.com, .io, etc., not .png).
  • Obfuscation patterns: name [at] domain [dot] com, (at), &#64; and _at_ are all normalised before regex extraction.
  • False-positive filter: known decoys (example.com, yourdomain.com, sentry.io token hashes), image-extension tails (@2x.png) and numerics that look like version strings are all dropped.
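The email steps above can be sketched in plain JavaScript. This is a minimal illustration, not the actor's source. The function names, regexes and decoy lists below are assumptions chosen to mirror the bullets. The Cloudflare decoder assumes the commonly documented data-cfemail layout, where the first hex byte is an XOR key applied to every byte that follows.

```javascript
// Hypothetical sketch of the email pipeline; names, regexes and decoy
// lists are illustrative assumptions, not the actor's actual code.

// Cloudflare data-cfemail: the first hex byte is an XOR key, and every
// following byte is one character of the address XOR-ed with that key.
function decodeCfEmail(hex) {
  const key = parseInt(hex.slice(0, 2), 16);
  let out = "";
  for (let i = 2; i + 1 < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16) ^ key);
  }
  return out;
}

// Rewrite common text obfuscations to a plain "@" / "." before matching.
function deobfuscate(text) {
  return text
    .replace(/&#64;|&#x40;/gi, "@")
    .replace(/\s*(?:\[at\]|\(at\)|_at_)\s*/gi, "@")
    .replace(/\s*(?:\[dot\]|\(dot\))\s*/gi, ".");
}

// Strict-ish pattern: the address must end in an alphabetic TLD.
const EMAIL_RE = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,24}\b/gi;
const IMAGE_TLDS = new Set(["png", "jpg", "jpeg", "gif", "svg", "webp"]);
const DECOY_DOMAINS = new Set(["example.com", "yourdomain.com"]);

function isPlausibleEmail(email) {
  if (!/^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,24}$/.test(email)) return false;
  const domain = email.split("@")[1];
  const tld = domain.split(".").pop();
  if (IMAGE_TLDS.has(tld)) return false; // e.g. logo@2x.png
  if (DECOY_DOMAINS.has(domain)) return false; // template placeholders
  if (domain === "sentry.io" || domain.endsWith(".sentry.io")) return false; // error-tracker tokens
  return true;
}

function extractEmails(html) {
  const found = new Set();
  const add = (raw) => found.add(raw.trim().toLowerCase());
  // 1. mailto: links (everything before an optional ?subject=… query).
  for (const m of html.matchAll(/href=["']mailto:([^"'?]+)/gi)) add(m[1]);
  // 2. Cloudflare-protected spans.
  for (const m of html.matchAll(/data-cfemail=["']([0-9a-f]+)["']/gi)) add(decodeCfEmail(m[1]));
  // 3. Plain-text regex over the de-obfuscated markup.
  for (const m of deobfuscate(html).matchAll(EMAIL_RE)) add(m[0]);
  return [...found].filter(isPlausibleEmail);
}
```

Running extractEmails over a page's raw HTML returns lower-cased, deduplicated candidates. A production version would need a much longer decoy list and handling for URL-encoded characters inside mailto values.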

Phone numbers

  • tel links: <a href="tel:+1-415-…"> is extracted and normalised.
  • International / explicit patterns: +CC-prefixed digit runs in body text.
  • Noise control: bare 4-digit numbers are ignored, and output is capped at 20 phones per page to keep the column clean.
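Here is a minimal sketch of this conservative approach, assuming raw HTML as input. The helper names, the 7-digit minimum and the separator pattern are illustrative assumptions. Only the 20-per-page cap comes from the description above, and 15 digits is the E.164 maximum length.

```javascript
// Hypothetical sketch of conservative phone extraction: tel: links plus
// explicit "+<country code>" digit runs in visible text. The 7-digit
// minimum is an assumption; 15 is the E.164 maximum length.
const MAX_PHONES_PER_PAGE = 20;

// Keep a leading "+", collapse every separator run to a single space.
function normalisePhone(raw) {
  const trimmed = raw.trim();
  const plus = trimmed.startsWith("+") ? "+" : "";
  return plus + trimmed.replace(/\D+/g, " ").trim();
}

function extractPhones(html) {
  const found = new Set();
  const add = (raw) => {
    const phone = normalisePhone(raw);
    const digits = phone.replace(/\D/g, "").length;
    // Drops short fragments such as bare 4-digit extensions.
    if (digits >= 7 && digits <= 15) found.add(phone);
  };
  // 1. tel: links, e.g. <a href="tel:+44-20-7946-0958">.
  for (const m of html.matchAll(/href=["']tel:([^"']+)["']/gi)) add(m[1]);
  // 2. "+CC" followed by digits separated by spaces, dots, dashes or parentheses.
  const text = html.replace(/<[^>]*>/g, " ");
  for (const m of text.matchAll(/\+\d{1,3}(?:[\s.\-()]*\d){6,14}/g)) add(m[0]);
  return [...found].slice(0, MAX_PHONES_PER_PAGE);
}
```

normalisePhone only collapses separators. A number written as +1 (888) 926-2289 therefore comes out as +1 888 926 2289, but unseparated digits are not regrouped.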

Socials

Per-platform regexes match the canonical profile URL shape, skipping share / intent / login links. For LinkedIn we accept /company/, /in/ and /school/; for Twitter we skip /intent/, /share?, /i/; for Facebook we skip /sharer/, /dialog/, /tr?; etc.
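The skip / accept rules translate naturally into a per-platform table. Here is a minimal sketch, assuming links have already been collected as absolute href strings. PLATFORMS and classifySocial are hypothetical names. Only the three platforms whose rules are listed above are shown; the other five would follow the same shape.

```javascript
// Hypothetical sketch of per-platform profile matching. Only the three
// platforms whose rules are spelled out above are shown; the host and
// path regexes are illustrative assumptions.
const PLATFORMS = {
  linkedinUrl: { host: /(^|\.)linkedin\.com$/, accept: /^\/(company|in|school)\/[^/]+/ },
  twitterUrl: { host: /(^|\.)(twitter|x)\.com$/, skip: /^\/(intent|share|i)(\/|$)/ },
  facebookUrl: { host: /(^|\.)facebook\.com$/, skip: /^\/(sharer|dialog|tr)(\/|\.php|$)/ },
};

// Returns { field, url } for a profile link, or null for share/intent/login
// links, bare homepages, other hosts and relative or malformed hrefs.
function classifySocial(href) {
  let url;
  try {
    url = new URL(href);
  } catch {
    return null;
  }
  const host = url.hostname.toLowerCase();
  const path = url.pathname.replace(/\/+$/, "");
  for (const [field, rule] of Object.entries(PLATFORMS)) {
    if (!rule.host.test(host)) continue;
    if (path === "") return null; // bare homepage, not a profile
    if (rule.skip && rule.skip.test(path)) return null;
    if (rule.accept && !rule.accept.test(path)) return null;
    return { field, url: `https://${host}${path}` }; // query string and trailing slash dropped
  }
  return null;
}
```

This sketch normalises to https://host/path with the query string dropped, so tracking parameters don't create duplicate profile URLs.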

Multi-page coverage

Every input URL becomes a small queue of up to maxPagesPerDomain URLs: the input itself, then the canonical contact paths (/contact, /contact-us, /about, /about-us, /team, /company, /impressum, /imprint, /contatti, /contacto, /kontakt). Because the queue is capped, paths near the end of that list are only fetched when maxPagesPerDomain is high enough. Pages that return 404 are skipped silently and don't fail the row; we just move on and aggregate whatever we find.
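The queue construction is simple enough to sketch directly. This sketch assumes the default path list shown above and that paths are tried in the order listed. toUrl, rootDomainOf and buildPageQueue are hypothetical helpers, not the actor's API.

```javascript
// Hypothetical sketch of the per-domain page queue: the input URL first,
// then contact paths in the order given, capped at maxPagesPerDomain.
const DEFAULT_CONTACT_PATHS = [
  "/contact", "/contact-us", "/about", "/about-us", "/team", "/company",
  "/impressum", "/imprint", "/contatti", "/contacto", "/kontakt",
];

// Accepts bare domains ("stripe.com") as well as full URLs.
function toUrl(input) {
  const s = input.trim();
  return new URL(/^https?:\/\//i.test(s) ? s : `https://${s}`);
}

// rootDomain as described in the output table: no protocol, no "www.".
function rootDomainOf(input) {
  return toUrl(input).hostname.replace(/^www\./, "");
}

function buildPageQueue(input, maxPagesPerDomain = 6, contactPaths = DEFAULT_CONTACT_PATHS) {
  const start = toUrl(input);
  const queue = [start.href];
  for (const path of contactPaths) {
    if (queue.length >= maxPagesPerDomain) break;
    const candidate = new URL(path, start.origin).href;
    // Skip a path that duplicates the input page itself.
    if (!queue.includes(candidate)) queue.push(candidate);
  }
  return queue;
}
```

With the defaults, buildPageQueue("stripe.com") yields the homepage plus /contact, /contact-us, /about, /about-us and /team. In this sketch, /impressum and the IT / ES paths are only reached with a higher cap or a reordered contactPaths.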


Use cases

B2B sales prospecting

You bought a list of 5,000 SaaS companies from Apollo / Crunchbase but only have website URLs. Run the list through this actor, get emails + LinkedIn + phones for every domain, then push to your outbound tool.

Recruiter pipeline

Have a list of target companies for a hiring campaign? Run them through, get HR / talent / hiring-manager emails (anything published on /team or /about) plus the company LinkedIn for InMail follow-up.

Brand monitoring & PR

Maintain a database of journalists' / agencies' contact emails. Run their domains periodically to track changes — new staff appear on /team, contact emails change with company restructures.

Investor / VC sourcing

Have a list of portfolio companies' websites? Pull their emails, social profiles and contact counts (emailCount, phoneCount, socialCount) in one pass.

Marketing partnerships / affiliate programs

Find the right inbox at thousands of potential partners (partnerships@, bizdev@, affiliates@). The actor surfaces every email on the page; you filter for the role-keywords you care about.
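Filtering for role keywords takes only a few lines over the dataset items. Here is a minimal sketch, assuming rows shaped like the output sample (rootDomain plus an emails array). findRoleInboxes is a hypothetical helper and matches the local part exactly.

```javascript
// Hypothetical helper: pick out role mailboxes (partnerships@, bizdev@, ...)
// from result rows shaped like the actor's output.
function findRoleInboxes(rows, roles) {
  const wanted = new Set(roles.map((r) => r.toLowerCase()));
  const hits = [];
  for (const row of rows) {
    for (const email of row.emails || []) {
      const local = email.toLowerCase().split("@")[0]; // text before the "@"
      if (wanted.has(local)) hits.push({ rootDomain: row.rootDomain, email });
    }
  }
  return hits;
}

// Usage:
// findRoleInboxes(datasetItems, ["partnerships", "bizdev", "affiliates"]);
```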

Lead-list enrichment

Already have a lead list? Paste the URL column and you get an enriched table back with emails, phones and social handles per row. Pair it with the dataset's CSV export and you're 30 seconds away from a refreshed Sheets tab.

AI agents & workflows

This actor is a perfect input for downstream LLM-driven enrichment: deduplicate companies by domain, get their public contact surface, then let your agent decide which mailbox to send the outbound to.

Compliance / GDPR audit

Map a company's entire externally-visible contact surface in one run. Compare against your own internal records to find stale or orphaned mailboxes.


Inputs (full list)

The canonical definitions live in input_schema.json; here's the human summary.

  • startUrls (array): URLs or bare domains. Both stripe.com and https://stripe.com/about are accepted. Each entry becomes one output row.
  • maxPagesPerDomain (integer): How many pages on the website to fetch. Page 1 is always the URL you gave us; the rest come from contactPaths. Default 6, minimum 1 (cheapest, just the input page), maximum 30.
  • contactPaths (array): Which sub-paths to probe on each domain. Defaults cover EN + DE + IT + ES for international portfolios.
  • includeSocials (boolean): Toggle social-link extraction. Default true.
  • includePhones (boolean): Toggle phone extraction. Default true.
  • decodeObfuscatedEmails (boolean): Toggle Cloudflare + text obfuscation decoding. Default true.
  • maxConcurrency (integer): How many websites to crawl in parallel. Default 5, max 20.
  • requestDelay (integer): Milliseconds between page fetches inside one website. Default 800.
  • proxyConfiguration (proxy): Apify Proxy. Defaults to Apify Datacenter (rotated per request).

Example inputs

1. Enrich a list of SaaS companies (default settings)

{
  "startUrls": [
    { "url": "https://stripe.com" },
    { "url": "https://ramp.com" },
    { "url": "https://airtable.com" },
    { "url": "https://notion.so" },
    { "url": "https://figma.com" }
  ]
}

2. Bare domains, just the homepage (cheapest mode)

{
  "startUrls": [
    { "url": "stripe.com" },
    { "url": "ramp.com" }
  ],
  "maxPagesPerDomain": 1
}

3. Emails only, no socials / phones

{
  "startUrls": [{ "url": "https://anthropic.com" }],
  "includeSocials": false,
  "includePhones": false
}

4. Deep contact-page coverage for a tricky DACH site

{
  "startUrls": [{ "url": "https://www.bosch.com" }],
  "maxPagesPerDomain": 12,
  "contactPaths": [
    "/de/contact",
    "/en/contact",
    "/impressum",
    "/de/impressum",
    "/en/legal-notice",
    "/de/karriere",
    "/en/careers",
    "/about-bosch",
    "/de/unternehmen",
    "/contact-us"
  ],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

5. Big list, high concurrency (list shortened here)

{
  "startUrls": [
    { "url": "https://stripe.com" },
    { "url": "https://ramp.com" },
    { "url": "https://airtable.com" }
  ],
  "maxConcurrency": 10,
  "requestDelay": 500
}

Output sample

{
  "inputUrl": "https://stripe.com",
  "rootDomain": "stripe.com",
  "title": "Stripe | Financial Infrastructure to Grow Your Revenue",
  "emails": ["press@stripe.com", "support@stripe.com"],
  "phones": ["+1 888 926 2289"],
  "linkedinUrl": "https://www.linkedin.com/company/stripe",
  "twitterUrl": "https://twitter.com/stripe",
  "facebookUrl": "https://www.facebook.com/StripeHQ",
  "instagramUrl": null,
  "youtubeUrl": "https://www.youtube.com/@StripeDevs",
  "githubUrl": "https://github.com/stripe",
  "tiktokUrl": null,
  "pinterestUrl": null,
  "emailCount": 2,
  "phoneCount": 1,
  "socialCount": 5,
  "pagesScraped": [
    "https://stripe.com",
    "https://stripe.com/contact",
    "https://stripe.com/about"
  ],
  "fetchedPagesCount": 3,
  "errors": null,
  "scrapedAt": "2026-05-31T18:14:02.000Z"
}

Cost & throughput

This actor uses Apify's pay-per-event pricing. The exact tier is set on the Apify Store listing.

Throughput on the default config (maxPagesPerDomain: 6, maxConcurrency: 5, requestDelay: 800):

  • ~5–8 websites per minute → 300–500 / hour.
  • Drop maxPagesPerDomain to 1 and raise maxConcurrency to 10 for ~30 sites/minute.

On the default config, each input URL touches at most 6 pages (the input page plus up to 5 contact paths), so a 1,000-URL run makes at most 6,000 requests. That's light by HTTP standards, which is why the actor stays cheap.
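The arithmetic behind these numbers fits in a few lines. This rough planner assumes one billed result per output row at the listing's starting price of $1.20 / 1,000 results; the actual pay-per-event tier may differ. It also assumes the low end of the 5–8 sites/minute range. estimateRun is a hypothetical helper.

```javascript
// Back-of-the-envelope planner. Assumptions: one billed result per output
// row, the listing's starting price of $1.20 per 1,000 results, and the
// low end (5 sites/minute) of the default-config throughput range.
function estimateRun(urlCount, { maxPagesPerDomain = 6, pricePer1000 = 1.2, sitesPerMinute = 5 } = {}) {
  return {
    results: urlCount, // one row per input URL
    maxRequests: urlCount * maxPagesPerDomain, // upper bound on page fetches
    estimatedCostUsd: (urlCount / 1000) * pricePer1000,
    estimatedMinutes: urlCount / sitesPerMinute,
  };
}
```

For 1,000 URLs on the default config, this gives at most 6,000 requests, about $1.20 and roughly 200 minutes (about 125 minutes at 8 sites/minute).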


How the technique stacks up

There are dozens of "email finder" tools out there. Most are closed-API SaaS with monthly subscriptions and per-credit pricing. Here's where this Apify actor positions itself:

  • No subscription — pay only for the runs you trigger.
  • Cloudflare email-protection decoder built in — most generic scrapers miss obfuscated emails; we decode every data-cfemail span automatically.
  • Multi-language contact-page paths — DE / IT / ES / EN by default, so EU runs aren't crippled by the English-only assumption.
  • Per-request session rotation via Apify Proxy — surviving long lists without IP rate-limit drama is the difference between 95% success and 25%.
  • 0-row runs exit cleanly, not as FAILED — your scheduled enrichment job stays green when a few of the input domains 404.
  • All eight socials in one pass — LinkedIn, Twitter / X, Facebook, Instagram, YouTube, GitHub, TikTok, Pinterest — no second tool needed for that column.

Tips & troubleshooting

Q: Half my rows have emailCount: 0. What's wrong? A: Three usual suspects:

  1. The domain's contact page lives at a custom path (e.g. /contact-us-en, /about/team). Extend contactPaths with the path you saw in your browser, and raise maxPagesPerDomain if the list is now longer than the page cap.
  2. The site uses an aggressive bot wall (Cloudflare full challenge, PerimeterX). Switch the proxy group to RESIDENTIAL — the row's errors column will tell you which page got blocked.
  3. The site simply doesn't publish emails on its public pages. Some enterprise / private-equity sites do this on purpose; no scraper will fix that.

Q: I'm seeing too many "junk" emails (hash@sentry.io, you@example.com, …). A: We filter the worst offenders, but new ones appear every week. Filter the emails column downstream by domain: keep only entries whose root-domain (@stripe.com) matches the row's rootDomain.
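Here is a minimal sketch of that downstream filter, assuming rows shaped like the output sample. It also accepts subdomains of rootDomain (e.g. an eu. mail subdomain). That is a design choice for this sketch, not something the actor is documented to do. The helper names are hypothetical.

```javascript
// Hypothetical downstream filter: keep only emails on the row's own domain
// (or a subdomain of it), then recompute emailCount.
function emailMatchesDomain(email, rootDomain) {
  const domain = (email.split("@")[1] || "").toLowerCase();
  const root = rootDomain.toLowerCase();
  return domain === root || domain.endsWith(`.${root}`);
}

function keepOnSiteEmails(row) {
  const emails = (row.emails || []).filter((e) => emailMatchesDomain(e, row.rootDomain));
  return { ...row, emails, emailCount: emails.length }; // returns a new row object
}
```

Be aware that this also discards legitimate off-domain mailboxes, such as a company that publishes a Gmail address or uses a separate mail domain.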

Q: Phones are noisy / wrong. A: Phone extraction is intentionally conservative: we only pick up tel: links plus clearly international +CC … patterns. Domestic-format numbers written without a country code are therefore skipped. If your inputs are US-only sites, you can disable includePhones and rely on a US-specific phone scraper downstream.

Q: I want one row per email, not per website. A: Use Apify's "Transform dataset" integration with a few lines of JS: for (const e of item.emails) yield { ...item, email: e }. The aggregate one-row-per-website shape is the default because it's easier to reason about, but it's trivial to explode.
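Here is the inline one-liner written out as a self-contained generator. explodeByEmail is a hypothetical name. Rows with no emails produce no output, so keep the original dataset if you also need the zero-email websites.

```javascript
// One row per email: a plain-generator version of the one-liner above.
// Rows with an empty emails array produce no output rows.
function* explodeByEmail(items) {
  for (const item of items) {
    for (const email of item.emails || []) {
      yield { ...item, email };
    }
  }
}

// Usage: const perEmailRows = [...explodeByEmail(datasetItems)];
```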

Q: How do I deduplicate when I run the same list weekly? A: Schedule the runs with Apify Schedules, then deduplicate downstream on rootDomain, which stays stable across runs. A small groupBy(rootDomain) transform that keeps the row with the latest scrapedAt is enough.
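Here is a minimal sketch of the groupBy(rootDomain) approach, assuming you have concatenated the items of several runs into one array. It keeps the row with the latest scrapedAt per domain. diffEmails is an extra hypothetical helper for the change-tracking use case (new staff on /team, changed contact emails).

```javascript
// Keep the most recent row per rootDomain across several weekly runs.
function latestPerDomain(items) {
  const byDomain = new Map();
  for (const item of items) {
    const prev = byDomain.get(item.rootDomain);
    if (!prev || Date.parse(item.scrapedAt) > Date.parse(prev.scrapedAt)) {
      byDomain.set(item.rootDomain, item);
    }
  }
  return [...byDomain.values()];
}

// Emails that appeared or disappeared between two runs for the same domain.
function diffEmails(previousRow, currentRow) {
  const before = new Set(previousRow.emails || []);
  const after = new Set(currentRow.emails || []);
  return {
    added: [...after].filter((e) => !before.has(e)),
    removed: [...before].filter((e) => !after.has(e)),
  };
}
```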

Q: How fresh is the data? A: Real-time. Every page is fetched live; there's no cache layer in the actor.

Q: Can I scrape a list of 100,000 URLs in one go? A: Yes, but break it into 5,000-URL batches per run for predictable cost and to keep the dataset payload reasonable. Schedule them daily via Apify Schedules.
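Splitting the list needs only a single function. This sketch assumes each batch becomes the input of one run, in the startUrls shape used in the example inputs. toBatches is a hypothetical helper.

```javascript
// Split a large URL list into run inputs of at most batchSize startUrls,
// using the startUrls shape from the example inputs.
function toBatches(urls, batchSize = 5000) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    batches.push({ startUrls: urls.slice(i, i + batchSize).map((url) => ({ url })) });
  }
  return batches;
}
```

A 100,000-URL list yields 20 input objects of 5,000 startUrls each.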

Q: My proxy errors are persistent. A: Two levers — (1) drop maxConcurrency to 2-3, (2) switch the proxy group to RESIDENTIAL. The actor automatically rotates the proxy session per request, but very heavy load on cheap datacenter IPs will eventually get throttled by Cloudflare.


Legal & responsible use

This actor reads public, indexable HTML: the same pages Google sees. Use the output responsibly:

  • GDPR / CAN-SPAM apply to how you collect and use the data. CAN-SPAM governs the commercial email you send. Under GDPR, collecting personal data such as a named employee's address already counts as processing, so the rules apply before you send anything. If you cold-email EU contacts, you need a lawful basis (legitimate interest can work for B2B if the email is professional, the topic is relevant, and you respect opt-outs). The actor does not bypass any login, paywall or robots.txt-disallowed path.
  • Role addresses vs personal: info@, support@, press@ are published company contact points and generally the safest to use. firstname.lastname@ addresses belong to individuals; treat them per the rules above.
  • Don't spam. Don't scrape behind paywalls. Don't use this for credential stuffing or harassment. Apify will deactivate any account that does.
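If you want to triage addresses before outreach, a simple local-part heuristic goes a long way. This is only a sketch under stated assumptions. The role list is drawn from mailboxes mentioned elsewhere in this README, the first.last rule is a guess, and classifyAddress is a hypothetical helper.

```javascript
// Heuristic triage of addresses before outreach. The role list is drawn from
// mailboxes mentioned in this README; the first.last rule is an assumption
// and will misfire on addresses like press-office@.
const ROLE_LOCAL_PARTS = new Set([
  "info", "hello", "contact", "support", "press", "media", "pr", "sales",
  "hr", "talent", "careers", "partnerships", "bizdev", "affiliates",
]);

function classifyAddress(email) {
  const local = email.toLowerCase().split("@")[0];
  if (ROLE_LOCAL_PARTS.has(local)) return "role";
  if (/^[a-z]+[._-][a-z]+$/.test(local)) return "personal"; // first.last, first_last, first-last
  return "unknown"; // e.g. jdoe@; treating these as personal is the cautious default
}
```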

How this compares to SaaS email finders

Feature | This actor | Hunter.io | Snov.io | Apollo.io
Pricing | Pay-per-run, no subscription | $49–$499/mo | $39–$249/mo | $59–$149/mo
Per-result cost | From $0.0012 ($1.20 / 1,000 results) | $0.10–$1.00/credit | $0.10–$0.30/credit | Bundled
Cloudflare email-protection decoder | ✅ Built in | | |
Multi-page crawl per domain | ✅ Up to 30 | Single page | Single page | Single page
Eight social profiles in one pass | ✅ | LinkedIn only | LinkedIn only | LinkedIn focus
DE / IT / ES / EN contact paths | ✅ Built in | English | English | English
Self-host / data ownership | ✅ Your Apify account | | |
Roll into your own pipeline | ✅ REST / webhook / SDK | API | API | API

The trade-off: SaaS finders try to predict the email of a specific person (first.last@domain). This actor extracts every email the website actually publishes. If you need predicted emails for individuals, pair this with a verification tool. If you need real published mailboxes (PR, support, sales, partnerships) — this is what you want.


Industry-specific playbooks

B2B SaaS sales (outbound)

Run your TAM list through the actor, then keep only emails whose domain matches the row's rootDomain. This drops noise like support@cloudflare.com from sites behind Cloudflare. Feed linkedinUrl into your sequencer for a combined email + LinkedIn touch.

Recruiting

Map a target list of companies to their linkedinUrl (for InMail) plus public emails (hr@, talent@, careers@). Then dedup against your existing outreach CRM so you don't contact the same company twice.

PR & media outreach

Pull press@, media@, pr@ mailboxes off a list of brand websites. The title column doubles as a quick brand-positioning hint before you draft the pitch.

Real-estate / property investors

Agency websites typically expose info@, sales@ plus a phone number. Run a postcode-filtered list of agent sites through this actor and you've built a regional outbound list in one batch.

Venture capital / corp dev

For every portfolio / target company, pull socials + emails + phones. The githubUrl is gold for technical due diligence: it surfaces public open-source activity that paid databases often miss.

Local SEO agencies

Pair this actor with a Google Maps scraper: feed the business website URLs from the Maps results into this actor, and get back the emails and socials you couldn't see in the Maps card.


Common patterns we've seen

A few patterns that crop up frequently and how to handle them:

  • info@, hello@, contact@ dominate the results. These are the universal role mailboxes — perfectly usable for B2B outreach, but tend to be triaged slowly. Pair them with the LinkedIn URL for a faster route to a human.
  • Agencies hide their team behind "Book a call" forms. When a row has emailCount: 0 but socialCount: 5+ and the page title looks polished, you've hit one of these. The LinkedIn URL is still the best outbound entry point.
  • DACH (DE / AT / CH) sites concentrate everything on /impressum, a page German law requires. Our default contactPaths already includes /impressum and /imprint.
  • Cloudflare-obfuscated emails appear as [email&#160;protected] in raw HTML. Without the decoder you'd miss them entirely; with it on (the default), they're transparently captured.
  • Newer sites publish socials via <link rel> tags in <head>. We scan raw HTML so these are captured too even when they're missing from anchor links.

Changelog

  • 1.0 — Initial release. Email regex + mailto + Cloudflare-decoder + text-obfuscation patterns. Phones via tel: and international patterns. Eight social profiles. Multi-page-per-domain crawl, concurrent workers, Apify Proxy.

Roadmap & feature requests

We read every Apify Store review and comment. High-priority candidates for v1.1+:

  • Per-email row mode (one row per discovered email).
  • Name extraction near emails (John Smith — john@…).
  • Email pattern detection ({first}.{last}@domain.com) for known-staff inference.
  • Configurable phone region (US-only / UK-only filter).
  • Optional sitemap.xml expansion when the input domain has a sitemap.
  • DNS-based deliverability check (MX / SMTP probe).

Drop a comment on the Store page if any of these would unblock you.