Contact Details Extractor

Pricing

from $0.01 / 1,000 results

The cheapest contact scraper on Apify. Extract emails, phone numbers, company names, addresses, and 25+ social profiles at $0.001/page, 50% less than competitors. Smart crawling auto-finds contact pages and bypasses Cloudflare protection, with a browser mode for JS sites and sitemap discovery.

Developer

kata Kuri

Maintained by Community

Extract emails, phone numbers, social media profiles, postal addresses, and company info from any website. Works on plain HTML, JavaScript-rendered SPAs, and Cloudflare-protected pages.

Built for sales teams, recruiters, and lead-gen workflows that need clean contact data ready to drop into a CRM.

Why this scraper

| Feature | This actor |
| --- | --- |
| Clean output | Validates emails (TLD whitelist, blacklist, multi-@ reject), phones (libphonenumber E.164), and social URLs (rejects share buttons) |
| Per-domain merging | One row per domain instead of one row per page |
| 25+ social platforms | LinkedIn, X/Twitter, Instagram, Facebook, YouTube, TikTok, Pinterest, GitHub, Discord, Telegram, WhatsApp, Reddit, Medium, Substack, Twitch, Snapchat, Threads, Bluesky, Mastodon, Spotify, Vimeo, Dribbble, Behance, SoundCloud, Crunchbase, AngelList |
| JS rendering on demand | Three modes: HTTP-only (cheapest), browser-only (always render), or auto (HTTP first, browser fallback when the page looks like an empty SPA shell) |
| Cloudflare email decoding | Decodes both data-cfemail attributes and /cdn-cgi/l/email-protection#hex URLs |
| Smart contact-page targeting | Crawl order is ranked by URL relevance: /contact, /about, /team, /imprint go first, blog posts last |
| Sitemap discovery | Optional /sitemap.xml and /robots.txt parsing to find contact-rich pages without crawling |
| Pay-per-event | You pay per successfully extracted page, never for failed requests (4xx, timeouts, blocks) |

What gets extracted

Per domain (when mergeContacts: true, default):

{
  "domain": "hubertprocess.com",
  "url": "https://www.hubertprocess.com",
  "companyName": "Hubert Process",
  "companyDescription": "Hubert Process designs optical sorting machines…",
  "logo": "https://www.hubertprocess.com/apple-touch-icon.png",
  "emails": ["admin.si@hubertprocess.com", "contact@hubertprocess.com"],
  "phones": ["+33241487578", "+33243696298", "+41228203544"],
  "phonesUncertain": [],
  "addresses": [
    {
      "full": "1 Market St, San Francisco, CA, 94105, US",
      "street": "1 Market St",
      "city": "San Francisco",
      "region": "CA",
      "postalCode": "94105",
      "country": "US"
    }
  ],
  "linkedin": "https://www.linkedin.com/company/hubert-metal",
  "twitter": null,
  "facebook": "https://www.facebook.com/Hubert-Process-Robotique-102014175737316",
  "instagram": null,
  "youtube": null,
  "github": null,
  "// ... 19 other social platforms": "...",
  "scrapedUrls": ["...12 URLs..."],
  "scrapedAt": "2026-05-03T12:34:34Z"
}
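
For CRM imports, a per-domain record like the one above usually has to be flattened into a single row. The following is a minimal Python sketch of that step, not part of the actor itself. The column selection and the helper names `flatten_record` and `to_csv` are assumptions made for illustration; the field names come from the sample record.

```python
import csv
import io

# Hypothetical column choice; the field names come from the sample record.
SCALAR_FIELDS = ["domain", "url", "companyName", "linkedin", "twitter", "facebook"]
LIST_FIELDS = ["emails", "phones"]


def flatten_record(record: dict) -> dict:
    """Turn one per-domain record into a flat row suitable for a CSV importer."""
    # Missing or null scalar fields become empty cells.
    row = {name: record.get(name) or "" for name in SCALAR_FIELDS}
    for name in LIST_FIELDS:
        # Join multi-valued fields with "; " so each stays in a single cell.
        row[name] = "; ".join(record.get(name) or [])
    addresses = record.get("addresses") or []
    # Keep only the first address, using its pre-formatted "full" string.
    row["address"] = addresses[0]["full"] if addresses else ""
    return row


def to_csv(records: list) -> str:
    """Serialize flattened records to CSV text with a header row."""
    columns = SCALAR_FIELDS + LIST_FIELDS + ["address"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten_record(record))
    return buffer.getvalue()
```

Joining lists into one cell keeps the one-row-per-domain shape; an importer that expects one email per row would need a different layout.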

Inputs

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | array | required | Websites to scrape. Plain domains (example.com) and full URLs both work. |
| maxPagesPerStartUrl | int | 20 | Pages crawled per website. Lower = cheaper, faster. |
| maxDepth | int | 2 | Click-depth from the start URL. 1 = homepage only, 2 = homepage + linked pages. |
| sameDomain | bool | true | Only follow links on the same registered domain. |
| useSitemap | bool | false | Discover pages via /sitemap.xml. |
| browserMode | enum | auto | off (HTTP only), on (always browser), auto (HTTP first, browser fallback for SPAs). |
| mergeContacts | bool | true | Combine all pages of a domain into one record. |
| extractAddresses | bool | true | Parse postal addresses from schema.org markup. |
| extractCompanyInfo | bool | true | Detect company name, description, and logo. |
| decodeCloudflareEmails | bool | true | Decode CF-protected emails. |
| phoneCountryHint | string | null | ISO country code (US, GB, FR, …) for parsing local-format phones. |
| maxConcurrency | int | 10 | Parallel page fetches. |
| proxyConfiguration | object | {useApifyProxy: true} | Datacenter is the default; switch to RESIDENTIAL for Cloudflare-blocked sites. |
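
When the actor is driven from a script, it can help to build the input from the documented defaults and override only what changes. Here is a minimal Python sketch; `build_run_input` is a hypothetical helper, not something the actor ships. The `{"url": ...}` shape for `startUrls` is taken from the local-development example in this README, and the validation rules are assumptions drawn from the table above.

```python
import copy

# Defaults copied from the Inputs table.
DEFAULTS = {
    "maxPagesPerStartUrl": 20,
    "maxDepth": 2,
    "sameDomain": True,
    "useSitemap": False,
    "browserMode": "auto",
    "mergeContacts": True,
    "extractAddresses": True,
    "extractCompanyInfo": True,
    "decodeCloudflareEmails": True,
    "phoneCountryHint": None,
    "maxConcurrency": 10,
    "proxyConfiguration": {"useApifyProxy": True},
}
BROWSER_MODES = {"off", "on", "auto"}


def build_run_input(start_urls, **overrides):
    """Hypothetical helper: merge overrides onto the documented defaults."""
    if not start_urls:
        raise ValueError("startUrls is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    # Deep-copy so callers can mutate the result without touching DEFAULTS.
    run_input = {**copy.deepcopy(DEFAULTS), **overrides}
    if run_input["browserMode"] not in BROWSER_MODES:
        raise ValueError("browserMode must be off, on, or auto")
    # Same shape as the local-development example: a list of {"url": ...} objects.
    run_input["startUrls"] = [{"url": url} for url in start_urls]
    return run_input
```

For example, `build_run_input(["example.com"], browserMode="off", maxPagesPerStartUrl=8)` yields a dictionary that could be serialized to INPUT.json or passed as the run input through an Apify API client.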

How it works

  1. Each start URL is normalized (example.com → https://example.com) and seeded into the HTTP queue.
  2. Optionally, /sitemap.xml is parsed; the highest-ranking URLs (containing contact, about, imprint, etc.) are added to the queue.
  3. The HTTP crawler (Cheerio) fetches pages with realistic headers. For each page:
    • Run all extractors against the HTML and visible text.
    • Extract outbound links, score them by contact-relevance, follow the highest-scoring ones until the per-domain budget is exhausted.
    • In browserMode: auto, if the HTML looks like an empty SPA shell (React root with no content, very low text-to-HTML ratio), push the URL into the browser queue instead.
  4. After the HTTP pass, the Playwright crawler renders the URLs that were flagged for fallback.
  5. Pages are merged per registered domain (so blog.acme.co.uk and www.acme.co.uk collapse into one acme.co.uk record).
  6. Each successfully extracted page fires exactly one page charge event (pay-per-event pricing; see Pricing).
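
Three of the steps above (start-URL normalization, contact-relevance ranking, and the empty-SPA-shell check) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actor's Node.js implementation: the keyword weights, the blog penalty, the 0.02 text-to-HTML threshold, and all function names are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed keyword weights; the README only says contact-like paths go first.
CONTACT_KEYWORDS = {"contact": 10, "about": 8, "team": 7, "imprint": 7}
BLOG_PENALTY = 5  # assumed; the README only says blog posts go last


def normalize_start_url(raw: str) -> str:
    """Add an https:// scheme to bare domains; leave full URLs unchanged."""
    raw = raw.strip()
    return raw if "://" in raw else f"https://{raw}"


def contact_score(url: str) -> int:
    """Score a URL by keywords in its path; higher means crawl sooner."""
    path = urlparse(url).path.lower()
    score = sum(weight for kw, weight in CONTACT_KEYWORDS.items() if kw in path)
    return score - BLOG_PENALTY if "/blog" in path else score


def rank_links(urls: list, budget: int) -> list:
    """Return the highest-scoring URLs, capped at the per-domain page budget."""
    return sorted(urls, key=contact_score, reverse=True)[:budget]


class _TextCounter(HTMLParser):
    """Counts visible text characters, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.visible_chars = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.visible_chars += len(data.strip())


def looks_like_spa_shell(html: str, min_ratio: float = 0.02) -> bool:
    """Assumed heuristic: very little visible text relative to the HTML size."""
    if not html:
        return True
    counter = _TextCounter()
    counter.feed(html)
    return counter.visible_chars / len(html) < min_ratio
```

A page for which `looks_like_spa_shell` returns true would be the kind of URL that gets pushed to the browser queue in `browserMode: auto`; a real detector would likely combine several signals rather than a single ratio.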

What makes the extraction reliable

  • Email TLD whitelist. A naive regex would match contact@welko.contactez because contactez looks like a TLD. We reject TLDs not on the IANA root zone list. Result: zero false positives from non-English text.
  • Obfuscated email regex requires explicit markers. [at], (at), or whitespace-isolated AT — never bare at inside a word. Otherwise automation would match as autom@ion.
  • Phone validation via libphonenumber. Phone-shaped digit runs only land in phones when libphonenumber confirms they're real. Unverified candidates with separators land in phonesUncertain. Pure digit runs (SIRET numbers, tracking IDs, hashes) are dropped.
  • Social URLs reject share buttons. twitter.com/intent/tweet, linkedin.com/sharing/share-offsite, facebook.com/sharer.php all rejected. Only profile URLs make it through.
  • Cloudflare email decoder. Both data-cfemail="..." and /cdn-cgi/l/email-protection#... patterns are XOR-decoded inline.
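
The Cloudflare decoding mentioned in the last point follows a well-known scheme: the first byte of the hex string is an XOR key and the remaining bytes are the email XOR-ed with that key. Below is a minimal Python sketch of the idea, independent of the actor's Node.js code. The function names are assumptions, and `encode_cfemail` exists only to build test input.

```python
import re

CFEMAIL_ATTR = re.compile(r'data-cfemail="([0-9a-fA-F]+)"')
CFEMAIL_LINK = re.compile(r"/cdn-cgi/l/email-protection#([0-9a-fA-F]+)")


def decode_cfemail(hex_string: str) -> str:
    """XOR-decode a Cloudflare-obfuscated email: the first byte is the key."""
    data = bytes.fromhex(hex_string)
    key, payload = data[0], data[1:]
    return bytes(b ^ key for b in payload).decode("utf-8")


def encode_cfemail(email: str, key: int = 0x5A) -> str:
    """Inverse of decode_cfemail, used here only to build test input."""
    return bytes([key] + [b ^ key for b in email.encode("utf-8")]).hex()


def extract_cf_emails(html: str) -> list:
    """Find and decode both obfuscation patterns in an HTML string."""
    hexes = CFEMAIL_ATTR.findall(html) + CFEMAIL_LINK.findall(html)
    # dict.fromkeys de-duplicates while preserving first-seen order.
    return list(dict.fromkeys(decode_cfemail(h) for h in hexes))
```

Because the same address often appears in both the link and the attribute, the sketch de-duplicates after decoding rather than before.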

Pricing

Pay-per-event — you're billed per successful page extracted, never for failed requests (4xx, timeouts, blocks). Exactly one of the page events fires per page, picked by which combination of renderer × proxy was used:

| Event | Suggested price (Free) | Suggested price (Business) | When it fires |
| --- | --- | --- | --- |
| actor-start | $0.01 / run | $0.005 / run | Once at the start of every run |
| page-scraped | $1.00 / 1 000 | $0.69 / 1 000 | Plain HTTP page extracted (cheapest) |
| page-with-browser | $2.00 / 1 000 | $1.50 / 1 000 | Playwright-rendered page on datacenter proxy |
| page-residential-proxy | $3.00 / 1 000 | $2.30 / 1 000 | Any page fetched via residential proxy (overrides the two above) |

These suggested prices match the competitor (betterdevsscrape/contact-details-extractor) so users can switch without re-budgeting.
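
The rule that exactly one page event fires per page, with residential proxy taking precedence, can be written as a small decision function. This Python sketch only restates the pricing table; the `renderer` and `proxy` argument values are illustrative labels chosen for the example, not actor inputs.

```python
def pick_page_event(renderer: str, proxy: str) -> str:
    """Return the single page event for a successfully extracted page.

    renderer is "http" or "browser"; proxy is "datacenter" or "residential".
    These labels are assumptions made for this sketch.
    """
    if proxy == "residential":
        # Residential overrides both renderer-based events.
        return "page-residential-proxy"
    if renderer == "browser":
        return "page-with-browser"
    return "page-scraped"
```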

What a typical run costs

Crawling 1 000 small B2B sites with default settings (maxPagesPerStartUrl: 20, browserMode: auto, datacenter proxy) typically uses:

  • 1 × actor-start → $0.01
  • ~16 000 successful page-scraped events (~80% HTTP success) → $16.00
  • ~2 000 page-with-browser events (~10% needed JS rendering) → $4.00
  • Total: ~$20 per 1 000 sites — same ballpark as the competitor, with cleaner output.

If you switch to proxyConfiguration.apifyProxyGroups: ["RESIDENTIAL"] to bypass Cloudflare-protected sites:

  • All page events become page-residential-proxy → ~$54 per 1 000 sites
  • The per-page rate matches residential runs through betterdevsscrape ($3 / 1 000 there too), and per-context warm-up unlocks more sites.
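
Both estimates above can be reproduced with a few lines of arithmetic. The Python sketch below uses the suggested Free-tier prices and the event counts from the two scenarios (16 000 HTTP pages plus 2 000 browser pages, or 18 000 residential pages); `estimate_cost` is a hypothetical helper written for this example.

```python
# Suggested Free-tier prices from the pricing table, in USD.
ACTOR_START = 0.01
PRICE_PER_1000 = {
    "page-scraped": 1.00,
    "page-with-browser": 2.00,
    "page-residential-proxy": 3.00,
}


def estimate_cost(event_counts: dict) -> float:
    """Estimate one run's cost: the start fee plus per-page events."""
    pages = sum(
        PRICE_PER_1000[event] * count / 1000
        for event, count in event_counts.items()
    )
    return round(ACTOR_START + pages, 2)


datacenter = estimate_cost({"page-scraped": 16_000, "page-with-browser": 2_000})
residential = estimate_cost({"page-residential-proxy": 18_000})
print(datacenter, residential)  # 20.01 54.01
```

The roughly 2 000 pages that fail in the default scenario do not appear in either sum, since failed requests are not billed.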

Why this model is better than per-domain billing

The previous version charged once per domain that yielded at least one piece of data. That sounds cheap until you realise it heavily penalised small jobs (a one-page site cost the same as a 100-page crawl of one site) and made it impossible to set per-page budgets in tools like Make/n8n. The per-page model is what every other contact extractor on the Apify Store uses, and it is what your customers are already mentally budgeting against.

Tips

  • Plain HTML sites (most B2B sites): keep browserMode: off — fastest and cheapest.
  • JS-heavy SPAs (Webflow, modern React apps): use browserMode: auto — it switches to browser only when needed.
  • Cloudflare-blocked sites (520, 403): switch proxyConfiguration to { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }.
  • For sales/lead-gen: enable useSitemap: true and raise maxPagesPerStartUrl to 50, which captures the full team page on most company sites.

Output formats

The dataset is exportable as JSON, CSV, Excel, or HTML directly from the Apify console. CSV is the fastest path into HubSpot, Salesforce, Pipedrive, or any standard CRM importer.

Local development

git clone https://your-repo/contact-details-extractor.git
cd contact-details-extractor
npm install
# Run unit tests (31 cases: extractor logic, regex correctness, regression coverage)
npm test
# Create the local storage directory, then write a test input into it
mkdir -p apify_storage/key_value_stores/default
echo '{ "startUrls": [{"url": "https://www.apify.com"}], "maxPagesPerStartUrl": 8, "browserMode": "off", "proxyConfiguration": null }' > apify_storage/key_value_stores/default/INPUT.json
# Run the actor locally against that input
APIFY_LOCAL_STORAGE_DIR="$(pwd)/apify_storage" node src/main.js

Roadmap

  • Smarter address extraction from free-form text (currently relies on schema.org markup)
  • Person-level contact extraction (job title + email pairing)
  • Optional WhatsApp/Telegram deep-link extraction (wa.me/<phone> patterns)

License

ISC