Contact Details Extractor
Pricing
from $0.01 / 1,000 results
The cheapest contact scraper on Apify. Extract emails, phone numbers, company names, addresses, and 25+ social profiles at $0.001/page, 50% less than competitors. It finds contact pages automatically through smart crawling, bypasses Cloudflare protection, offers a browser mode for JS sites, and supports sitemap discovery.
Developer
kata Kuri
Extract emails, phone numbers, social media profiles, postal addresses, and company info from any website. Works on plain HTML, JavaScript-rendered SPAs, and Cloudflare-protected pages.
Built for sales teams, recruiters, and lead-gen workflows that need clean contact data ready to drop into a CRM.
Why this scraper
| Feature | This actor |
|---|---|
| Cleans output | Validates emails (TLD whitelist, blacklist, multi-@ reject), phones (libphonenumber E.164), and social URLs (rejects share buttons) |
| Per-domain merging | One row per domain instead of one row per page |
| 25+ social platforms | LinkedIn, X/Twitter, Instagram, Facebook, YouTube, TikTok, Pinterest, GitHub, Discord, Telegram, WhatsApp, Reddit, Medium, Substack, Twitch, Snapchat, Threads, Bluesky, Mastodon, Spotify, Vimeo, Dribbble, Behance, SoundCloud, Crunchbase, AngelList |
| JS rendering on demand | Three modes: HTTP-only (cheapest), browser-only (always render), or auto (HTTP first, browser fallback when the page looks like an empty SPA shell) |
| Cloudflare email decoding | Decodes both data-cfemail attributes and /cdn-cgi/l/email-protection#hex URLs |
| Smart contact-page targeting | Crawl order is ranked by URL relevance — /contact, /about, /team, /imprint go first, blog posts last |
| Sitemap discovery | Optional /sitemap.xml and /robots.txt parsing to find contact-rich pages without crawling |
| Pay-per-event | You pay per successfully extracted page, never for failed requests (4xx, timeouts, blocks) |
What gets extracted
Per domain (when mergeContacts: true, default):
```json
{
  "domain": "hubertprocess.com",
  "url": "https://www.hubertprocess.com",
  "companyName": "Hubert Process",
  "companyDescription": "Hubert Process designs optical sorting machines…",
  "logo": "https://www.hubertprocess.com/apple-touch-icon.png",
  "emails": ["admin.si@hubertprocess.com", "contact@hubertprocess.com"],
  "phones": ["+33241487578", "+33243696298", "+41228203544"],
  "phonesUncertain": [],
  "addresses": [
    {
      "full": "1 Market St, San Francisco, CA, 94105, US",
      "street": "1 Market St",
      "city": "San Francisco",
      "region": "CA",
      "postalCode": "94105",
      "country": "US"
    }
  ],
  "linkedin": "https://www.linkedin.com/company/hubert-metal",
  "twitter": null,
  "facebook": "https://www.facebook.com/Hubert-Process-Robotique-102014175737316",
  "instagram": null,
  "youtube": null,
  "github": null,
  "// ... 19 other social platforms": "...",
  "scrapedUrls": ["...12 URLs..."],
  "scrapedAt": "2026-05-03T12:34:34Z"
}
```
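A per-domain record like the one above can be thought of as a union of per-page results. The following is only a minimal sketch of that idea, not the actor's actual merge code: `mergePages` and the per-page shape are hypothetical, only a few fields are shown, and each page is assumed to already carry its registered domain (computing that correctly needs a public-suffix-aware parser, which is omitted here).

```javascript
// Hypothetical per-page results; `domain` is assumed to be the registered
// domain, already computed upstream.
const pages = [
  {
    domain: 'acme.co.uk',
    url: 'https://www.acme.co.uk/',
    emails: ['Info@acme.co.uk'],
    phones: ['+441234567890'],
    linkedin: null,
  },
  {
    domain: 'acme.co.uk',
    url: 'https://blog.acme.co.uk/contact',
    emails: ['info@acme.co.uk', 'sales@acme.co.uk'],
    phones: [],
    linkedin: 'https://www.linkedin.com/company/acme',
  },
];

// Collapse page-level results into one record per domain:
// emails and phones are de-duplicated, the first non-null profile URL wins.
function mergePages(pageResults) {
  const byDomain = new Map();
  for (const page of pageResults) {
    let record = byDomain.get(page.domain);
    if (!record) {
      record = {
        domain: page.domain,
        emails: new Set(),
        phones: new Set(),
        linkedin: null,
        scrapedUrls: [],
      };
      byDomain.set(page.domain, record);
    }
    page.emails.forEach((email) => record.emails.add(email.toLowerCase()));
    page.phones.forEach((phone) => record.phones.add(phone));
    record.linkedin = record.linkedin ?? page.linkedin ?? null;
    record.scrapedUrls.push(page.url);
  }
  // Convert the sets back to sorted arrays for a stable, JSON-friendly output.
  return [...byDomain.values()].map((record) => ({
    ...record,
    emails: [...record.emails].sort(),
    phones: [...record.phones].sort(),
  }));
}

const merged = mergePages(pages);
console.log(JSON.stringify(merged, null, 2));
```

Lower-casing emails before de-duplication is a design choice of this sketch; whether the actor does the same is not stated in this README.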
Inputs
| Field | Type | Default | Description |
|---|---|---|---|
| `startUrls` | array | required | Websites to scrape. Plain domains (`example.com`) and full URLs both work. |
| `maxPagesPerStartUrl` | int | 20 | Pages crawled per website. Lower = cheaper, faster. |
| `maxDepth` | int | 2 | Click-depth from the start URL. 1 = homepage only, 2 = homepage + linked pages. |
| `sameDomain` | bool | true | Only follow links on the same registered domain. |
| `useSitemap` | bool | false | Discover pages via `/sitemap.xml`. |
| `browserMode` | enum | auto | `off` (HTTP only), `on` (always browser), `auto` (HTTP first, browser fallback for SPAs). |
| `mergeContacts` | bool | true | Combine all pages of a domain into one record. |
| `extractAddresses` | bool | true | Parse postal addresses from schema.org markup. |
| `extractCompanyInfo` | bool | true | Detect company name, description, and logo. |
| `decodeCloudflareEmails` | bool | true | Decode CF-protected emails. |
| `phoneCountryHint` | string | null | ISO country code (US, GB, FR, …) for parsing local-format phones. |
| `maxConcurrency` | int | 10 | Parallel page fetches. |
| `proxyConfiguration` | object | `{useApifyProxy: true}` | Datacenter is the default; switch to RESIDENTIAL for Cloudflare-blocked sites. |
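To make the table concrete, here is a small sketch that builds an input object from the documented defaults. The field names and default values come from the table above; `buildInput` is a hypothetical helper written for this example and is not part of the actor.

```javascript
// Defaults copied from the Inputs table.
const DEFAULT_INPUT = {
  maxPagesPerStartUrl: 20,
  maxDepth: 2,
  sameDomain: true,
  useSitemap: false,
  browserMode: 'auto',
  mergeContacts: true,
  extractAddresses: true,
  extractCompanyInfo: true,
  decodeCloudflareEmails: true,
  phoneCountryHint: null,
  maxConcurrency: 10,
  proxyConfiguration: { useApifyProxy: true },
};

// Hypothetical helper: overlay user overrides on the documented defaults
// and enforce the one required field.
function buildInput(overrides) {
  if (!Array.isArray(overrides.startUrls) || overrides.startUrls.length === 0) {
    throw new Error('startUrls is required');
  }
  return { ...DEFAULT_INPUT, ...overrides };
}

// A cheap HTTP-only run against one site, parsing local-format French numbers.
const input = buildInput({
  startUrls: [{ url: 'https://example.com' }],
  browserMode: 'off',
  phoneCountryHint: 'FR',
});
console.log(JSON.stringify(input, null, 2));
```

The resulting object can be pasted as JSON into the actor's input form or saved as `INPUT.json` for a local run.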
How it works
- Each start URL is normalized (`example.com` → `https://example.com`) and seeded into the HTTP queue.
- Optionally, `/sitemap.xml` is parsed; the highest-ranking URLs (containing `contact`, `about`, `imprint`, etc.) are added to the queue.
- The HTTP crawler (Cheerio) fetches pages with realistic headers. For each page:
  - Run all extractors against the HTML and visible text.
  - Extract outbound links, score them by contact-relevance, and follow the highest-scoring ones until the per-domain budget is exhausted.
  - In `browserMode: auto`, if the HTML looks like an empty SPA shell (React root with no content, very low text-to-HTML ratio), push the URL into the browser queue instead.
- After the HTTP pass, the Playwright crawler renders the URLs that were flagged for fallback.
- Pages are merged per registered domain (so `blog.acme.co.uk` and `www.acme.co.uk` collapse into one `acme.co.uk` record).
- Each successfully extracted page fires exactly one page charge event (pay-per-event pricing, see Pricing).
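Three of the steps above (start-URL normalization, contact-relevance ranking, and the empty-SPA-shell check) can be sketched in plain JavaScript. This is an illustration under stated assumptions, not the actor's source: the function names, keyword weights, and the 2% text-to-HTML threshold are all hypothetical.

```javascript
// Hypothetical keyword weights; the actor's real ranking is not documented here.
const CONTACT_HINTS = [
  ['contact', 100],
  ['about', 80],
  ['team', 70],
  ['imprint', 60],
];
const LOW_VALUE_HINTS = ['/blog', '/news'];

// Step 1: turn a plain domain into a full URL, leaving full URLs untouched.
function normalizeStartUrl(raw) {
  const trimmed = raw.trim();
  return /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
}

// Score a link by its path: contact-like pages high, blog posts lowest.
function scoreLink(url) {
  const path = new URL(url).pathname.toLowerCase();
  for (const [hint, score] of CONTACT_HINTS) {
    if (path.includes(hint)) return score;
  }
  return LOW_VALUE_HINTS.some((hint) => path.includes(hint)) ? 0 : 10;
}

// Keep only the highest-scoring links that fit into the remaining page budget.
function rankLinks(urls, budget) {
  return [...urls].sort((a, b) => scoreLink(b) - scoreLink(a)).slice(0, budget);
}

// Heuristic SPA-shell check: strip scripts, styles, and tags, then compare
// the amount of visible text with the size of the raw HTML.
function looksLikeSpaShell(html, minRatio = 0.02) {
  const text = html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, '')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  return html.length > 0 && text.length / html.length < minRatio;
}

const queue = rankLinks(
  ['https://a.com/blog/post-1', 'https://a.com/pricing', 'https://a.com/contact'],
  2,
);
console.log(normalizeStartUrl('example.com'), queue);
```

A regex-based tag stripper is enough for a ratio estimate like this one; it is not a substitute for a real HTML parser when extracting content.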
What makes the extraction reliable
- Email TLD whitelist. A naive regex would match `contact@welko.contactez` because `contactez` looks like a TLD. We reject TLDs not on the IANA root zone list. Result: no false positives of this kind from non-English text.
- Obfuscated email regex requires explicit markers. `[at]`, `(at)`, or whitespace-isolated `AT`, never a bare `at` inside a word. Otherwise `automation` would match as `autom@ion`.
- Phone validation via libphonenumber. Phone-shaped digit runs only land in `phones` when libphonenumber confirms they're real. Unverified candidates with separators land in `phonesUncertain`. Pure digit runs (SIRET numbers, tracking IDs, hashes) are dropped.
- Social URLs reject share buttons. `twitter.com/intent/tweet`, `linkedin.com/sharing/share-offsite`, and `facebook.com/sharer.php` are all rejected. Only profile URLs make it through.
- Cloudflare email decoder. Both `data-cfemail="..."` and `/cdn-cgi/l/email-protection#...` patterns are XOR-decoded inline.
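Two of these rules are easy to show in code. The sketch below decodes Cloudflare's email obfuscation (the first hex byte is the XOR key, the remaining bytes are the masked characters) and matches obfuscated addresses only when an explicit marker is present. All names and the exact regex are assumptions made for this example, not the actor's code; the decoder assumes ASCII addresses, and `encodeCfEmail` exists only to produce demo input.

```javascript
// Decode a Cloudflare-obfuscated email: byte 0 is the key, the rest is XOR-masked.
function decodeCfEmail(hex) {
  const key = parseInt(hex.slice(0, 2), 16);
  let email = '';
  for (let i = 2; i < hex.length; i += 2) {
    email += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16) ^ key);
  }
  return email;
}

// Demo-only inverse, used to build sample input for the decoder.
function encodeCfEmail(email, key = 0x5a) {
  const toHex = (n) => n.toString(16).padStart(2, '0');
  return toHex(key) + [...email].map((c) => toHex(c.charCodeAt(0) ^ key)).join('');
}

// Both carrier patterns named above: the attribute and the protection URL.
const CF_PATTERNS = [
  /data-cfemail="([0-9a-f]+)"/gi,
  /\/cdn-cgi\/l\/email-protection#([0-9a-f]+)/gi,
];

function decodeCfEmailsInHtml(html) {
  const found = new Set();
  for (const pattern of CF_PATTERNS) {
    for (const match of html.matchAll(pattern)) found.add(decodeCfEmail(match[1]));
  }
  return [...found];
}

// Obfuscated addresses: accept "[at]", "(at)", or a whitespace-isolated
// upper-case "AT", never a bare "at" inside a word.
const OBFUSCATED_EMAIL =
  /([A-Za-z0-9._%+-]+)(?:\s*\[(?:at|AT)\]\s*|\s*\((?:at|AT)\)\s*|\s+AT\s+)([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)/g;

function findObfuscatedEmails(text) {
  return [...text.matchAll(OBFUSCATED_EMAIL)].map((m) => `${m[1]}@${m[2]}`.toLowerCase());
}

const encoded = encodeCfEmail('contact@example.com');
const html = `<a href="/cdn-cgi/l/email-protection#${encoded}"><span data-cfemail="${encoded}">[email protected]</span></a>`;
console.log(decodeCfEmailsInHtml(html));
console.log(findObfuscatedEmails('Write to sales [at] example.com about automation.'));
```

Candidates found this way would still need to pass the TLD whitelist before being reported; that check is left out because it requires the IANA list.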
Pricing
Pay-per-event — you're billed per successful page extracted, never for failed requests (4xx, timeouts, blocks). Exactly one of the page events fires per page, picked by which combination of renderer × proxy was used:
| Event | Suggested price (Free) | Suggested price (Business) | When it fires |
|---|---|---|---|
| `actor-start` | $0.01 / run | $0.005 / run | Once at the start of every run |
| `page-scraped` | $1.00 / 1 000 | $0.69 / 1 000 | Plain HTTP page extracted (cheapest) |
| `page-with-browser` | $2.00 / 1 000 | $1.50 / 1 000 | Playwright-rendered page on datacenter proxy |
| `page-residential-proxy` | $3.00 / 1 000 | $2.30 / 1 000 | Any page fetched via residential proxy (overrides the two above) |
These suggested prices match the competitor (betterdevsscrape/contact-details-extractor) so users can switch without re-budgeting.
What a typical run costs
Crawling 1 000 small B2B sites with default settings (maxPagesPerStartUrl: 20, browserMode: auto, datacenter proxy) typically uses:
- 1 × `actor-start` → $0.01
- ~16 000 successful `page-scraped` events (~80% HTTP success) → $16.00
- ~2 000 `page-with-browser` events (~10% needed JS rendering) → $4.00
- Total: ~$20 per 1 000 sites, in the same ballpark as the competitor, with cleaner output.
If you switch to proxyConfiguration.apifyProxyGroups: ["RESIDENTIAL"] to bypass Cloudflare-protected sites:
- All page events become `page-residential-proxy` → ~$54 per 1 000 sites
- That is the same per-page price as running residential through `betterdevsscrape` ($3 / 1 000 there too), and you get more sites unlocked thanks to per-context warm-up.
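The two estimates above are plain arithmetic over the event counts. Here is a minimal sketch that reproduces them using the suggested Free-tier prices from the pricing table; `estimateRunCost` is a hypothetical helper, and the page counts are the approximate figures quoted above.

```javascript
// Suggested Free-tier prices in USD (page prices are per 1 000 events).
const PRICES = {
  actorStart: 0.01,
  pageScrapedPer1000: 1.0,
  pageWithBrowserPer1000: 2.0,
  pageResidentialProxyPer1000: 3.0,
};

// With a residential proxy every page fires page-residential-proxy instead of
// page-scraped or page-with-browser. The result is rounded to whole cents.
function estimateRunCost({ httpPages, browserPages, residential = false }) {
  const per1000 = residential
    ? (httpPages + browserPages) * PRICES.pageResidentialProxyPer1000
    : httpPages * PRICES.pageScrapedPer1000 + browserPages * PRICES.pageWithBrowserPer1000;
  return Math.round((PRICES.actorStart + per1000 / 1000) * 100) / 100;
}

const datacenterCost = estimateRunCost({ httpPages: 16000, browserPages: 2000 });
const residentialCost = estimateRunCost({ httpPages: 16000, browserPages: 2000, residential: true });
console.log(datacenterCost, residentialCost); // 20.01 54.01
```

Swapping in the Business-tier prices from the table gives the corresponding estimate for that plan.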
Why this model is better than per-domain billing
The previous version charged once per domain with at least one piece of data. That sounds cheap until you realise it heavily penalised small jobs (a one-page site cost the same as a 100-page site) and made it impossible to set per-page budgets in tools like Make/n8n. The per-page model is what every other contact extractor on the Apify Store uses, so it is what most users already budget against.
Tips
- Plain HTML sites (most B2B sites): set `browserMode: off`, the fastest and cheapest mode.
- JS-heavy SPAs (Webflow, modern React apps): use `browserMode: auto`, which switches to the browser only when needed.
- Cloudflare-blocked sites (520, 403): switch `proxyConfiguration` to `{ "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }`.
- For sales/lead-gen: enable `useSitemap: true` and raise `maxPagesPerStartUrl` to 50. This gets you the full team page on most company sites.
Output formats
The dataset is exportable as JSON, CSV, Excel, or HTML directly from the Apify console. CSV is the fastest path into HubSpot, Salesforce, Pipedrive, or any standard CRM importer.
Local development
```bash
git clone https://your-repo/contact-details-extractor.git
cd contact-details-extractor
npm install

# Run unit tests (31 cases: extractor logic, regex correctness, regression coverage)
npm test

# Run the actor locally against a test input
# (create the local storage directory first so the redirect below does not fail)
mkdir -p apify_storage/key_value_stores/default
echo '{ "startUrls": [{"url": "https://www.apify.com"}], "maxPagesPerStartUrl": 8, "browserMode": "off", "proxyConfiguration": null }' > apify_storage/key_value_stores/default/INPUT.json
APIFY_LOCAL_STORAGE_DIR="$(pwd)/apify_storage" node src/main.js
```
Roadmap
- Smarter address extraction from free-form text (currently relies on schema.org markup)
- Person-level contact extraction (job title + email pairing)
- Optional WhatsApp/Telegram deep-link extraction (`wa.me/<phone>` patterns)
License
ISC