Verified B2B Lead Extractor - Emails, Phones & AI

Pricing

from $1.50 / 1,000 leads discovered

Verified B2B lead extractor for public websites: emails, phones, socials, tech stack, MX/SMTP checks, phone validation, and AI role classification. Turn domains or URLs into CRM-ready contacts with bill-safe caps. PPE, x402-ready, Skyfire bundle.

Developer: Nick (Maintained by Community)

Use this verified B2B lead extractor to turn domains or URLs into CRM-ready contact records: emails, phone numbers, social profiles, addresses, contact-page URLs, tech stack signals, optional MX/SMTP email verification, and AI role classification. It is built for fresh public website enrichment when you need source-visible data instead of another static lead database, with PPE pricing that is x402-ready and Skyfire-bundled for agent payments.

Best first runs

Cheapest smoke test:

{
  "domains": ["python.org"],
  "maxWebsites": 50,
  "maxPagesPerSite": 3,
  "includeSubpages": true,
  "detectTechStack": false,
  "verifyEmails": false
}

CRM enrichment test:

{
  "domains": ["python.org"],
  "maxWebsites": 50,
  "maxPagesPerSite": 5,
  "includeSubpages": true,
  "detectTechStack": true,
  "verifyEmails": false
}

Use this actor when you need fresh website contact data before outreach, enrichment, market mapping, or CRM cleanup. Start with a small run to confirm the target domain works, then turn on verifyEmails, deepEmailVerification, or enableAiAnalysis only after the basic run works.
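
The staged approach above can be sketched as a small helper that turns first-pass rows into the input for a verification re-run. This is only an illustration: `build_verification_input` and the sample rows are hypothetical, and it assumes dataset rows carry the `domain` and `emails` fields shown in the example output and that `maxEmailsToVerify` takes an integer cap.

```python
def build_verification_input(rows, max_emails_to_verify=25):
    """Build a second-pass actor input from first-pass rows (hypothetical helper).

    Only domains that returned at least one email are re-run, so paid
    MX/SMTP checks are not spent on sites with nothing to verify.
    """
    promising = []
    for row in rows:
        domain = row.get("domain")
        if domain and row.get("emails") and domain not in promising:
            promising.append(domain)
    return {
        "domains": promising,
        "verifyEmails": True,
        # Cap on paid checks across the run; the default value is an assumption.
        "maxEmailsToVerify": max_emails_to_verify,
    }


first_pass = [
    {"domain": "example.com", "emails": ["sales@example.com"]},
    {"domain": "example.org", "emails": []},
]
second_pass_input = build_verification_input(first_pass)
print(second_pass_input["domains"])
```

The returned dictionary can be passed as the run input of a follow-up run once the cheap first pass has confirmed which domains are worth paying to verify.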

What you get back

  • One row per submitted site with normalized url, discovered emails, phones_validated, social_profiles, address, contact_page_url, and tech_stack.
  • Optional emails_verified and ai_contact_analysis fields appear only when the paid verification or AI toggles are enabled. maxEmailsToVerify caps paid MX/SMTP checks across the run.
  • Good starter run: 5 pages per site, tech detection on, verification off. Then re-run only promising domains with MX/SMTP verification.

Example output shape:

{
  "url": "https://example.com",
  "domain": "example.com",
  "emails": ["sales@example.com", "support@example.com"],
  "phones_validated": [
    {"raw": "+1 415 555 0100", "e164": "+14155550100", "country": "US", "type": "FIXED_LINE", "valid": true}
  ],
  "social_profiles": {
    "linkedin": "https://www.linkedin.com/company/example",
    "github": "https://github.com/example"
  },
  "contact_page_url": "https://example.com/contact",
  "tech_stack": ["Shopify", "Cloudflare", "Google Analytics"],
  "scraped_at": "2026-05-10T18:15:00Z"
}

Best for: public website enrichment, B2B prospect lists, CRM cleanup, tech-stack segmentation, and contact-page discovery.

Not best for: private databases, logged-in pages, websites that intentionally hide contact details, or guaranteed personal decision-maker emails on every domain.

No cookies, no OAuth, no API keys (for the scraping) - runs on public data only. AI keys are optional and only needed when enableAiAnalysis: true.

$0.01 per contact. SMTP-verified mailboxes. CRM-ready in seconds. Apollo.io starts at $99/mo (7,200 credits/yr), Hunter.io at $34/mo (5,000 searches), ZoomInfo runs $15k+/yr with annual contracts. This Actor is pay-as-you-go - pay $0.01 per site enriched, $0.02 per definitively-verified mailbox, no seats, no contract, no monthly minimum. Run once for a 50-account ABM list, scale to 50,000 domains for a market map; the unit cost is identical.

What it does

Extract emails with real SMTP RCPT TO mailbox verification, libphonenumber-validated phone numbers (E.164 + country + line-type), 16 social media profiles, physical addresses, and 175+ tech stack signals from any public website. Give the Actor a list of domains or URLs and get structured, CRM-ready contact data back in seconds. Built for SDRs, AEs, RevOps, recruiters, ABM marketers, and M&A researchers who need fast, accurate website enrichment without an annual seat license.

For each domain you provide, the Actor fetches the homepage and further subpages up to the maxPagesPerSite limit (1-20 pages per site, auto-prioritizing /contact, /about, /imprint), extracts emails from mailto: links, visible text, and JSON-LD structured data, validates phone numbers via libphonenumber, identifies 16 social profiles, parses physical addresses, detects 175+ tech stack signals, and optionally runs a real async SMTP RCPT TO probe and AI role classification. All output is structured JSON, immediately importable into HubSpot, Salesforce, Pipedrive, Outreach, Salesloft, Apollo, Google Sheets, Airtable, or any CRM via Apify's native integrations and webhooks.
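
The subpage prioritization described above could look roughly like the sketch below. It is not the Actor's actual crawler: `pick_pages`, the hint ordering, and the same-host rule are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Path hints named in the text; treating them as an ordered priority is an assumption.
PRIORITY_HINTS = ("contact", "about", "imprint")


def pick_pages(homepage, links, max_pages=5):
    """Return the homepage plus the highest-priority same-host links."""
    host = urlparse(homepage).netloc

    def rank(url):
        path = urlparse(url).path.lower()
        for position, hint in enumerate(PRIORITY_HINTS):
            if hint in path:
                return position
        return len(PRIORITY_HINTS)  # everything else is crawled last

    same_site = []
    for link in links:
        if urlparse(link).netloc == host and link != homepage and link not in same_site:
            same_site.append(link)
    ranked = sorted(same_site, key=rank)  # stable sort keeps discovery order within a rank
    return [homepage] + ranked[: max(max_pages - 1, 0)]


pages = pick_pages(
    "https://example.com/",
    [
        "https://example.com/blog",
        "https://example.com/about",
        "https://other.example.net/contact",
        "https://example.com/contact",
    ],
    max_pages=3,
)
print(pages)
```

With a budget of three pages, the homepage is followed by the contact and about pages, while the blog and the off-site link are skipped.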

Features

  • Run-health hardening - Each website is isolated behind a per-site wall-clock budget and broad exception guard, so one malformed page or slow target cannot crash the whole batch. Numeric and boolean inputs from API/CLI runs are parsed leniently and clamped to the documented schema bounds. If Apify exposes a run timeout deadline, the actor preserves a finalization buffer so already extracted results can be pushed before the platform timeout.
  • Deep SMTP RCPT TO mailbox probe (v1.7) - Opt-in deepEmailVerification: true layers a real async SMTP RCPT TO handshake on top of the MX tier. Every MX-cleared email gets classified as deliverable / undeliverable / catchall / greylisted / port_blocked / rate_limited / error, with the remote SMTP response code and message captured verbatim. The probe is reputation-safe: per-MX-host probe history persists across runs in a named key-value store, capped at 10 probes per host per UTC day with a 10-second cooldown between probes to the same host. Catch-all detection runs a second RCPT against a random non-existent local-part on the same connection and downgrades the verdict when the server accepts it. Charged $0.02 per definitive verdict (deliverable or undeliverable only) - all other verdicts are free. Requires verifyEmails: true.
  • Email deliverability verification (v1.6) - Opt-in verifyEmails: true runs a real DNS MX lookup on every extracted email's domain (via dnspython), detects common-domain typos via Levenshtein distance (gmial.com -> gmail.com, outllok.com -> outlook.com), flags free-inbox hosts (Gmail/Yahoo/Outlook/iCloud/ProtonMail) and disposable/burner mail (Mailinator/10minutemail/...), and tags every entry HIGH / MEDIUM / LOW / UNKNOWN deliverability. Emits a new emails_verified output field with mx_valid, mx_hosts, disposable, free_inbox, role_based, typo_suggestion, and a deliverability tier on every email. Charged $0.01 per MX-cleared email (typo / disposable / no-MX emails are free).
  • Phone validation (v1.5) - Extracts phone numbers in international formats, then runs every candidate through libphonenumber (via the phonenumbers Python port). Numbers that pass strict carrier-format validation are emitted as phones_validated with E.164, ISO country code, and line-type (MOBILE / FIXED_LINE / TOLL_FREE / VOIP / ...). Numbers that parse but fail regional format rules end up in phones_uncertain so you can triage them separately.
  • Social media profiles (16 platforms) - Identifies LinkedIn, Twitter/X, Facebook, Instagram, YouTube, GitHub, TikTok, Pinterest, Telegram, Discord, Threads, WhatsApp, Snapchat, Vimeo, Twitch, and Mastodon profiles linked from the website. (v1.9: +5 platforms - parity with vdrmota's 15+ claim, plus Mastodon.)
  • Physical address parsing - Locates business addresses from JSON-LD structured data, Schema.org markup, and HTML content.
  • Contact-page URL capture - Returns the canonical /contact or /kontakt page URL so you can deep-link prospects straight into the form.
  • Tech stack detection (175+ signals) - Identifies technologies across CMS platforms, JavaScript frameworks, CSS frameworks, analytics tools, chat widgets, CDNs, and e-commerce solutions.
  • Smart subpage crawling - Automatically visits /contact, /about, /imprint, and similar pages where contact information typically lives.
  • JSON-LD and Schema.org parsing - Extracts structured business data embedded in the page source that is invisible to casual browsing.
  • Domain input support - Enter bare domains like "example.com" without needing to add the protocol. HTTPS is added automatically.
  • Duplicate handling - Automatically deduplicates when the same URL appears in both urls and domains inputs.
  • Optional AI contact enrichment - When enabled (enableAiAnalysis), an LLM classifies each email by role (sales, support, hr, legal, executive, general, personal), groups near-duplicate team addresses (dedup_groups), flags non-monitored no-reply inboxes (deliverability_flags), picks the single best primary_contact for B2B outreach, and picks the single best primary_phone from the validated set (prefers MOBILE / FIXED_LINE / TOLL_FREE over PREMIUM_RATE / PAGER / UNKNOWN, prefers numbers whose country matches the domain geo). Supports OpenRouter, Anthropic, Google AI, OpenAI, or self-hosted Ollama.
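
The reputation-safe probe limits listed above (10 probes per MX host per UTC day, 10-second cooldown) can be pictured with a minimal in-memory sketch. The class name `ProbeBudget`, its return values, and the in-memory dictionary are assumptions; the Actor is described as persisting this history in a named key-value store instead.

```python
from datetime import datetime, timedelta, timezone

DAILY_CAP = 10          # probes per MX host per UTC day (from the text above)
COOLDOWN_SECONDS = 10   # minimum gap between probes to the same host


class ProbeBudget:
    """In-memory stand-in for the persisted per-MX-host probe history."""

    def __init__(self):
        self._history = {}  # mx_host -> (utc_date, probes_today, last_probe_at)

    def try_acquire(self, mx_host, now=None):
        """Return "ok" and record the probe, or name the limit that blocks it."""
        now = now or datetime.now(timezone.utc)
        day, count, last = self._history.get(mx_host, (now.date(), 0, None))
        if day != now.date():  # a new UTC day resets the counter
            day, count = now.date(), 0
        if count >= DAILY_CAP:
            return "daily_cap_reached"
        if last is not None and (now - last).total_seconds() < COOLDOWN_SECONDS:
            return "cooldown"
        self._history[mx_host] = (day, count + 1, now)
        return "ok"


budget = ProbeBudget()
start = datetime(2026, 4, 23, 3, 0, tzinfo=timezone.utc)
# Eleven probes spaced 15 seconds apart: the eleventh exceeds the daily cap.
verdicts = [
    budget.try_acquire("aspmx.l.google.com", start + timedelta(seconds=15 * i))
    for i in range(11)
]
print(verdicts[0], verdicts[-1])
```

A blocked probe in this sketch maps naturally onto the free rate_limited verdict, which is why emails sharing one large mail host can come back unverified on big batches.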

What makes this different from other Apify contact scrapers

  • 16 social platforms vs 6 - LinkedIn (with company-profile-over-personal selection), Twitter/X, Facebook, Instagram, YouTube, GitHub, TikTok, Pinterest, Telegram, Discord, Threads, WhatsApp, Snapchat, Vimeo, Twitch, Mastodon. Competitors typically stop at the first six. (v1.9 expanded 11 -> 16.)
  • Carrier-verified phone split - phones_validated gives you E.164 + country + line-type per number via libphonenumber. Most Apify contact scrapers ship a single raw phones list and leave cleanup to you.
  • Real-time MX + SMTP deliverability - every email can be checked against real DNS MX records AND a real async SMTP RCPT TO probe against the primary MX host. Tagged HIGH / MEDIUM / LOW / UNKNOWN at the MX layer, then deliverable / undeliverable / catchall / greylisted / port_blocked at the mailbox layer. Typo suggestions, disposable-host detection, reputation-safe per-MX-host probe rate limits (10/day, 10s cooldown, persisted across runs). No other Apify contact scraper runs a live SMTP handshake on scraped emails.
  • 175+ tech-stack detectors on the same fetch - one run gets you contacts + CMS + analytics + chat + CRM + payment stacks. Competitors either do contacts or tech but rarely both.
  • Integrated AI enrichment - role classification, dedup, primary-contact + primary-phone pick, deliverability flags. No other contact-extractor on the Store bundles an LLM enrichment layer.
  • 5-provider AI choice - OpenRouter (default, cheapest), Anthropic, Google AI, OpenAI, or self-hosted Ollama for zero-egress setups.

Head-to-head: vs the dominant Apify contact scraper

This Actor competes directly against vdrmota/contact-info-scraper (the most-installed contact actor on the Apify Store). Where each wins:

| Feature | This Actor (harvestlab) | vdrmota/contact-info-scraper |
| --- | --- | --- |
| SMTP RCPT TO mailbox verification | Yes - 4-tier (deliverable/undeliverable/catchall/greylisted) on every email | No - third-party "lead enrichment" only |
| libphonenumber phone validation | Yes - E.164 + ISO country + line-type (MOBILE/FIXED_LINE/TOLL_FREE/...) | No - raw phone list, no validation |
| Tech stack detection | Yes - 175+ signals on the same fetch (CMS, frameworks, CDN, analytics, payment) | No - not offered |
| AI email-role classifier | Yes - built-in (sales / support / hr / legal / executive / general / personal) | No - not offered |
| Multi-LLM provider choice | Yes - 5 providers (OpenRouter/Anthropic/Google/OpenAI/Ollama) | No - no LLM layer |
| Social platforms | 16 (LinkedIn, X, FB, IG, YT, GitHub, TikTok, Pinterest, Telegram, Discord, Threads, WhatsApp, Snapchat, Vimeo, Twitch, Mastodon) | 15+ |
| Per-result price | $0.01 / verified contact | $0.00105 / page |
| Rating on Apify Store | New / unrated | 3.4/5 across 77 reviews |

Why pay 10x more per result? Because $0.00105 x 1,000 unvalidated raw pages is not the same product as $0.01 x 1,000 SMTP-verified, libphonenumber-validated, AI-role-classified contacts. We deliver outreach-ready contacts; competitors deliver pages-with-text. Your CRM doesn't want pages - it wants verified mailboxes that won't bounce.

If your priority is raw-page volume and you'll do verification yourself downstream, vdrmota's actor is cheaper. If your priority is contacts that don't bounce on the first send, this Actor's $0.02 SMTP charge is the cheapest mailbox-verification primitive on the Apify Store.

Use Cases

SDR / AE Outbound Prospecting - replace Apollo & Hunter at the unit-cost level

Enrich prospect lists with SMTP-verified emails before outreach. A 1,000-domain prospect list costs roughly $10 for enrichment plus about $20 for deep verification (assuming one definitive SMTP verdict per domain), about $30 in total before the $0.01 per-email MX check, vs ~$99/mo flat on Apollo Starter (and Apollo's data is bought, not freshly extracted from the site). Upload target company domains and get emails, libphonenumber-validated phones, LinkedIn / 16 socials, addresses, and 175+ tech-stack signals - drop straight into HubSpot, Salesforce, Pipedrive, Outreach, or Salesloft via Apify's native integrations. AI role-classifier flags info@ / support@ / noreply@ so you only blast decision-makers.

Account-Based Marketing (ABM) Lists - match-rate by domain, not name + company guess

Build enriched ABM target lists from a domain seed (e.g. "all SaaS companies running Segment + Marketo on Shopify"). Tech stack detection (175+ signals) lets you filter by CMS, analytics, CRM, chat widget, payment processor, CDN, search platform - Hunter and Apollo can't do this; ZoomInfo charges five figures/yr to come close. Combine with enableAiAnalysis for primary-contact selection per account.

Lead-Gen Agencies & RevOps - pay-per-event beats per-seat

Run hundreds of client enrichment jobs without buying $99-$1,200/mo seats per analyst. Pay only for sites that returned data: failed fetches and DNS errors are free. A 10,000-domain quarterly enrichment run costs ~$100 instead of a Hunter Business plan ($104/mo, 10k searches/mo) - and you keep the raw HTML provenance for every record.

Recruiter Sourcing & Talent Intel - find decision-makers via tech-stack match

Identify hiring managers at companies running specific stacks (e.g. "all React + Next.js + GraphQL shops in NYC"). Tech-stack detection reveals what languages and frameworks a company actually uses today, which is more current than LinkedIn skill-tags. Pair with social profile capture (GitHub, LinkedIn) to surface candidate-targeting signals before you pay LinkedIn Recruiter's $11k/yr seat.

M&A Due Diligence & Investor Research - rapid target screening

Run a watchlist of acquisition targets through the Actor every quarter. Output captures company description, address, contact emails, phone numbers, social footprint, and complete tech stack - letting M&A analysts and PE / VC associates screen 500-2,000 companies in an afternoon for tech debt, vendor lock-in, growth signals (e.g. "still on jQuery + Bootstrap", "migrated from Magento to Shopify Plus"), and digital maturity.

CRM Data Hygiene & Re-enrichment - close the 30% empty-field gap

Most B2B CRMs have ~30% missing or stale fields on free-tier records. Upload domains for any CRM segment with empty phone, email, linkedin_url, or tech_stack fields and bulk-update with fresh, SMTP-verified data. The dual phones + phones_validated shape lets you pick raw-extract vs carrier-verified per workflow. AI dedup groups info@ / hello@ / contact@ into a single primary contact per account.

Competitive Intelligence & Tech-Stack Tracking

Schedule regular scans on competitor domains. Detect when a target migrates CMS, swaps analytics, adds a CDP, or rolls out a new payment processor. The tech_stack array surfaces 175+ technologies - competitors that ship "tech detection" usually run 30-50 signals; this Actor catches CDP layers (Segment, Pendo, PostHog), product-analytics, customer-data infra, search platforms (Algolia, Typesense), build tools (Vite, Webpack), and feature-flag systems (LaunchDarkly).
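
Detecting a migration between two scheduled scans comes down to comparing the tech_stack arrays. A minimal sketch, assuming you keep the previous scan's array yourself (the helper name `diff_tech_stack` and the sample stacks are illustrative):

```python
def diff_tech_stack(previous, current):
    """Compare two tech_stack arrays and report what changed between scans."""
    before, after = set(previous), set(current)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


last_quarter = ["Magento", "jQuery", "Google Analytics"]
this_quarter = ["Shopify", "Google Analytics"]
changes = diff_tech_stack(last_quarter, this_quarter)
print(changes)
```

A non-empty "removed" list paired with a new entry in the same category (here Magento out, Shopify in) is the signal worth alerting on.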

Market Research & Industry Datasets

Build structured datasets of companies in a niche by feeding a domain list and exporting JSON / CSV / Excel. Combine tech stack adoption with contact data to segment markets by technology fingerprint, company digital maturity, and likely deal size - perfect for analyst reports, investor decks, and trend studies.

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| urls | array | [] | List of full website URLs (e.g., https://example.com). |
| startUrls / url / website | array/string | - | API/CLI aliases for URL inputs. startUrls is merged with urls; singular aliases are accepted for one-off calls. |
| domains | array | [] | List of bare domains (e.g., example.com, stripe.com). HTTPS is added automatically. |
| maxPagesPerSite | integer | 5 | Number of pages to crawl per site (1-20). More pages means more contact data found but longer runtime. |
| includeSubpages | boolean | true | Automatically crawl /contact, /about, /imprint, and similar pages. Highly recommended - many sites only list contact info on subpages. |
| detectTechStack | boolean | true | Detect CMS, frameworks, analytics, and 175+ other technologies. Set to false for faster runs when you only need contact data. |
| verifyEmails | boolean | false | Run real DNS MX lookups on every extracted email, detect typos / disposable / free-inbox domains, and tag each email HIGH / MEDIUM / LOW / UNKNOWN deliverability. Adds the emails_verified output field. Cost: $0.01 per MX-cleared email. |
| deepEmailVerification | boolean | false | Opt-in: real async SMTP RCPT TO probe against the primary MX host for every MX-cleared email. Classifies each as deliverable / undeliverable / catchall / greylisted / port_blocked / rate_limited. Reputation-safe: 10 probes/host/day cap + 10s cooldown, persisted across runs. Charges $0.02 per definitive verdict only. Requires verifyEmails: true. |
| enableAiAnalysis | boolean | false | Enable AI-powered contact enrichment - email role classification, team-address deduplication, primary-contact selection, and deliverability flags. Requires an API key for your chosen LLM provider. |
| llmProvider | string | openrouter | AI provider when enrichment is enabled: openrouter, anthropic, google, openai, or ollama. |
| openrouterApiKey / anthropicApiKey / googleApiKey / openaiApiKey | string | - | API key for the chosen provider. Ollama (self-hosted) uses ollamaBaseUrl instead. |
| proxyConfiguration | object | - | Apify proxy settings for accessing geo-restricted or bot-protected sites. |

You can provide input via urls, startUrls, url, website, domains, or any combination of them. Duplicates across inputs are removed automatically.
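
To make the merging and de-duplication concrete, here is a rough sketch of how bare domains and full URLs could be combined. The Actor's real normalization rules are not documented here, so the comparison key (scheme, lowercased host, path without trailing slash) and the name `normalize_targets` are assumptions.

```python
from urllib.parse import urlparse


def normalize_targets(urls=(), domains=()):
    """Merge full URLs and bare domains into one de-duplicated URL list."""
    seen, merged = set(), []
    # Bare domains get an https:// prefix, as the input description states.
    candidates = list(urls) + [f"https://{domain.strip()}" for domain in domains]
    for candidate in candidates:
        parsed = urlparse(candidate.strip())
        # Compare on scheme + lowercased host + path without a trailing slash.
        key = (parsed.scheme.lower(), parsed.netloc.lower(), parsed.path.rstrip("/"))
        if parsed.netloc and key not in seen:
            seen.add(key)
            merged.append(f"{key[0]}://{key[1]}{key[2]}")
    return merged


targets = normalize_targets(
    urls=["https://Example.com/"],
    domains=["example.com", "stripe.com"],
)
print(targets)
```

Here the URL and the bare domain for the same site collapse into a single target, so the site is crawled and billed once.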

Pricing

This Actor uses pay-per-event (PPE) pricing and is x402-ready. It also ships a $5 Skyfire bundle for AI-agent payment flows that require a minimum charge.

| Event | Price | Description |
| --- | --- | --- |
| Lead discovered | $0.0015 | Per website successfully scraped in basic mode (detectTechStack: false, verifyEmails: false, deepEmailVerification: false). |
| Contact extracted | $0.01 | Per website successfully analyzed in enriched mode with tech stack, MX/SMTP email checks, phone validation, or AI-ready enrichment. |
| Verified phone extracted | $0.02 | Charged once per libphonenumber-validated phone number (E.164 + country + line-type). Only fires on numbers that pass strict carrier-format validation. |
| Verified email checked | $0.01 | Charged once per email whose domain passes real MX-record verification (HIGH / MEDIUM tier). Typo / disposable / no-MX emails (LOW tier) and UNKNOWN results are free. Only fires when verifyEmails is on. |
| Verified email deep-checked | $0.02 | Charged once per email that receives a definitive SMTP RCPT verdict (deliverable or undeliverable). catchall, greylisted, rate_limited, port_blocked, and error verdicts are free. Only fires when deepEmailVerification is on. |
| AI analysis completed | $0.05 | Per website when enableAiAnalysis is on (role classification, email dedup, primary-contact + primary-phone selection, deliverability flags). |
| Skyfire bundle | $5.00 | skyfire-bundle-1000-leads, aligned with Skyfire's $5 minimum for agent-paid bulk runs. |

Failed requests (unreachable sites, DNS errors) are not charged.

| Scenario | Websites | Estimated Cost |
| --- | --- | --- |
| Basic discovery smoke | 100 | ~$0.15 with detectTechStack: false |
| Quick test batch | 10 | ~$0.10 |
| Sales prospect list | 100 | ~$1.00 |
| Market research dataset | 500 | ~$5.00 |
| Large-scale enrichment | 2,000 | ~$20.00 |

Plus Apify platform compute costs. A typical batch of 100 websites completes in 3-8 minutes depending on site responsiveness and pages crawled per site.
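
The scenario figures above cover only the per-site event. For a fuller estimate, the per-event prices can be combined as in the sketch below; the dictionary keys and the `estimate_cost` helper are illustrative names rather than the Actor's real event identifiers, and platform compute is left out.

```python
from decimal import Decimal

# Per-event prices copied from the pricing table; the keys are illustrative names.
PRICES = {
    "lead_discovered": Decimal("0.0015"),
    "contact_extracted": Decimal("0.01"),
    "verified_phone": Decimal("0.02"),
    "verified_email": Decimal("0.01"),
    "deep_checked_email": Decimal("0.02"),
    "ai_analysis": Decimal("0.05"),
}


def estimate_cost(sites, enriched=True, validated_phones=0,
                  mx_cleared_emails=0, definitive_verdicts=0, ai_sites=0):
    """Estimate event charges in USD, excluding Apify platform compute."""
    per_site = PRICES["contact_extracted" if enriched else "lead_discovered"]
    return (
        sites * per_site
        + validated_phones * PRICES["verified_phone"]
        + mx_cleared_emails * PRICES["verified_email"]
        + definitive_verdicts * PRICES["deep_checked_email"]
        + ai_sites * PRICES["ai_analysis"]
    )


print(estimate_cost(100, enriched=False))  # basic discovery smoke run
print(estimate_cost(100))                  # enriched prospect list
print(estimate_cost(100, mx_cleared_emails=150, definitive_verdicts=80))
```

The counts of validated phones, MX-cleared emails, and definitive verdicts are unknown before a run, so treat them as inputs you guess from a small test batch.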

vs. commercial alternatives: Hunter.io and Apollo sell contact data through monthly subscriptions with plan minimums and seat-based pricing. This actor uses pay-per-event with no subscription: $0.01/contact and zero monthly fees.

Skyfire bulk bundle (AI-agent payment rail)

A skyfire-bundle-1000-leads event ships at $5.00 per 1,000 leads for AI agents paying via the Skyfire JWT rail. This is a 3.33x premium over the raw lead-discovered tier ($0.0015 -> $0.005/lead) - a deliberate convenience-floor design: Skyfire requires a $5 minimum charge per actor invocation, and the bundle aligns with that floor while shipping the full extractor stack (SMTP RCPT verify + libphonenumber + 175+ tech-stack detection + AI role classifier) under one prepaid call. Pay-as-you-go users via Apify's standard PPE rail still get the cheaper $0.0015/lead - Skyfire is opt-in for agents wanting payment-rail compatibility.

Use with AI agents

Contact-extractor returns structured contact dicts (emails, email_roles, primary_contact, phones_validated, tech_stack, social_profiles, verified flags) - the exact shape an outbound-outreach LLM agent needs to plan, qualify, and personalize. Pair it with the new Apify js-langchain and js-langgraph-agent templates: agents can iterate cheaply at $0.0015/lead via the lead-discovery tier, then verify the best targets at $0.01/contact with the full SMTP-verified pipeline.

LangChain - wrap as a Tool

from apify_client import ApifyClient
from langchain_core.tools import StructuredTool

client = ApifyClient("YOUR_APIFY_TOKEN")


def find_b2b_contacts(start_urls: list[str]) -> list[dict]:
    """Run the actor on the given URLs and return its dataset rows."""
    run = client.actor("harvestlab/contact-extractor").call(
        run_input={"urls": start_urls, "verifyEmails": True, "enableAiAnalysis": True}
    )
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# StructuredTool accepts a typed list argument; the plain Tool wrapper expects a single string input.
contact_tool = StructuredTool.from_function(
    func=find_b2b_contacts,
    name="find_b2b_contacts",
    description="Extract SMTP-verified B2B contacts (email, role, phone, tech stack) from website URLs.",
)

# agent.invoke({"input": "Find decision-maker contacts at stripe.com and shopify.com"})
# The argument key must match the function parameter name (start_urls).
contact_tool.invoke({"start_urls": ["https://stripe.com", "https://shopify.com"]})

LangGraph - node in a sales-outreach StateGraph

from typing import TypedDict

from apify_client import ApifyClient
from langgraph.graph import StateGraph, END


class OutreachState(TypedDict):
    domain: str
    contacts: list[dict]


client = ApifyClient("YOUR_APIFY_TOKEN")


def enrich_contacts(state: OutreachState) -> OutreachState:
    """Graph node: run the actor for one domain and keep the MX-valid emails."""
    run = client.actor("harvestlab/contact-extractor").call(
        run_input={"domains": [state["domain"]], "verifyEmails": True, "enableAiAnalysis": True}
    )
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    verified = [e for item in items for e in item.get("emails_verified", []) if e.get("mx_valid")]
    return {**state, "contacts": verified}


# Single-node graph: entry -> enrich -> END.
graph = StateGraph(OutreachState)
graph.add_node("enrich", enrich_contacts)
graph.set_entry_point("enrich")
graph.add_edge("enrich", END)
# graph.compile().invoke({"domain": "stripe.com", "contacts": []})

See Apify's actor-templates repo for the full js-langchain and js-langgraph-agent starters - drop this actor in as a tool and the agent handles ICP filtering, role-aware drafting, and follow-up sequencing.

Output

Each website produces a structured JSON object:

{
  "url": "https://example.com",
  "domain": "example.com",
  "company_name": "Example Inc",
  "company_description": "Example Inc is a technology company that helps businesses build better software solutions and streamline their operations.",
  "emails": ["info@example.com", "sales@example.com"],
  "emails_verified": [
    {
      "email": "info@example.com",
      "domain": "example.com",
      "mx_valid": true,
      "mx_hosts": ["aspmx.l.google.com", "alt1.aspmx.l.google.com"],
      "catchall": false,
      "disposable": false,
      "free_inbox": false,
      "role_based": true,
      "typo_suggestion": null,
      "deliverability": "MEDIUM",
      "deep_probe": true,
      "probe_result": "deliverable",
      "smtp_probe": {
        "result": "deliverable",
        "response_code": 250,
        "response_text": "2.1.5 OK",
        "mx_host": "aspmx.l.google.com",
        "probed_at": "2026-04-23T03:15:22+00:00",
        "is_catchall": false
      }
    },
    {
      "email": "sales@example.com",
      "domain": "example.com",
      "mx_valid": true,
      "mx_hosts": ["aspmx.l.google.com", "alt1.aspmx.l.google.com"],
      "catchall": null,
      "disposable": false,
      "free_inbox": false,
      "role_based": true,
      "typo_suggestion": null,
      "deliverability": "MEDIUM",
      "deep_probe": true,
      "probe_result": "rate_limited",
      "smtp_probe": {
        "result": "rate_limited",
        "response_code": null,
        "response_text": "daily_cap_reached:10/10",
        "mx_host": "aspmx.l.google.com",
        "probed_at": "2026-04-23T03:15:22+00:00",
        "is_catchall": null
      }
    }
  ],
  "phones": ["+1 888 926 2289", "555-1234"],
  "phones_validated": [
    {
      "raw": "+1 888 926 2289",
      "e164": "+18889262289",
      "country": "US",
      "type": "TOLL_FREE",
      "valid": true,
      "possible": true
    }
  ],
  "phones_uncertain": ["555-1234"],
  "social_profiles": {
    "linkedin": "https://linkedin.com/company/example",
    "twitter": "https://twitter.com/example",
    "facebook": null,
    "instagram": null,
    "youtube": null,
    "github": "https://github.com/example",
    "tiktok": "https://tiktok.com/@example",
    "pinterest": null,
    "telegram": "https://t.me/examplecompany",
    "discord": null,
    "threads": "https://threads.net/@example"
  },
  "address": "123 Main St, City, State, 12345",
  "contact_page_url": "https://example.com/contact",
  "tech_stack": ["WordPress", "PHP", "nginx", "Google Analytics", "jQuery"],
  "pages_crawled": 3,
  "scraped_at": "2026-04-10T12:00:00Z"
}

Which phone field should you use? For carrier-verified dialing, cold-call campaigns, or CRM import, use phones_validated - every entry carries a canonical E.164 string, ISO-3166-1 country code, and line-type so your dialer knows whether it's mobile, toll-free, or a PBX fixed line. The legacy phones list (all raw extracts) and phones_uncertain (parsed but not carrier-format verified) are kept for backward compatibility and triage workflows.
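
As a small illustration of that choice, the sketch below pulls dial-ready E.164 numbers out of phones_validated. The helper name `dialable_numbers` and the preferred line types are assumptions; the field names follow the example output above.

```python
# Line types worth dialing for outreach; this selection is an assumption.
PREFERRED_TYPES = ("MOBILE", "FIXED_LINE", "TOLL_FREE")


def dialable_numbers(row, preferred_types=PREFERRED_TYPES):
    """Return E.164 strings for validated phones of a preferred line type."""
    numbers = []
    for phone in row.get("phones_validated") or []:
        if phone.get("valid") and phone.get("type") in preferred_types:
            numbers.append(phone["e164"])
    return numbers


row = {
    "phones": ["+1 888 926 2289", "555-1234"],
    "phones_validated": [
        {"raw": "+1 888 926 2289", "e164": "+18889262289", "country": "US",
         "type": "TOLL_FREE", "valid": True, "possible": True}
    ],
    "phones_uncertain": ["555-1234"],
}
print(dialable_numbers(row))
```

The uncertain "555-1234" entry never reaches the dialer because the helper reads phones_validated only.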

Null values indicate that the Actor searched for the data but did not find it on the crawled pages. Output is available as JSON, CSV, or Excel. Use Apify integrations to push results directly to Google Sheets, HubSpot, Salesforce, Airtable, or any webhook endpoint.

When enableAiAnalysis is true, each output item additionally includes:

  • email_roles - per-email classification (sales, support, hr, legal, executive, general, personal)
  • primary_contact - the single best email for B2B outreach
  • primary_phone - the single best validated phone for B2B outreach (picked from phones_validated, prefers mobile / fixed-line / toll-free in the domain's country over premium-rate / pager / unknown)
  • dedup_groups - near-duplicate team addresses grouped together (e.g. info@ and hello@ pointing to the same inbox)
  • deliverability_flags - warnings for no-reply and non-monitored addresses

Technologies Detected

The Actor recognizes 175+ technologies across these categories:

  • CMS and Platforms - WordPress, Shopify, Wix, Squarespace, Drupal, Joomla, Ghost, Webflow, HubSpot
  • JavaScript Frameworks - React, Next.js, Vue.js, Nuxt.js, Angular, Svelte, jQuery
  • CSS Frameworks - Bootstrap, Tailwind CSS, Foundation, Bulma, Materialize
  • Analytics and Marketing - Google Analytics, Google Tag Manager, Facebook Pixel, Hotjar, Mixpanel, Segment, Plausible, Matomo, Microsoft Clarity, Amplitude, LinkedIn Insight Tag
  • Product Analytics - FullStory, LogRocket, PostHog, Pendo, Datadog RUM, Sentry, LaunchDarkly
  • Email Marketing - Mailchimp, SendGrid, Klaviyo, ActiveCampaign, Marketo, Pardot, ConvertKit
  • CRM and Sales - Salesforce, HubSpot, Pipedrive
  • Chat and Support - Intercom, Drift, Zendesk, Crisp, LiveChat, Tawk.to, Freshdesk
  • Authentication - Auth0, Okta, Firebase, Supabase
  • Search - Algolia, Elasticsearch, Typesense
  • Infrastructure - Cloudflare, Fastly, CloudFront, Vercel, Netlify, nginx, Apache, Akamai
  • E-commerce - WooCommerce, Magento, PrestaShop, BigCommerce, Stripe, PayPal
  • Video and Media - YouTube Embed, Vimeo, Wistia, Loom
  • Scheduling - Calendly, Acuity Scheduling
  • Social Proof - Trustpilot Widget, G2, Yotpo
  • Build Tools and Protocols - Webpack, Vite, GraphQL, WebSocket

Quick Start

The simplest possible input - a list of domains:

{
  "domains": ["stripe.com", "shopify.com", "hubspot.com"]
}

Or full URLs with custom crawl depth:

{
  "urls": ["https://example.com", "https://anothersite.org"],
  "maxPagesPerSite": 10,
  "includeSubpages": true,
  "detectTechStack": true
}

Enable SMTP-verified emails for CRM import:

{
  "domains": ["stripe.com", "shopify.com"],
  "verifyEmails": true,
  "deepEmailVerification": true
}

Enable AI role classification and primary-contact selection:

{
  "domains": ["stripe.com", "shopify.com"],
  "verifyEmails": true,
  "enableAiAnalysis": true,
  "llmProvider": "openrouter",
  "openrouterApiKey": "your-key-here"
}

MCP Quickstart - call this actor from Claude / Cursor / ChatGPT

Open Apify's hosted MCP configurator at mcp.apify.com, or install the Apify MCP server in your AI agent of choice:

# Claude Code
claude mcp add apify -- npx -y @apify/actors-mcp-server --token YOUR_APIFY_TOKEN
# Claude Desktop / Cursor (add to mcp.json):
{"mcpServers":{"apify":{"command":"npx","args":["-y","@apify/actors-mcp-server","--token","YOUR_APIFY_TOKEN"]}}}

Then prompt the agent:

"Use the harvestlab/contact-extractor actor on Apify to extract SMTP-verified contact emails for 50 SaaS companies in [list of domains] with primary-contact scoring and AI email-role classification. Push the results back as JSON."

Through Apify MCP, the agent will discover the actor's dataset_schema.json, generate the right input, run it, and pipe the typed output back into your conversation.

Troubleshooting

SMTP verification returns "greylisted" for most emails

Greylisting is a deliberate delay tactic - mail servers temporarily reject unknown senders and accept on retry. The actor does not retry SMTP probes (that would be too slow). A greylisted verdict means the server deferred the probe rather than rejecting it, so the result is inconclusive; treat these addresses as "probably valid, low confidence". Retry the same URL in 10-15 minutes for a more definitive result.

Many emails flagged as "catchall" or "undeliverable"

Catchall domains accept all email addresses regardless of whether the mailbox exists (common with corporate domains using wildcard MX routing). These cannot be verified via SMTP RCPT - treat as "valid domain, unknown mailbox". Filter with deliverabilityFilter: "deliverable_only" to exclude them.
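
Inconclusive verdicts can also be sorted downstream. The sketch below splits emails_verified entries by their probe_result, as shown in the output example; the bucket names and the `triage_emails` helper are illustrative assumptions.

```python
def triage_emails(row):
    """Split verified emails into send / retry-later / manual-review buckets."""
    buckets = {"send": [], "retry_later": [], "review": []}
    for entry in row.get("emails_verified") or []:
        verdict = entry.get("probe_result")
        if verdict == "deliverable":
            buckets["send"].append(entry["email"])
        elif verdict in ("greylisted", "rate_limited"):
            # Temporary outcomes: a later run may return a definitive verdict.
            buckets["retry_later"].append(entry["email"])
        elif verdict != "undeliverable":
            # catchall, port_blocked, error, or no deep probe at all.
            buckets["review"].append(entry["email"])
    return buckets


row = {
    "emails_verified": [
        {"email": "info@example.com", "probe_result": "deliverable"},
        {"email": "sales@example.com", "probe_result": "rate_limited"},
        {"email": "ceo@example.com", "probe_result": "catchall"},
        {"email": "old@example.com", "probe_result": "undeliverable"},
    ]
}
print(triage_emails(row))
```

Undeliverable addresses are dropped entirely, while catchall results stay visible for a human decision rather than being silently discarded.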

No tech stack signals returned

Tech detection requires loading the page's HTML and JS assets. Pages behind login walls, heavy JavaScript SPAs, or sites that block headless fetching will return fewer tech signals. The actor detects 175+ technologies from publicly visible assets only.

AI role classification returns wrong roles

The AI role classifier uses the email prefix and page context to infer roles (sales / support / hr / legal / executive / general / personal). It cannot access LinkedIn or internal org data. Ambiguous prefixes (info@, hello@, contact@) cannot be tied to a specific role - these are typically shared inboxes.

Rate limited or IP blocked mid-crawl

The actor implements reputation-safe probe rate limits (max 10 SMTP probes per MX host per UTC day, 10s cooldown). If a domain blocks the probe IP, the actor logs smtp_blocked: true and falls back to MX-only verification. Use a RESIDENTIAL proxy to rotate IPs between domains.

No emails found for a domain The site may not publish contact emails publicly. Try increasing maxPagesPerSite to crawl more pages, or check if the site uses a contact form instead of publishing email addresses. The actor only extracts emails that are publicly visible - it does not guess or generate addresses.

How many pages should I crawl per website?

The default of 5 pages works well for most sites. The Actor prioritizes /contact, /about, and /imprint pages. Set maxPagesPerSite to 1 if you only need tech stack data and want the fastest possible run. Increase to 10-20 for large corporate sites where contact details may be spread across many sections.
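The page-prioritization behaviour described above could look roughly like the following. This is a minimal sketch, not the Actor's implementation: the function name pick_pages, the PRIORITY_PATHS tuple, and the prefix-matching rule are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumed priority order, taken from the paths named in the text above.
PRIORITY_PATHS = ("/contact", "/about", "/imprint")


def pick_pages(candidate_urls, max_pages_per_site=5):
    """Return at most max_pages_per_site URLs, priority paths first."""

    def rank(url):
        path = urlsplit(url).path.lower()
        for position, prefix in enumerate(PRIORITY_PATHS):
            if path.startswith(prefix):
                return position
        # Everything else shares the lowest priority.
        return len(PRIORITY_PATHS)

    # sorted() is stable, so URLs with equal rank keep their discovery order.
    return sorted(candidate_urls, key=rank)[:max_pages_per_site]
```

With a small page budget, this ordering spends the first fetches on the pages most likely to carry contact details.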

Can I input just domain names without https://?

Yes. Use the domains field with bare domain names like "stripe.com" or "hubspot.com". The Actor adds HTTPS automatically. You can also mix urls (full URLs) and domains (bare domains) in the same run.
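If you prepare inputs in your own code before a run, merging urls and domains can be sketched as below. The helper to_start_urls is hypothetical and only illustrates the idea of prepending https:// to bare domains; the Actor's own normalization may differ (for example in how it treats query strings, which this sketch drops).

```python
from urllib.parse import urlsplit


def to_start_urls(urls=(), domains=()):
    """Merge full URLs and bare domains into one de-duplicated list of start URLs."""
    seen, result = set(), []
    for raw in list(urls) + list(domains):
        value = raw.strip()
        if not value:
            continue
        # Bare domains carry no scheme; prepend https:// so the host parses correctly.
        if "://" not in value:
            value = "https://" + value
        parts = urlsplit(value)
        host = parts.netloc.lower()
        if not host:
            continue
        normalized = f"{parts.scheme}://{host}{parts.path or '/'}"
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result
```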

How does email deliverability verification work?

When you set verifyEmails: true, every extracted email runs through a four-step pipeline:

  1. Typo check - The email's domain is compared (bounded Levenshtein distance 1-2) against a dictionary of the 25 most-common consumer and business mail hosts. Near-misses like gmial.com, outllok.com, hotmial.com, iclould.com are flagged with a typo_suggestion and tagged LOW deliverability.
  2. Disposable / burner check - Known disposable hosts (Mailinator, 10MinuteMail, GuerrillaMail, YOPmail) are hard-flagged LOW regardless of MX state.
  3. Real MX lookup - The Actor runs an actual DNS MX query against the domain. Domains that have no MX record cannot receive mail and are tagged LOW.
  4. Tiering - Emails at free inboxes (Gmail / Yahoo / Outlook / iCloud / ProtonMail) OR with role prefixes (info@, hello@, support@, sales@) land in MEDIUM. Named addresses at non-free company domains with resolved MX land in HIGH.
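
The four steps above can be sketched in Python as follows. This is an illustrative approximation only: the host lists are shortened stand-ins for the Actor's dictionaries, the mx_lookup callable is a hypothetical hook for the DNS MX query (injected so the sketch runs without network access), and the reason strings are invented for the example.

```python
# Shortened, assumed host lists; the Actor's real dictionaries are larger.
KNOWN_HOSTS = ("gmail.com", "yahoo.com", "outlook.com", "hotmail.com",
               "icloud.com", "protonmail.com")
FREE_HOSTS = {"gmail.com", "yahoo.com", "outlook.com", "icloud.com", "protonmail.com"}
DISPOSABLE_HOSTS = {"mailinator.com", "10minutemail.com", "guerrillamail.com", "yopmail.com"}
ROLE_PREFIXES = {"info", "hello", "support", "sales"}


def bounded_levenshtein(a, b, limit=2):
    """Return the edit distance between a and b, or limit + 1 once it exceeds limit."""
    if abs(len(a) - len(b)) > limit:
        return limit + 1
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        if min(cur) > limit:  # no cell can drop back under the bound
            return limit + 1
        prev = cur
    return min(prev[-1], limit + 1)


def classify(email, mx_lookup):
    """Assign LOW / MEDIUM / HIGH following the four documented steps."""
    local, _, domain = email.lower().rpartition("@")
    # 1. Typo check. Disposable hosts are skipped here because some of them
    #    (e.g. yopmail.com) sit within distance 2 of a real host.
    if domain not in KNOWN_HOSTS and domain not in DISPOSABLE_HOSTS:
        distance, host = min((bounded_levenshtein(domain, h), h) for h in KNOWN_HOSTS)
        if distance <= 2:
            return {"tier": "LOW", "reason": "typo", "typo_suggestion": host}
    # 2. Disposable / burner check, applied regardless of MX state.
    if domain in DISPOSABLE_HOSTS:
        return {"tier": "LOW", "reason": "disposable"}
    # 3. MX lookup, delegated to the injected callable.
    if not mx_lookup(domain):
        return {"tier": "LOW", "reason": "no_mx"}
    # 4. Tiering.
    if domain in FREE_HOSTS or local in ROLE_PREFIXES:
        return {"tier": "MEDIUM", "reason": "free_or_role"}
    return {"tier": "HIGH", "reason": "named_company_address"}
```

In real use, mx_lookup would wrap a DNS resolver; passing a stub such as lambda domain: True is enough to exercise the tiering logic offline.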

How does the deep SMTP probe work?

When verifyEmails and deepEmailVerification are both on, every MX-cleared email gets a real async SMTP RCPT TO handshake:

  1. EHLO with a neutral sender domain, MAIL FROM:<> (null sender), and RCPT TO:<target@domain>. No mail is delivered.
  2. Response code classification: 250/251 -> deliverable, 550/551/553/554 -> undeliverable, 421/450/451/452 -> greylisted.
  3. Catch-all test: a second RCPT against a random non-existent local-part on the same connection - if the server accepts it too, the verdict downgrades to catchall.
  4. QUIT - connection closes without sending mail.
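
The response-code mapping and catch-all downgrade from steps 2 and 3 can be expressed as a small pure function. The sketch below assumes the two RCPT reply codes have already been collected; the function name classify_rcpt and the "unknown" fallback for codes not listed above are assumptions, not documented Actor behaviour.

```python
DELIVERABLE = {250, 251}
UNDELIVERABLE = {550, 551, 553, 554}
GREYLISTED = {421, 450, 451, 452}


def classify_rcpt(target_code, random_code=None):
    """Map SMTP RCPT TO reply codes to a verdict.

    target_code: reply to RCPT TO for the real address.
    random_code: reply to RCPT TO for a random, non-existent local-part
                 on the same connection (None if that test was not run).
    """
    if target_code in DELIVERABLE:
        # If the server also accepts a mailbox that cannot exist, the
        # acceptance of the real address proves nothing.
        if random_code in DELIVERABLE:
            return "catchall"
        return "deliverable"
    if target_code in UNDELIVERABLE:
        return "undeliverable"
    if target_code in GREYLISTED:
        return "greylisted"
    # Assumption: codes outside the documented sets are left unclassified.
    return "unknown"
```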

Reputation-safety: the probe history persists across runs in a named Apify key-value store, hard-capped at 10 probes per MX host per UTC day with a 10-second cooldown.
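
A minimal sketch of such a probe budget is shown below, using the documented limits (10 probes per MX host per UTC day, 10-second cooldown). It keeps history in memory for simplicity; the Actor persists it in a named Apify key-value store, and the class name ProbeBudget and its method are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_PROBES_PER_DAY = 10
COOLDOWN = timedelta(seconds=10)


class ProbeBudget:
    """In-memory stand-in for the persisted probe history (illustrative only)."""

    def __init__(self):
        # mx_host -> {"day": "YYYY-MM-DD", "count": int, "last": datetime or None}
        self._history = {}

    def try_acquire(self, mx_host, now=None):
        """Return True and record a probe if the host is within budget."""
        if now is None:
            now = datetime.now(timezone.utc)
        day = now.date().isoformat()
        entry = self._history.get(mx_host)
        if entry is None or entry["day"] != day:
            # New UTC day: reset the counter but keep the last probe time,
            # so the cooldown still applies across midnight.
            entry = {"day": day, "count": 0, "last": entry["last"] if entry else None}
        if entry["count"] >= MAX_PROBES_PER_DAY:
            return False
        if entry["last"] is not None and now - entry["last"] < COOLDOWN:
            return False
        entry["count"] += 1
        entry["last"] = now
        self._history[mx_host] = entry
        return True
```

A caller would check try_acquire(mx_host) before opening the SMTP connection and fall back to MX-only verification when it returns False.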

How does phone validation work?

Every phone candidate is passed through libphonenumber - the same library Android uses. Numbers that clear both is_possible_number and is_valid_number land in phones_validated in E.164 format, with country code and line type. Numbers that parse but fail strict validation land in phones_uncertain. Validation is based on numbering-plan rules, not a live carrier lookup. This is the same output shape that competitor services charge $8 per 1,000 sites for - here it's baked into the base $0.01 site price.

What if a website blocks the scraper?

Some sites block datacenter IP addresses. Enable Apify proxy configuration with residential proxies to access these sites. Sites behind login walls or with aggressive bot protection may still be inaccessible. Failed requests are not charged.

Pair this with the rest of the portfolio

Workflow: discover target companies via gov-procurement-scraper or companies-house-scraper, then run contact-extractor over the resulting websites for SMTP-verified outbound contacts.


Scheduling and webhooks

Schedule weekly contact-refresh runs in Apify Console to keep your outbound pipeline current. Wire a webhookUrl in n8n or Make to push verified email/phone records with role classification and primary-contact scores directly into HubSpot, Pipedrive, or Lemlist the moment a run completes. Typical pipeline: KvK company list -> weekly Contact Extractor run -> n8n -> CRM contact creation + sequence enrollment.


This actor scrapes publicly available data. By using this actor, you agree to the following:

  • Your responsibility: You are solely responsible for ensuring your use complies with all applicable laws, regulations, and the target website's terms of service. This includes but is not limited to GDPR (EU), CCPA / CPRA (California), CAN-SPAM Act (US), CASL (Canada), PECR (UK), LGPD (Brazil), and other data protection / anti-spam laws in your jurisdiction.
  • No legal advice: This actor does not constitute legal advice. Consult a qualified attorney if you have questions about the legality of your specific use case.
  • Intended use: This actor is designed for legitimate business purposes such as market research, competitive analysis, and B2B lead generation using publicly accessible data.
  • Data handling: You are responsible for how you store, process, and share any data collected. Ensure you have a lawful basis (e.g. legitimate interest under GDPR Art. 6(1)(f), or pre-existing business relationship under CASL) for processing any personal data under applicable privacy laws.
  • CAN-SPAM compliance (US): Any commercial email you send using contacts from this actor must include a clear, conspicuous opt-out mechanism, a valid physical postal address, accurate "From" / "Reply-To" headers, non-deceptive subject lines, and you must honor unsubscribe requests within 10 business days. Penalties are up to $53,088 per violation under the FTC's enforcement schedule.
  • GDPR compliance (EU/UK): Even publicly available personal data is subject to GDPR. You must have a lawful basis (typically legitimate interest with a documented LIA), respect Art. 14 transparency requirements (when data is not collected from the data subject directly, inform them within a reasonable period and at the latest within one month), honor erasure / objection requests promptly, maintain records of processing under Art. 30, and not transfer EU data outside the EU/EEA without an SCC, adequacy decision, or other approved mechanism.
  • Anti-spam / outreach: Do NOT use this tool for unsolicited bulk messaging, spam, scraped-list email blasts, or list-building for resale. Use for permission-tested, targeted, B2B-relevant outreach only.
  • Rate limiting: This actor implements polite crawling practices including request delays and retry backoff to minimize impact on target servers.
  • No warranty: This actor is provided "as is" without warranty. Data accuracy depends on the target website's content and structure.
  • Personal data minimization: Implement data retention policies (typically 6-12 months for cold outreach contacts), encrypt PII at rest and in transit, restrict internal access on a need-to-know basis, and honor opt-out / Do-Not-Contact requests across your entire system, not just the channel that received the request.
  • SMTP probing: The SMTP RCPT TO probe uses reputation-safe rate limiting (10 probes/host/day, 10s cooldown) and never delivers mail. The null sender (MAIL FROM:<>) is the standard non-delivery bounce address used by mailer-daemons. Nevertheless, some mail server operators may log or flag probes - use responsibly. The probe never sends mail content, never authenticates, and runs only on domains you've explicitly listed in urls or domains.

Related actors

  • Indeed Scraper - Pipe Indeed job listings into a recruiter sourcing pipeline: scrape hiring companies, then run Contact Extractor on the listing's company URL to get hiring-manager emails and phones for outbound recruiting.
  • News Monitor - Build account intelligence on prospects you've extracted contacts for. Track funding rounds, leadership changes, and product news so SDRs can time outreach around real-world events instead of cold-pitching.
  • Google Search Scraper - Discover target domains via SEO research before extracting contacts. Scrape SERPs for "best [category] software" or competitor intent queries, then feed the top-ranking domains into Contact Extractor for ICP-matched lead lists.