Website Lead Enricher

Pricing

from $1.90 / 1,000 results

Extract emails, phones, social profiles, and company data from any website. CRM-ready B2B lead enrichment with HubSpot, Salesforce, and Pipedrive export modes. Quality score, WHOIS lookup, and E.164 phone normalization included.

Developer

RH Studios

Maintained by Community

Turn any website into CRM-ready B2B leads — emails, phones, social profiles, company data, plus detected email naming conventions and per-domain bounce-risk scoring.

🚀 Try it on Apify Store: runs in your browser, free tier included, no signup needed for the first batch.

🌐 See the visual pipeline: interactive diagram plus sample JSON/CSV output, no signup required.


Why this Actor?

  • 🎯 Stop guessing who's reachable — per-record isSendable flag + per-domain bounceRiskBucket so Instantly, Smartlead, and Apollo can filter out high-bounce-risk domains before you spend sending credits
  • 📧 5–10× the email coverage per domain — Email Pattern Finder detects first.last / flast / first conventions from existing emails, runs a single SMTP catch-all probe, and generates 2–10 predicted team emails per domain (or 20–200 when paired with Hunter.io)
  • 🔌 Drop-in HTTP API for agents and apps — Standby mode exposes /leads, /leads/{domain}, /stats, /health for AI agents, MCP integrations, and embedded B2B tools
  • 📊 CRM-ready exports — HubSpot / Salesforce / Pipedrive column shapes built in; import without mapping
  • 🤖 Heuristic, not AI: deterministic rules, no LLM cost, no external API keys required (the Hunter.io key is optional), fully auditable
  • 🛡️ No silent failures — per-step error isolation: one bad step never kills the record; every step carries ok / error status + structured {code, message} on failure
  • Up to 1,000 URLs per run at ~5s per record, with up to 10 records processed concurrently

What you get per record

Every input URL produces one record with these fields:

| Field | Type | What it tells you |
| --- | --- | --- |
| 📧 Emails | string[] (classified) | Corporate vs. generic vs. invalid; throwaway domains filtered |
| 📱 Phones | string[] (E.164) | Normalized for 50+ countries |
| 🌐 Socials | object | LinkedIn, Facebook, Instagram, X/Twitter, YouTube (validated, not generic pages) |
| 🏢 Company | object | WHOIS registrant + registration date (opt-in) |
| 📍 Address | object | City, postal code, country extracted from page text |
| Quality score | 0-100 | Per-record score with breakdown + missing_fields array |
| 🏷️ Company type | enum | 14 verticals (saas, saas_b2b, agency, ecommerce, legal, medical, consulting, manufacturing, media, nonprofit, education, realestate, finance, other) with confidence |
| 📨 isSendable | boolean | Safe to mail? (see Outreach safety below) |
| 🔍 emailPattern | string | Detected naming convention: first.last, flast, first, etc. (or null) |
| 🎯 bounceRiskBucket | low / medium / high | Per-domain deliverability risk |
| 📋 generatedEmails | array | Predicted team emails with provenance tags (page-discovered, pattern-from-page, pattern-alternate) |
| 📞 contactForm | boolean + URL | Same-domain `<form>` on /contact etc. (3rd-party form vendors excluded) |
| ⚠️ scrapeError | object \| null | Machine-readable failure code on hard errors |
| 🛡️ pipelineData.steps[] | array | Per-step status + duration + error per record |

Full schema: docs/NextSteps/EmailPatternFinder.md and .actor/dataset_schema.json.


Cost & performance

| Batch size | Compute units (typical) | Wall-clock |
| --- | --- | --- |
| 100 URLs | ~5 CU | ~50s |
| 1,000 URLs | ~50 CU | ~5–8 min |

Free every run: heuristic extraction (no API cost). Pay only when you opt in: WHOIS lookups (~1s/URL), proxy bandwidth (DATACENTER ~$2.50/GB, RESIDENTIAL ~$12/GB).
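
As a back-of-the-envelope aid, the figures above can be turned into a small estimator. This is only a sketch: `estimate_run` is a hypothetical helper, the per-URL rates are read off the table and the "~5s/record" claim, and real runs will vary with site speed and retries.

```python
# Rough per-run estimator built from the approximate figures quoted in this README.
# All constants are approximations; estimate_run is a hypothetical helper.
SECONDS_PER_RECORD = 5                                   # "~5s/record"
PROXY_USD_PER_GB = {"DATACENTER": 2.50, "RESIDENTIAL": 12.00}


def estimate_run(url_count, concurrency=10, proxy_tier=None, proxy_gb=0.0):
    """Return approximate compute units, wall-clock seconds, and proxy cost."""
    compute_units = url_count * 5 / 100                  # ~5 CU per 100 URLs
    wall_clock_s = url_count * SECONDS_PER_RECORD / concurrency
    proxy_usd = PROXY_USD_PER_GB[proxy_tier] * proxy_gb if proxy_tier else 0.0
    return {
        "compute_units": compute_units,
        "wall_clock_s": wall_clock_s,
        "proxy_usd": proxy_usd,
    }


print(estimate_run(100))
print(estimate_run(1000, proxy_tier="RESIDENTIAL", proxy_gb=0.5))
```

Proxy bandwidth (`proxy_gb`) is something you have to measure or guess for your own target sites; the README does not state a per-URL figure.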


Outreach safety

Two complementary signals tell you whether to mail a record:

1. Per-record: isSendable

isSendable is true only when all of the following hold:

  • The record has a personal email (not no-reply@, noreply@, or postmaster@)
  • The personal email's domain has a valid MX record (or an A-record fallback), checked with a 2s timeout
  • The domain is not a known disposable or spam-trap domain (mailinator, tempmail, guerrillamail)

Form-only records (no email, no phone) are flagged with isSendableReason: ["not_contactable"] so outreach tools can route them to a manual follow-up track instead of a campaign. Records with isSendable: true can be mapped straight to a campaign.
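
The three conditions can be sketched as a single predicate. This is an illustration rather than the Actor's code: `is_sendable` is a hypothetical function, the MX result is passed in as a precomputed `mx_valid` flag instead of being looked up, and the blocklists contain only the names mentioned above.

```python
# Minimal sketch of the isSendable rules described above (hypothetical helper).
BLOCKED_LOCAL_PARTS = {"no-reply", "noreply", "postmaster"}
DISPOSABLE_MARKERS = ("mailinator", "tempmail", "guerrillamail")


def is_sendable(email: str, mx_valid: bool) -> bool:
    """True only if the address is personal, not disposable, and its domain has MX."""
    local, _, domain = email.lower().partition("@")
    if not local or not domain:
        return False                      # not a usable address at all
    if local in BLOCKED_LOCAL_PARTS:
        return False                      # role address such as noreply@
    if any(marker in domain for marker in DISPOSABLE_MARKERS):
        return False                      # disposable / spam-trap provider
    return mx_valid                       # DNS result supplied by the caller


print(is_sendable("jan@acme.com", mx_valid=True))
print(is_sendable("noreply@acme.com", mx_valid=True))
```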

2. Per-domain: patternAnalysis.bounceRiskBucket

| Bucket | Means |
| --- | --- |
| low | Domain has MX, server rejects unknown recipients, pattern confidence clears the goal threshold. Safe to send. |
| medium | SMTP probe inconclusive OR catch-all with valid MX OR quick-outreach with low confidence. Test before blasting. |
| high | Domain unreachable OR catch-all + no MX. Don't send. |

Threshold tuned by the goal input:

| goal | bounceRiskBucket: "low" requires | Outreach strategy |
| --- | --- | --- |
| quick-outreach | isCatchAll: false AND mxValid AND patternConfidence >= 0.9 | single-shot: only the primary pattern |
| high-deliverability (default) | isCatchAll: false AND mxValid | fallback: try alternate if primary bounces |
| max-coverage | any reachable domain | progressive: start strict, loosen based on response |
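
Read together, the two tables suggest a decision function along these lines. It is a sketch under assumptions: `bounce_risk_bucket` and the `reachable` flag are hypothetical names, and checking the "high" conditions before the per-goal "low" rule is a guess at the ordering, not something the tables state.

```python
from typing import Optional


def bounce_risk_bucket(goal: str, reachable: bool, mx_valid: bool,
                       is_catch_all: Optional[bool],
                       pattern_confidence: float) -> str:
    """Map probe results to low / medium / high following the tables above."""
    # "high": domain unreachable, or catch-all with no MX (assumed to win over "low").
    if not reachable or (is_catch_all is True and not mx_valid):
        return "high"

    # Shared strict condition: probe positively ruled out catch-all and MX exists.
    strict_ok = is_catch_all is False and mx_valid

    if goal == "quick-outreach":
        low = strict_ok and pattern_confidence >= 0.9
    elif goal == "high-deliverability":
        low = strict_ok
    elif goal == "max-coverage":
        low = True                        # any reachable domain
    else:
        raise ValueError(f"unknown goal: {goal}")

    # Everything else (inconclusive probe, catch-all with MX, low confidence).
    return "low" if low else "medium"


print(bounce_risk_bucket("high-deliverability", True, True, False, 0.5))
print(bounce_risk_bucket("quick-outreach", True, True, False, 0.5))
```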

The patternAnalysis.isCatchAll field is tri-state (true / false / null) and is populated by a single-RCPT-TO SMTP probe on the domain's primary MX. The probe is stampede-cached, so concurrent calls for the same domain share one TCP socket. It uses a 1-second timeout and never blocks the step on unresponsive mail servers.
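
The stampede-caching idea can be illustrated with a few lines of asyncio. This is not the Actor's implementation (which is not shown in this README); `ProbeCache` and the injected `probe` coroutine are hypothetical, and the real SMTP exchange is deliberately left out.

```python
import asyncio


class ProbeCache:
    """Share one in-flight probe per domain among concurrent callers."""

    def __init__(self, probe, timeout_s=1.0):
        self._probe = probe            # async callable: domain -> True / False / None
        self._timeout_s = timeout_s    # mirrors the 1-second budget described above
        self._tasks = {}               # domain -> asyncio task holding the result

    async def get(self, domain):
        task = self._tasks.get(domain)
        if task is None:
            # First caller starts the probe; later callers await the same task.
            task = asyncio.ensure_future(self._guarded(domain))
            self._tasks[domain] = task
        return await task

    async def _guarded(self, domain):
        try:
            return await asyncio.wait_for(self._probe(domain), self._timeout_s)
        except (asyncio.TimeoutError, OSError):
            return None                # inconclusive: never block the pipeline step


async def demo():
    calls = []

    async def fake_probe(domain):      # stand-in for the real SMTP probe
        calls.append(domain)
        await asyncio.sleep(0.01)
        return False

    cache = ProbeCache(fake_probe)
    results = await asyncio.gather(*(cache.get("acme.com") for _ in range(5)))
    return results, calls


print(asyncio.run(demo()))
```

Five concurrent lookups for the same domain trigger a single probe call; a probe that times out or raises a network error resolves to `None`, matching the tri-state field.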

See docs/plans/IsSendable-implementation.md and docs/plans/EmailPatternFinder-implementation.md for the full algorithms.


How it works

  1. Submit up to 1,000 URLs per run (bare domains auto-prefixed with https://)
  2. Scrape each site with Cheerio-based HTML extraction (lightweight, no headless browser overhead), rotating user agents, and automatic retry with exponential backoff
  3. Validate & enrich — emails classified, phones normalized, socials verified, WHOIS looked up, email pattern detected, SMTP catch-all probed
  4. Export — one row per URL in the Apify Dataset, or download as a CSV ready for HubSpot, Salesforce, or Pipedrive
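
Step 1's input handling can be pictured as follows. This is a sketch, not the Actor's code: `normalize_urls` is a hypothetical helper, and raising an error above 1,000 entries is an assumption (the Actor may reject or truncate differently).

```python
def normalize_urls(raw_urls, max_urls=1000):
    """Prefix bare domains with https:// and enforce the per-run URL cap."""
    normalized = []
    for raw in raw_urls:
        url = raw.strip()
        if not url:
            continue                                   # ignore blank entries
        if not url.lower().startswith(("http://", "https://")):
            url = "https://" + url                     # bare domain or path
        normalized.append(url)
    if len(normalized) > max_urls:
        raise ValueError(f"at most {max_urls} URLs per run, got {len(normalized)}")
    return normalized


print(normalize_urls(["https://site1.com", "stripe.com", "www.example.org/contact"]))
```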

Note on JS-heavy sites: the production pipeline uses Cheerio + Axios only — no headless browser. Sites that render content client-side (React/Vue SPAs) will produce partial results. Pair with the optional proxyConfiguration to bypass anti-bot gates on protected sites. See the full pipeline below.

Pipeline at a glance

flowchart LR
A[URLs<br/>up to 1,000] --> B[Step 1: Scrape<br/>Cheerio + Axios]
B --> C[Step 2: Email<br/>Pattern Finder<br/>DNS + SMTP probe]
C --> D[Classify &<br/>Validate<br/>phones, socials,<br/>company type]
D --> E[Quality Score<br/>0-100]
E --> F[Export]
F --> F1[Apify Dataset<br/>one row per URL]
F --> F2[CRM-ready CSV<br/>HubSpot / Salesforce / Pipedrive]
F --> F3[Standby HTTP API<br/>/leads /leads/&#123;domain&#125; /stats]
F --> F4[KV Store<br/>runSummary]
classDef input fill:#1f2937,color:#fff,stroke:#0ea5e9,stroke-width:2px
classDef step fill:#0ea5e9,color:#fff,stroke:#0369a1
classDef output fill:#10b981,color:#fff,stroke:#047857
class A input
class B,C,D,E step
class F,F1,F2,F3,F4 output

A standalone, color-rendered version of this diagram is live at website-lead-enricher.netlify.app. The source is docs/pipeline-diagram.html — feel free to fork it for your own pipeline pages. Drop the live URL into your Apify Actor long description for a richer preview than plain markdown.


🔌 Live HTTP API (Standby mode)

Run this Actor in Apify Standby mode and it spins up a read-only HTTP API on the standby port — perfect for AI agents, MCP integrations, embedded B2B tools, and Zapier/n8n-style workflows where you want a stable queryable endpoint instead of one-shot batch runs.

| Endpoint | Returns |
| --- | --- |
| GET /health | Liveness probe ({ status: "ok", uptimeMs }) |
| GET /leads | Paginated list of enriched leads (max 1000 per page, supports ?limit= and ?offset=) |
| GET /leads/{domain} | Single-lead lookup by domain; full record shape (same as one Dataset row) |
| GET /stats | Run-level summary: stepErrors per pipeline step, droppedRecords, totalRecords, durationMs |

CORS is open by default. The OpenAPI schema lives in .actor/openapi.json — import it into Postman, Insomnia, or any OpenAPI generator to scaffold a client in seconds.

# Get all sendable leads for a campaign import
# (Start the Actor in Standby mode from https://apify.com/operational_zirconia/website-lead-enricher first,
# then replace <your-standby-host> with the standby URL the Apify Console gives you.)
# The URL is quoted so the shell does not treat "?" as a glob character.
curl -s "https://<your-standby-host>/leads?limit=500" | \
  jq '.[] | select(.isSendable == true) | {email: .contacts.emails_corporate, domain, company: .company.name}'

The dataset is populated by previous normal (non-standby) runs; the standby server reads from the same Dataset and Key-Value Store. Pair a normal run with standby mode and the API stays queryable as long as the Actor is running.
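
For datasets larger than one page, a client can walk `?limit=` and `?offset=` until a short page comes back. The sketch below assumes `/leads` returns a bare JSON array (as the jq example implies); `iter_leads` and `fetch_json` are hypothetical helpers, and the host is a placeholder you must replace.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body (standard library only)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_leads(base_url, page_size=500, fetch=fetch_json):
    """Yield every lead by paging through /leads with limit and offset."""
    offset = 0
    while True:
        page = fetch(f"{base_url}/leads?limit={page_size}&offset={offset}")
        if not page:
            return                       # empty page: nothing left
        yield from page
        if len(page) < page_size:
            return                       # short page: this was the last one
        offset += page_size


# Usage (replace the host with your own standby URL before running):
# sendable = [lead for lead in iter_leads("https://<your-standby-host>")
#             if lead.get("isSendable")]
```

Passing `fetch` as a parameter keeps the paging logic testable without a live standby server.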


CRM-ready export

Set csvMode in the input and get a file formatted exactly for your platform:

| Mode | Booleans | Use case |
| --- | --- | --- |
| standard | true / false | Generic CSV for custom tooling |
| hubspot | true / false | HubSpot Contact Import |
| salesforce | TRUE / FALSE | Salesforce Lead Import Wizard |
| pipedrive | 1 / 0 | Pipedrive Person Import |

{
  "urls": ["https://acme.com", "https://stripe.com"],
  "csvMode": "hubspot"
}

Output: OUTPUT_HUBSPOT_CSV in the Key-Value Store tab — import directly, no transformation.
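
The per-mode boolean encoding from the table boils down to a lookup. A minimal sketch, with `format_boolean` as a hypothetical helper name:

```python
# Boolean spellings per csvMode, copied from the table above.
BOOLEAN_FORMATS = {
    "standard": ("true", "false"),
    "hubspot": ("true", "false"),
    "salesforce": ("TRUE", "FALSE"),
    "pipedrive": ("1", "0"),
}


def format_boolean(value, csv_mode="standard"):
    """Render a boolean the way the chosen CRM import expects it."""
    truthy, falsy = BOOLEAN_FORMATS[csv_mode]
    return truthy if value else falsy


print(format_boolean(True, "salesforce"))
print(format_boolean(False, "pipedrive"))
```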


Filter by company type

Each input URL is heuristically classified into one of 14 verticals (saas, saas_b2b, agency, ecommerce, legal, medical, consulting, manufacturing, media, nonprofit, education, realestate, finance, or other) using schema.org markup, meta description, and body-text keywords. Set companyTypes to keep only the verticals you care about.

{
  "urls": ["https://acme.com", "https://bobslegal.com", "https://carsforkids.com"],
  "companyTypes": ["saas", "consulting"]
}

Dropped records remain in the Apify Dataset with passedCompanyTypeFilter: false so you can audit them; they are removed from the local CSV/JSON export.
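
The "flag in the dataset, drop from the export" behaviour can be sketched like this. `apply_company_type_filter` is a hypothetical helper; only the `companyType` and `passedCompanyTypeFilter` field names come from this README.

```python
def apply_company_type_filter(records, company_types):
    """Return (dataset_rows, export_rows) for a companyTypes allow-list."""
    allowed = set(company_types)
    dataset_rows = [
        # An empty allow-list means "include all".
        {**record,
         "passedCompanyTypeFilter": not allowed or record.get("companyType") in allowed}
        for record in records
    ]
    export_rows = [row for row in dataset_rows if row["passedCompanyTypeFilter"]]
    return dataset_rows, export_rows


records = [
    {"domain": "acme.com", "companyType": "saas"},
    {"domain": "bobslegal.com", "companyType": "legal"},
]
dataset_rows, export_rows = apply_company_type_filter(records, ["saas", "consulting"])
print([row["domain"] for row in export_rows])
```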


Email Pattern Finder in depth

Step 2 detects the company's email naming convention from the emails Step 1 found on the page, validates the domain with MX + a single SMTP catch-all probe, and emits a generatedEmails[] array plus a patternAnalysis block.

What's emitted

{
  "emailPattern": "first.last",
  "patternConfidence": 0.92,
  "generatedEmails": [
    { "address": "jan.curry@acme.com", "name": "Jan Curry", "source": "page-discovered" },
    { "address": "ada.lovelace@acme.com", "name": "Ada Lovelace", "source": "pattern-from-page" },
    { "address": "curry.jan@acme.com", "name": "Jan Curry", "source": "pattern-alternate" }
  ],
  "patternAnalysis": {
    "mxValid": true,
    "isCatchAll": false,
    "emailCulture": "strict-format",
    "sequenceStrategy": "fallback",
    "bounceRiskBucket": "low"
  }
}

source enum values

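| Source | Meaning |
| --- | --- |
| page-discovered | Email Step 1 already found on the page that parses to a personal name |
| pattern-from-page | The detected pattern applied to a contact name found on the page |
| pattern-alternate | A backup pattern applied to the same names (when confidence is low) |
| whois-registrant | WHOIS registrant email, added only when searchWhois is enabled (see Input) |
| hunter-api | Email pulled from Hunter's domain-search API, added only when hunterApiKey is set (see Input) |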

See docs/NextSteps/EmailPatternFinder.md for the full spec, docs/plans/EmailPatternFinder-adr.md for the architecture decisions, and docs/plans/EmailPatternFinder-implementation.md for the build plan.
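
To make the pattern step concrete, here is a sketch of applying a detected convention to a contact name. `predict_email` is a hypothetical helper, and reading `flast` as "first initial + last name" is an assumption based on common usage; the linked spec is the authority on the actual pattern set.

```python
# Builders for the three conventions named in this README.
PATTERNS = {
    "first.last": lambda first, last: f"{first}.{last}",
    "flast": lambda first, last: f"{first[0]}{last}",   # assumed: initial + last name
    "first": lambda first, last: first,
}


def predict_email(full_name, pattern, domain):
    """Apply a naming convention to a person's name; None if it cannot apply."""
    parts = full_name.lower().split()
    if len(parts) < 2 or pattern not in PATTERNS:
        return None
    first, last = parts[0], parts[-1]
    return f"{PATTERNS[pattern](first, last)}@{domain}"


print(predict_email("Ada Lovelace", "first.last", "acme.com"))
print(predict_email("Ada Lovelace", "flast", "acme.com"))
```

A production version would also need to transliterate accents and handle hyphenated or multi-part surnames, which this sketch ignores.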


Input

{
  "urls": ["https://site1.com", "stripe.com", "www.example.org/contact"],
  "maxConcurrency": 5,
  "includeWhois": false,
  "csvMode": "standard",
  "companyTypes": ["saas", "consulting"],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["DATACENTER"]
  },
  "skipEmailPatternFinder": false,
  "goal": "high-deliverability"
}

| Field | Default | Description |
| --- | --- | --- |
| urls | required | Up to 1,000 URLs or bare domains |
| maxConcurrency | 5 | Parallel requests (1–10). Use 1–2 for large batches |
| includeWhois | false | Adds registrant name and registration date (~1s extra per URL) |
| csvMode | standard | standard, hubspot, salesforce, or pipedrive |
| companyTypes | [] | Allow-list of verticals. Empty = include all. |
| proxyConfiguration | { useApifyProxy: false } | Optional. Routes requests through Apify's proxy pool (see Proxy support) |
| skipEmailPatternFinder | false | Skip Step 2 (Email Pattern Finder). When true, no DNS / SMTP work is performed |
| searchWhois | false | Mine the WHOIS registrant email and add it to generatedEmails[] with source: "whois-registrant". No-op when skipEmailPatternFinder: true |
| goal | high-deliverability | Outreach intent. quick-outreach (strict, single-shot), high-deliverability (medium, fallback), max-coverage (loose, progressive) |
| hunterApiKey | null | Optional Hunter.io API key. When set, pulls additional emails from Hunter's domain-search API into generatedEmails[] with source: "hunter-api". Free tier works. Failures populate patternAnalysis.hunterError without failing the step |

Sample output

{
  "url": "https://www.acme.com",
  "domain": "acme.com",
  "scrapedAt": "2026-06-21T10:00:00Z",
  "contacts": {
    "emails": [
      { "address": "jan@acme.com", "type": "corporate" },
      { "address": "contact@acme.com", "type": "generic" }
    ],
    "phones": ["+12125551234"]
  },
  "socials": { "linkedin": "https://linkedin.com/company/acme" },
  "qualityScore": { "total": 85, "breakdown": { "completeness": 80, "emailValidity": 100, "phoneValidity": 100, "socialPresence": 60 } },
  "companyType": "saas",
  "isSendable": true,
  "emailPattern": "first.last",
  "patternConfidence": 0.92,
  "generatedEmails": [
    { "address": "jan.curry@acme.com", "name": "Jan Curry", "source": "page-discovered" }
  ],
  "patternAnalysis": {
    "mxValid": true,
    "isCatchAll": false,
    "bounceRiskBucket": "low",
    "emailCulture": "strict-format"
  },
  "dataQuality": "medium",
  "scrapeError": null
}

Full schema: .actor/dataset_schema.json.


Quality you can trust

  • No "nan" — null/NaN values become empty fields, never broken cells
  • UTF-8 BOM — accented company names import cleanly into Excel and every CRM
  • CSV injection guard (CWE-1236) — formula-triggering values (=, +, -, @) are quoted to prevent execution when the CSV is opened in Excel
  • Single homepage fetch — company name, socials, and address extracted from the same response; no wasteful re-scraping
  • WHOIS cache — duplicate domains in one run cost nothing
  • Graceful errors — failed URLs still appear in the dataset with error context, so nothing is lost silently
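
The first three guarantees can be illustrated with the standard `csv` module. This is a sketch, not the Actor's exporter: `guard_cell` and `to_csv_bytes` are hypothetical names, and prefixing risky cells with a single quote is one common mitigation (the README only says such values are "quoted", so the exact mechanism is an assumption).

```python
import csv
import io

FORMULA_PREFIXES = ("=", "+", "-", "@")


def guard_cell(value):
    """Blank out null/NaN and neutralize cells a spreadsheet would run as formulas."""
    if value is None:
        return ""
    if isinstance(value, float) and value != value:    # NaN is never equal to itself
        return ""
    text = str(value)
    # Assumed mitigation: a leading apostrophe makes spreadsheets treat it as text.
    return "'" + text if text.startswith(FORMULA_PREFIXES) else text


def to_csv_bytes(rows, fieldnames):
    """Serialize rows to CSV bytes with a UTF-8 BOM so Excel detects the encoding."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({name: guard_cell(row.get(name)) for name in fieldnames})
    return buffer.getvalue().encode("utf-8-sig")       # utf-8-sig prepends the BOM


data = to_csv_bytes([{"company": "=SUM(A1)", "name": None}], ["company", "name"])
print(data)
```

Note the trade-off: a naive prefix check like this one would also alter legitimate values that start with `+` or `-`, such as E.164 phone numbers, so a real exporter needs per-column rules.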

Proxy support

Hit a Cloudflare block? Scraping EU sites that geo-fence US IPs? Add proxyConfiguration to your input and the actor will route every request through the Apify-managed proxy pool. Default is off — you only pay proxy bandwidth when you opt in.

| Tier | Apify cost | Best for | Failure mode |
| --- | --- | --- | --- |
| DATACENTER | ~$2.50/GB | US sites without aggressive anti-bot | Blocked by Cloudflare / Akamai |
| RESIDENTIAL | ~$12/GB | Anti-bot sites, EU geo-targeting, compliance-sensitive leads | 4–5× the bandwidth cost |

EU geo-pinning example:

{
  "urls": ["https://acme.de", "https://example.de"],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyCountry": "DE"
  }
}

Every run that uses proxy prints a summary line at the end so you can track cost. If any URL hits a Cloudflare challenge, you'll see a tip suggesting you enable proxy.


Scrape errors

Every record carries a top-level scrapeError field (object | null). When it is set, its code property is one of eight machine-readable categories:

| Code | Meaning | Retry? |
| --- | --- | --- |
| timeout | Request exceeded the timeout budget | ✅ Retry |
| blocked | HTTP 403 or Cloudflare / bot-challenge signal | ⚠️ With proxy |
| dns_error | DNS lookup failed (ENOTFOUND / EAI_AGAIN) | ❌ Permanent |
| tls_error | Certificate / TLS handshake failed | ❌ Permanent |
| 5xx | Upstream 5xx response | ✅ Retry |
| 4xx | Other 4xx response (404, 429, …) | ⚠️ Depends |
| empty | Fetch succeeded but no contact data extracted | ⚠️ Optional |
| unknown | Unclassified failure | ⚠️ Case-by-case |

Partial-success rule: if any path in the scrape loop yielded data, scrapeError is cleared to null. A record that got even one email from /contact succeeded.
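
Downstream tooling can turn the Retry? column into a routing decision. A minimal sketch: `next_action` and the action labels (`retry`, `retry_with_proxy`, `skip`, `review`) are hypothetical names chosen for illustration; only the error codes come from the table.

```python
# One suggested action per scrapeError code, following the Retry? column.
RETRY_POLICY = {
    "timeout": "retry",
    "5xx": "retry",
    "blocked": "retry_with_proxy",
    "dns_error": "skip",          # permanent
    "tls_error": "skip",          # permanent
    "4xx": "review",
    "empty": "review",
    "unknown": "review",
}


def next_action(record):
    """Decide what to do with a dataset row based on its scrapeError."""
    error = record.get("scrapeError")
    if error is None:
        return "done"             # includes partial successes, which clear the error
    return RETRY_POLICY.get(error.get("code"), "review")


print(next_action({"domain": "acme.com", "scrapeError": None}))
print(next_action({"domain": "x.com", "scrapeError": {"code": "blocked", "message": "403"}}))
```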


Use cases

  • Sales prospecting — find decision-maker emails and direct phones for outbound campaigns
  • Cold outreach prep — build targeted lists with verified corporate emails and bounce-risk per domain
  • Lead enrichment — append real contact data to existing CRM records
  • Competitor research — map competitor digital presence at scale
  • Domain due diligence — WHOIS-backed company name and registration date for vendor research

Technical notes

  • Cheerio-based HTML extraction (lightweight, no headless browser overhead)
  • Automatic retry with exponential backoff
  • Rotating user agents to reduce blocks
  • Configurable timeout (15s default, 5s WHOIS, 1s SMTP probe)
  • Optional Apify proxy integration (DATACENTER / RESIDENTIAL / country pinning / custom URLs)
  • 951 tests across 49 test suites

Categories: Lead generation · Data scraping · Sales automation

Tags: email scraper, phone extractor, social media finder, B2B lead enrichment, CRM enrichment, contact discovery, WHOIS lookup, sales automation, proxy support, Cloudflare bypass, residential proxy, datacenter proxy, email pattern finder, bounce risk