Website Lead Enricher
Pricing
from $1.90 / 1,000 results
Extract emails, phones, social profiles, and company data from any website. CRM-ready B2B lead enrichment with HubSpot, Salesforce, and Pipedrive export modes. Quality score, WHOIS lookup, and E.164 phone normalization included.
Developer
RH Studios
Maintained by Community
Turn any website into CRM-ready B2B leads — emails, phones, social profiles, company data, plus detected email naming conventions and per-domain bounce-risk scoring.
🚀 Try it on Apify Store: runs in your browser, free tier included, no signup needed for the first batch.
🌐 See the visual pipeline: interactive diagram + sample JSON/CSV output, no signup required.
Why this Actor?
- 🎯 Stop guessing who's reachable: per-record `isSendable` flag + per-domain `bounceRiskBucket` so Instantly, Smartlead, and Apollo can filter out high-bounce-risk domains before you spend sending credits
- 📧 5–10× the email coverage per domain: Email Pattern Finder detects `first.last` / `flast` / `first` conventions from existing emails, runs a single SMTP catch-all probe, and generates 2–10 predicted team emails per domain (or 20–200 when paired with Hunter.io)
- 🔌 Drop-in HTTP API for agents and apps: Standby mode exposes `/leads`, `/leads/{domain}`, `/stats`, `/health` for AI agents, MCP integrations, and embedded B2B tools
- 📊 CRM-ready exports: HubSpot / Salesforce / Pipedrive column shapes built in; import without mapping
- 🤖 Heuristic, not AI: deterministic rules, no LLM cost, no external API keys, fully auditable
- 🛡️ No silent failures: each step is isolated, so one bad step never kills the record; every step carries an `ok` / `error` status plus a structured `{code, message}` on failure
- ⚡ Up to 1,000 URLs per run, ~5s/record, parallel processing with up to 10 concurrent requests
What you get per record
Every input URL produces one record with these fields:
| Field | Type | What it tells you |
|---|---|---|
| 📧 Emails | string[] classified | Corporate vs. generic vs. invalid; throwaway domains filtered |
| 📱 Phones | string[] E.164 | Normalized for 50+ countries |
| 🌐 Socials | object | LinkedIn, Facebook, Instagram, X/Twitter, YouTube (validated, not generic pages) |
| 🏢 Company | object | WHOIS registrant + registration date (opt-in) |
| 📍 Address | object | City, postal code, country extracted from page text |
| ⭐ Quality score | 0-100 | Per-record score with breakdown + missing_fields array |
| 🏷️ Company type | enum | 14 verticals (saas, saas_b2b, agency, ecommerce, legal, medical, consulting, manufacturing, media, nonprofit, education, realestate, finance, other) with confidence |
| 📨 isSendable | boolean | Safe to mail? (see Outreach safety below) |
| 🔍 emailPattern | string | Detected naming convention: first.last, flast, first, etc. (or null) |
| 🎯 bounceRiskBucket | low / medium / high | Per-domain deliverability risk |
| 📋 generatedEmails | array | Predicted team emails with provenance tags (page-discovered, pattern-from-page, pattern-alternate) |
| 📞 contactForm | boolean + URL | Same-domain <form> on /contact etc. (3rd-party form vendors excluded) |
| ⚠️ scrapeError | object \| null | Machine-readable failure code on hard errors |
| 🛡️ pipelineData.steps[] | array | Per-step status + duration + error per record |
Full schema: docs/NextSteps/EmailPatternFinder.md and .actor/dataset_schema.json.
Cost & performance
| Batch size | Compute units (typical) | Wall-clock |
|---|---|---|
| 100 URLs | ~5 CU | ~50s |
| 1,000 URLs | ~50 CU | ~5–8 min |
Free every run: heuristic extraction (no API cost). Pay only when you opt in: WHOIS lookups (~1s/URL), proxy bandwidth (DATACENTER ~$2.50/GB, RESIDENTIAL ~$12/GB).
Outreach safety
Two complementary signals tell you whether to mail a record:
1. Per-record: isSendable
isSendable: true only when all of the following hold:
- A personal email (not `no-reply@`, `noreply@`, `postmaster@`)
- The personal email's domain has valid MX (or A fallback), checked with a 2s timeout
- The domain is not a known spam-trap (mailinator, tempmail, guerrillamail)
Form-only records (no email, no phone) are flagged with isSendableReason: ["not_contactable"] so outreach tools can route them to a manual follow-up track instead of a campaign. Records with isSendable: true can be mapped straight to a campaign.
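The three conditions above can be sketched as a small predicate. This is an illustration only, not the Actor's code: the function name, the `mx_valid` / `has_phone` parameters, and every reason code except `not_contactable` are hypothetical, and the role-mailbox and spam-trap lists contain only the entries named above.

```python
from typing import List, Optional, Tuple

# Only the entries named in the rules above; a production list is likely longer.
ROLE_LOCAL_PARTS = {"no-reply", "noreply", "postmaster"}
SPAM_TRAP_NAMES = {"mailinator", "tempmail", "guerrillamail"}


def is_sendable(email: Optional[str], mx_valid: bool,
                has_phone: bool = False) -> Tuple[bool, List[str]]:
    """Return (isSendable, reasons). mx_valid is resolved by the caller (DNS is out of scope here)."""
    if not email or "@" not in email:
        # "not_contactable" comes from the text above; "no_email" is a hypothetical code.
        return False, ["no_email"] if has_phone else ["not_contactable"]

    local, _, domain = email.lower().rpartition("@")
    reasons: List[str] = []
    if local in ROLE_LOCAL_PARTS:
        reasons.append("role_address")       # hypothetical reason code
    if not mx_valid:
        reasons.append("no_mx")              # hypothetical reason code
    if any(label in SPAM_TRAP_NAMES for label in domain.split(".")):
        reasons.append("spam_trap_domain")   # hypothetical reason code
    return (not reasons), reasons
```

Returning the reasons alongside the boolean mirrors the `isSendableReason` array and lets an outreach tool route rejected records by cause.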
2. Per-domain: patternAnalysis.bounceRiskBucket
| Bucket | Means |
|---|---|
| `low` | Domain has MX, server rejects unknown recipients, pattern confidence clears the goal threshold. Safe to send. |
| `medium` | SMTP probe inconclusive OR catch-all with valid MX OR quick-outreach with low confidence. Test before blasting. |
| `high` | Domain unreachable OR catch-all + no MX. Don't send. |
Threshold tuned by the goal input:
| `goal` | `bounceRiskBucket: "low"` requires | Outreach strategy |
|---|---|---|
| `quick-outreach` | `isCatchAll: false` AND `mxValid` AND `patternConfidence >= 0.9` | single-shot: only the primary pattern |
| `high-deliverability` (default) | `isCatchAll: false` AND `mxValid` | fallback: try alternate if primary bounces |
| `max-coverage` | any reachable domain | progressive: start strict, loosen based on response |
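Read literally, the two tables above reduce to a short decision function. The sketch below is one interpretation, not the Actor's algorithm: `bounce_risk_bucket` and its `reachable` parameter are hypothetical, and it assumes the `high` rule is checked first and that anything failing the goal's `low` condition falls into `medium`.

```python
from typing import Optional


def bounce_risk_bucket(goal: str, reachable: bool, mx_valid: bool,
                       is_catch_all: Optional[bool],
                       pattern_confidence: float) -> str:
    """Map domain signals to low / medium / high for a given goal."""
    # "high": domain unreachable, or catch-all with no MX.
    if not reachable or (is_catch_all is True and not mx_valid):
        return "high"

    # Shared strict condition: probe says "not catch-all" and MX is valid.
    # is_catch_all=None (no probe verdict) never satisfies it.
    strict = is_catch_all is False and mx_valid

    if goal == "quick-outreach":
        low = strict and pattern_confidence >= 0.9
    elif goal == "high-deliverability":
        low = strict
    elif goal == "max-coverage":
        low = True  # any reachable domain qualifies
    else:
        raise ValueError(f"unknown goal: {goal!r}")
    return "low" if low else "medium"
```

For example, a non-catch-all domain with valid MX and a pattern confidence of 0.5 is `low` under `high-deliverability` but only `medium` under `quick-outreach`.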
The `patternAnalysis.isCatchAll` field is a tri-state (`true` / `false` / `null`) populated by a single-RCPT-TO SMTP probe on the domain's primary MX. The probe is stampede-cached, so concurrent calls for the same domain share one TCP socket, and it uses a 1-second timeout so an unresponsive mail server never blocks the step.
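Stampede caching is a general technique: the first caller for a key starts the work and later callers await the same in-flight result. The `asyncio` sketch below illustrates that idea with a timeout that degrades to `None`. It is not the Actor's implementation; the class name and the injected `probe` callable are hypothetical, and no real SMTP traffic is involved.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

Probe = Callable[[str], Awaitable[Optional[bool]]]


class SingleFlightProbeCache:
    """Run at most one probe per domain; concurrent callers share its result."""

    def __init__(self, probe: Probe, timeout_s: float = 1.0) -> None:
        self._probe = probe
        self._timeout_s = timeout_s
        self._tasks: Dict[str, asyncio.Task] = {}

    async def is_catch_all(self, domain: str) -> Optional[bool]:
        task = self._tasks.get(domain)
        if task is None:
            # First caller starts the probe; the task stays cached afterwards.
            task = asyncio.create_task(self._run(domain))
            self._tasks[domain] = task
        return await task

    async def _run(self, domain: str) -> Optional[bool]:
        try:
            return await asyncio.wait_for(self._probe(domain), self._timeout_s)
        except (asyncio.TimeoutError, OSError):
            # Unresponsive or unreachable server: report "unknown" (None).
            return None
```

Because the finished task stays in the dictionary, repeated lookups for a domain within one run also reuse the earlier verdict.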
See docs/plans/IsSendable-implementation.md and docs/plans/EmailPatternFinder-implementation.md for the full algorithms.
How it works
- Submit up to 1,000 URLs per run (bare domains auto-prefixed with `https://`)
- Scrape each site with Cheerio-based HTML extraction (lightweight, no headless browser overhead), rotating user agents, and automatic retry with exponential backoff
- Validate & enrich: emails classified, phones normalized, socials verified, WHOIS looked up, email pattern detected, SMTP catch-all probed
- Export: one row per URL in the Apify Dataset, or download as a CSV ready for HubSpot, Salesforce, or Pipedrive
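If you pre-process a URL list before submitting it, the auto-prefixing of bare domains mentioned in the first step can be approximated as follows. This is a sketch under assumptions: `normalize_url` and `domain_of` are hypothetical helpers, and stripping a leading `www.` is inferred from the sample record (`https://www.acme.com` maps to `acme.com`), not from a documented rule.

```python
from urllib.parse import urlsplit


def normalize_url(raw: str) -> str:
    """Prefix bare domains with https:// and reject anything that is not http(s)."""
    candidate = raw.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate
    parts = urlsplit(candidate)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a usable URL: {raw!r}")
    return candidate


def domain_of(url: str) -> str:
    """Lower-cased host without a leading 'www.' (assumed record key)."""
    host = urlsplit(normalize_url(url)).hostname or ""
    return host[4:] if host.startswith("www.") else host
```

Deduplicating on `domain_of` before a run avoids paying for the same site twice.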
Note on JS-heavy sites: the production pipeline uses Cheerio + Axios only, with no headless browser. Sites that render content client-side (React/Vue SPAs) will produce partial results. Pair with the optional `proxyConfiguration` to bypass anti-bot gates on protected sites. See the full pipeline below.
Pipeline at a glance
```mermaid
flowchart LR
    A[URLs<br/>up to 1,000] --> B[Step 1: Scrape<br/>Cheerio + Axios]
    B --> C[Step 2: Email<br/>Pattern Finder<br/>DNS + SMTP probe]
    C --> D[Classify &<br/>Validate<br/>phones, socials,<br/>company type]
    D --> E[Quality Score<br/>0-100]
    E --> F[Export]
    F --> F1[Apify Dataset<br/>one row per URL]
    F --> F2[CRM-ready CSV<br/>HubSpot / Salesforce / Pipedrive]
    F --> F3[Standby HTTP API<br/>/leads /leads/{domain} /stats]
    F --> F4[KV Store<br/>runSummary]
    classDef input fill:#1f2937,color:#fff,stroke:#0ea5e9,stroke-width:2px
    classDef step fill:#0ea5e9,color:#fff,stroke:#0369a1
    classDef output fill:#10b981,color:#fff,stroke:#047857
    class A input
    class B,C,D,E step
    class F,F1,F2,F3,F4 output
```
A standalone, color-rendered version of this diagram is live at website-lead-enricher.netlify.app. The source is docs/pipeline-diagram.html — feel free to fork it for your own pipeline pages. Drop the live URL into your Apify Actor long description for a richer preview than plain markdown.
🔌 Live HTTP API (Standby mode)
Run this Actor in Apify Standby mode and it spins up a read-only HTTP API on the standby port — perfect for AI agents, MCP integrations, embedded B2B tools, and Zapier/n8n-style workflows where you want a stable queryable endpoint instead of one-shot batch runs.
| Endpoint | Returns |
|---|---|
| `GET /health` | Liveness probe (`{ status: "ok", uptimeMs }`) |
| `GET /leads` | Paginated list of enriched leads (max 1000 per page, supports `?limit=` and `?offset=`) |
| `GET /leads/{domain}` | Single-lead lookup by domain, returning the full record shape (same as one Dataset row) |
| `GET /stats` | Run-level summary: `stepErrors` per pipeline step, `droppedRecords`, `totalRecords`, `durationMs` |
CORS is open by default. The OpenAPI schema lives in .actor/openapi.json — import it into Postman, Insomnia, or any OpenAPI generator to scaffold a client in seconds.
```bash
# Get all sendable leads for a campaign import.
# Start the Actor in Standby mode from https://apify.com/operational_zirconia/website-lead-enricher first,
# then replace <your-standby-host> with the standby URL the Apify Console gives you.
# The URL is quoted so the shell does not treat "?" as a glob character.
curl -s "https://<your-standby-host>/leads?limit=500" | \
  jq '.[] | select(.isSendable == true) | {email: .contacts.emails_corporate, domain, company: .company.name}'
```
The dataset is populated by previous normal (non-standby) runs; the standby server reads from the same Dataset and Key-Value Store. Pair a normal run with standby mode and the API stays queryable as long as the Actor is running.
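For more leads than fit in one page, a client can walk `?limit=` and `?offset=` until a short page comes back. The sketch below uses only the Python standard library and makes several assumptions: `/leads` returns a bare JSON array, a page shorter than `limit` marks the end, and records follow the sample output shape (`contacts.emails[]` with `address` and `type`). All function names are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List
from urllib.parse import urlencode
from urllib.request import urlopen

Record = Dict[str, Any]
PageFetcher = Callable[[int, int], List[Record]]


def fetch_page(base_url: str, limit: int, offset: int) -> List[Record]:
    """GET {base_url}/leads?limit=..&offset=.. and decode the JSON body."""
    query = urlencode({"limit": limit, "offset": offset})
    with urlopen(f"{base_url}/leads?{query}", timeout=30) as resp:
        return json.load(resp)


def iter_leads(get_page: PageFetcher, limit: int = 500) -> Iterator[Record]:
    """Yield every lead, requesting pages until one comes back short."""
    offset = 0
    while True:
        page = get_page(limit, offset)
        yield from page
        if len(page) < limit:
            return
        offset += limit


def sendable_corporate_emails(records: Iterable[Record]) -> List[str]:
    """Collect corporate addresses from records flagged isSendable."""
    found: List[str] = []
    for rec in records:
        if rec.get("isSendable") is not True:
            continue
        for entry in rec.get("contacts", {}).get("emails", []):
            if entry.get("type") == "corporate":
                found.append(entry["address"])
    return found
```

Against a live standby host, the fetcher would be `lambda limit, offset: fetch_page("https://<your-standby-host>", limit, offset)`. Injecting it keeps the paging logic testable offline.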
CRM-ready export
Set csvMode in the input and get a file formatted exactly for your platform:
| Mode | Booleans | Use case |
|---|---|---|
| `standard` | `true` / `false` | Generic CSV for custom tooling |
| `hubspot` | `true` / `false` | HubSpot Contact Import |
| `salesforce` | `TRUE` / `FALSE` | Salesforce Lead Import Wizard |
| `pipedrive` | `1` / `0` | Pipedrive Person Import |
```json
{
  "urls": ["https://acme.com", "https://stripe.com"],
  "csvMode": "hubspot"
}
```
Output: OUTPUT_HUBSPOT_CSV in the Key-Value Store tab — import directly, no transformation.
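If you post-process the dataset yourself, the boolean column of the mode table is easy to reproduce. The helper below is a hypothetical sketch that covers only the boolean rendering, not the per-CRM column layouts.

```python
# Boolean spellings per csvMode, copied from the table above.
BOOLEAN_STYLES = {
    "standard": ("true", "false"),
    "hubspot": ("true", "false"),
    "salesforce": ("TRUE", "FALSE"),
    "pipedrive": ("1", "0"),
}


def format_bool(value: bool, csv_mode: str = "standard") -> str:
    """Render a boolean the way the given csvMode expects it."""
    try:
        truthy, falsy = BOOLEAN_STYLES[csv_mode]
    except KeyError:
        raise ValueError(f"unknown csvMode: {csv_mode!r}") from None
    return truthy if value else falsy
```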
Filter by company type
Each input URL is heuristically classified into one of 14 verticals (saas, saas_b2b, agency, ecommerce, legal, medical, consulting, manufacturing, media, nonprofit, education, realestate, finance, or other) using schema.org markup, meta description, and body-text keywords. Set companyTypes to keep only the verticals you care about.
```json
{
  "urls": ["https://acme.com", "https://bobslegal.com", "https://carsforkids.com"],
  "companyTypes": ["saas", "consulting"]
}
```
Dropped records remain in the Apify Dataset with passedCompanyTypeFilter: false so you can audit them; they are removed from the local CSV/JSON export.
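The keep-but-flag behaviour described above can be mirrored when you filter dataset rows on your own side. This is a minimal sketch with a hypothetical function name. It assumes each record carries a `companyType` string, as in the sample output, and that an empty allow-list keeps everything.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def apply_company_type_filter(
    records: Iterable[Record], company_types: List[str]
) -> Tuple[List[Record], List[Record]]:
    """Return (all records tagged with passedCompanyTypeFilter, rows kept for export)."""
    allowed = set(company_types)
    tagged = [
        {**rec, "passedCompanyTypeFilter": not allowed or rec.get("companyType") in allowed}
        for rec in records
    ]
    export_rows = [rec for rec in tagged if rec["passedCompanyTypeFilter"]]
    return tagged, export_rows
```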
Email Pattern Finder in depth
Step 2 detects the company's email naming convention from the emails Step 1 found on the page, validates the domain with MX + a single SMTP catch-all probe, and emits a generatedEmails[] array plus a patternAnalysis block.
What's emitted
```json
{
  "emailPattern": "first.last",
  "patternConfidence": 0.92,
  "generatedEmails": [
    { "address": "jan.curry@acme.com", "name": "Jan Curry", "source": "page-discovered" },
    { "address": "ada.lovelace@acme.com", "name": "Ada Lovelace", "source": "pattern-from-page" },
    { "address": "curry.jan@acme.com", "name": "Jan Curry", "source": "pattern-alternate" }
  ],
  "patternAnalysis": {
    "mxValid": true,
    "isCatchAll": false,
    "emailCulture": "strict-format",
    "sequenceStrategy": "fallback",
    "bounceRiskBucket": "low"
  }
}
```
source enum values
| Source | Meaning |
|---|---|
| `page-discovered` | Email Step 1 already found on the page that parses to a personal name |
| `pattern-from-page` | The detected pattern applied to a contact name found on the page |
| `pattern-alternate` | A backup pattern applied to the same names (when confidence is low) |
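To make the three sources concrete, here is a toy version of pattern detection and application. It is a sketch, not the Actor's algorithm: the `last.first` pattern is inferred from the `curry.jan@acme.com` example, the confidence is simply the share of known name/email pairs that match the winning pattern, and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Pattern name -> builder for the local part, given lower-cased first and last names.
PATTERNS: Dict[str, Callable[[str, str], str]] = {
    "first.last": lambda first, last: f"{first}.{last}",
    "flast": lambda first, last: f"{first[0]}{last}",
    "first": lambda first, last: first,
    "last.first": lambda first, last: f"{last}.{first}",  # assumed alternate
}


def apply_pattern(pattern: str, full_name: str, domain: str) -> str:
    """Build an address for a non-empty full name under the given pattern."""
    tokens = full_name.lower().split()
    return f"{PATTERNS[pattern](tokens[0], tokens[-1])}@{domain.lower()}"


def detect_pattern(known: List[Tuple[str, str]]) -> Tuple[Optional[str], float]:
    """known = [(full name, email)]; return (best pattern, share of pairs matching it)."""
    votes: Counter = Counter()
    for name, email in known:
        domain = email.rpartition("@")[2]
        for pattern in PATTERNS:
            if apply_pattern(pattern, name, domain) == email.lower():
                votes[pattern] += 1
                break
    if not votes:
        return None, 0.0
    pattern, hits = votes.most_common(1)[0]
    return pattern, hits / len(known)
```

A `pattern-from-page` entry then corresponds to `apply_pattern(detected, name, domain)` for each contact name found on the page, and a `pattern-alternate` entry to the same call with a backup pattern.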
See docs/NextSteps/EmailPatternFinder.md for the full spec, docs/plans/EmailPatternFinder-adr.md for the architecture decisions, and docs/plans/EmailPatternFinder-implementation.md for the build plan.
Input
```json
{
  "urls": ["https://site1.com", "stripe.com", "www.example.org/contact"],
  "maxConcurrency": 5,
  "includeWhois": false,
  "csvMode": "standard",
  "companyTypes": ["saas", "consulting"],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["DATACENTER"]
  },
  "skipEmailPatternFinder": false,
  "goal": "high-deliverability"
}
```
| Field | Default | Description |
|---|---|---|
| `urls` | required | Up to 1,000 URLs or bare domains |
| `maxConcurrency` | `5` | Parallel requests (1–10). Use 1–2 for large batches |
| `includeWhois` | `false` | Adds registrant name and registration date (~1s extra per URL) |
| `csvMode` | `standard` | `standard`, `hubspot`, `salesforce`, or `pipedrive` |
| `companyTypes` | `[]` | Allow-list of verticals. Empty = include all. |
| `proxyConfiguration` | `{ useApifyProxy: false }` | Optional. Routes requests through Apify's proxy pool; see Proxy support |
| `skipEmailPatternFinder` | `false` | Skip Step 2 (Email Pattern Finder). When `true`, no DNS / SMTP work is performed |
| `searchWhois` | `false` | Mine the WHOIS registrant email and add it to `generatedEmails[]` with `source: "whois-registrant"`. No-op when `skipEmailPatternFinder: true` |
| `goal` | `high-deliverability` | Outreach intent. `quick-outreach` (strict, single-shot), `high-deliverability` (medium, fallback), `max-coverage` (loose, progressive) |
| `hunterApiKey` | `null` | Optional Hunter.io API key. When set, pulls additional emails from Hunter's domain-search API into `generatedEmails[]` with `source: "hunter-api"`. Free tier works. Failures populate `patternAnalysis.hunterError` without failing the step |
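Before starting a run from your own code, it can help to check an input object against the limits in the table. The validator below is a hypothetical client-side sketch built only from the defaults and ranges listed above. The Actor's own input schema remains the authority.

```python
import copy
from typing import Any, Dict

CSV_MODES = {"standard", "hubspot", "salesforce", "pipedrive"}
GOALS = {"quick-outreach", "high-deliverability", "max-coverage"}
DEFAULTS: Dict[str, Any] = {
    "maxConcurrency": 5,
    "includeWhois": False,
    "csvMode": "standard",
    "companyTypes": [],
    "proxyConfiguration": {"useApifyProxy": False},
    "skipEmailPatternFinder": False,
    "searchWhois": False,
    "goal": "high-deliverability",
    "hunterApiKey": None,
}


def validate_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in the documented defaults and reject out-of-range values."""
    cfg = {**copy.deepcopy(DEFAULTS), **raw}
    urls = cfg.get("urls")
    if not isinstance(urls, list) or not urls:
        raise ValueError("urls is required and must be a non-empty list")
    if len(urls) > 1000:
        raise ValueError("at most 1,000 URLs per run")
    concurrency = cfg["maxConcurrency"]
    if isinstance(concurrency, bool) or not isinstance(concurrency, int) \
            or not 1 <= concurrency <= 10:
        raise ValueError("maxConcurrency must be an integer from 1 to 10")
    if cfg["csvMode"] not in CSV_MODES:
        raise ValueError(f"unknown csvMode: {cfg['csvMode']!r}")
    if cfg["goal"] not in GOALS:
        raise ValueError(f"unknown goal: {cfg['goal']!r}")
    return cfg
```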
Sample output
```json
{
  "url": "https://www.acme.com",
  "domain": "acme.com",
  "scrapedAt": "2026-06-21T10:00:00Z",
  "contacts": {
    "emails": [
      { "address": "jan@acme.com", "type": "corporate" },
      { "address": "contact@acme.com", "type": "generic" }
    ],
    "phones": ["+12125551234"]
  },
  "socials": { "linkedin": "https://linkedin.com/company/acme" },
  "qualityScore": {
    "total": 85,
    "breakdown": { "completeness": 80, "emailValidity": 100, "phoneValidity": 100, "socialPresence": 60 }
  },
  "companyType": "saas",
  "isSendable": true,
  "emailPattern": "first.last",
  "patternConfidence": 0.92,
  "generatedEmails": [
    { "address": "jan.curry@acme.com", "name": "Jan Curry", "source": "page-discovered" }
  ],
  "patternAnalysis": {
    "mxValid": true,
    "isCatchAll": false,
    "bounceRiskBucket": "low",
    "emailCulture": "strict-format"
  },
  "dataQuality": "medium",
  "scrapeError": null
}
```
Full schema: .actor/dataset_schema.json.
Quality you can trust
- No "nan" — null/NaN values become empty fields, never broken cells
- UTF-8 BOM — accented company names import cleanly into Excel and every CRM
- CSV injection guard (CWE-1236): formula-triggering values (`=`, `+`, `-`, `@`) are quoted to prevent execution when the CSV is opened in Excel
- Single homepage fetch: company name, socials, and address extracted from the same response; no wasteful re-scraping
- WHOIS cache — duplicate domains in one run cost nothing
- Graceful errors — failed URLs still appear in the dataset with error context, so nothing is lost silently
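The first three guarantees in the list above can be reproduced in a few lines if you build your own export from the dataset. The sketch below is an assumption-laden illustration, not the Actor's exporter: it neutralizes formula prefixes by prepending an apostrophe, which is one common mitigation and may differ from the quoting the Actor applies, and the function names are hypothetical.

```python
import csv
import io
import math
from typing import Any, Iterable, List

FORMULA_PREFIXES = ("=", "+", "-", "@")


def sanitize_cell(value: Any) -> str:
    """Empty string for None/NaN; apostrophe-prefix values a spreadsheet would run."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    text = str(value)
    if text.startswith(FORMULA_PREFIXES):
        return "'" + text  # assumed mitigation, see the note above
    return text


def to_csv_bytes(header: List[str], rows: Iterable[List[Any]]) -> bytes:
    """Serialize rows to CSV and encode with a UTF-8 BOM so Excel detects the encoding."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(header)
    for row in rows:
        writer.writerow([sanitize_cell(cell) for cell in row])
    return buf.getvalue().encode("utf-8-sig")
```

One trade-off of prefix-based guards: E.164 phone numbers start with `+`, so they are prefixed too, and a stricter exporter may want to whitelist values that match a phone pattern.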
Proxy support
Hit a Cloudflare block? Scraping EU sites that geo-fence US IPs? Add proxyConfiguration to your input and the actor will route every request through the Apify-managed proxy pool. Default is off — you only pay proxy bandwidth when you opt in.
| Tier | Apify cost | Best for | Failure mode |
|---|---|---|---|
| `DATACENTER` | ~$2.50/GB | US sites without aggressive anti-bot | Blocked by Cloudflare / Akamai |
| `RESIDENTIAL` | ~$12/GB | Anti-bot sites, EU geo-targeting, compliance-sensitive leads | 4–5× the bandwidth cost |
EU geo-pinning example:
```json
{
  "urls": ["https://acme.de", "https://example.de"],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyCountry": "DE"
  }
}
```
Every run that uses proxy prints a summary line at the end so you can track cost. If any URL hits a Cloudflare challenge, you'll see a tip suggesting you enable proxy.
Scrape errors
Every record carries a top-level scrapeError: object | null field. code is one of eight machine-readable categories:
| Code | Meaning | Retry? |
|---|---|---|
| `timeout` | Request exceeded the timeout budget | ✅ Retry |
| `blocked` | HTTP 403 or Cloudflare / bot-challenge signal | ⚠️ With proxy |
| `dns_error` | DNS lookup failed (ENOTFOUND / EAI_AGAIN) | ❌ Permanent |
| `tls_error` | Certificate / TLS handshake failed | ❌ Permanent |
| `5xx` | Upstream 5xx response | ✅ Retry |
| `4xx` | Other 4xx response (404, 429, …) | ⚠️ Depends |
| `empty` | Fetch succeeded but no contact data extracted | ⚠️ Optional |
| `unknown` | Unclassified failure | ⚠️ Case-by-case |
Partial-success rule: if any path in the scrape loop yielded data, scrapeError is cleared to null. A record that got even one email from /contact succeeded.
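On the consumer side, the Retry column above maps naturally onto a small triage table for deciding which URLs to resubmit. The sketch below is a hypothetical helper, not part of the Actor. The action labels are made up, the `scrapeError` object is assumed to expose a `code` key, and the backoff schedule is a generic doubling sequence, not the Actor's internal retry timing.

```python
from typing import Any, Dict, Iterator

# Action per scrapeError code, following the Retry column above.
RETRY_POLICY = {
    "timeout": "retry",
    "5xx": "retry",
    "blocked": "retry_with_proxy",
    "dns_error": "skip",
    "tls_error": "skip",
    "4xx": "review",
    "empty": "review",
    "unknown": "review",
}


def next_action(record: Dict[str, Any]) -> str:
    """Return 'done' for successful records, else the triage action for the error code."""
    error = record.get("scrapeError")
    if error is None:
        return "done"
    return RETRY_POLICY.get(error.get("code"), "review")


def backoff_delays(attempts: int, base_s: float = 1.0, cap_s: float = 30.0) -> Iterator[float]:
    """Yield exponentially growing waits (base, 2x, 4x, ...) capped at cap_s."""
    for attempt in range(attempts):
        yield min(cap_s, base_s * 2 ** attempt)
```

Grouping a finished dataset by `next_action` gives one list to resubmit as-is, one to resubmit with `proxyConfiguration`, and one to inspect by hand.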
Use cases
- Sales prospecting — find decision-maker emails and direct phones for outbound campaigns
- Cold outreach prep — build targeted lists with verified corporate emails and bounce-risk per domain
- Lead enrichment — append real contact data to existing CRM records
- Competitor research — map competitor digital presence at scale
- Domain due diligence — WHOIS-backed company name and registration date for vendor research
Technical notes
- Cheerio-based HTML extraction (lightweight, no headless browser overhead)
- Automatic retry with exponential backoff
- Rotating user agents to reduce blocks
- Configurable timeout (15s default, 5s WHOIS, 1s SMTP probe)
- Optional Apify proxy integration (DATACENTER / RESIDENTIAL / country pinning / custom URLs)
- 951 tests across 49 test suites
Categories: Lead generation · Data scraping · Sales automation
Tags: email scraper, phone extractor, social media finder, B2B lead enrichment, CRM enrichment, contact discovery, WHOIS lookup, sales automation, proxy support, Cloudflare bypass, residential proxy, datacenter proxy, email pattern finder, bounce risk