Pricing

from $1.90 / 1,000 results

Website Lead Enricher

Extract emails, phones, social profiles, and company data from any website. CRM-ready B2B lead enrichment with HubSpot, Salesforce, and Pipedrive export modes. Quality score, WHOIS lookup, and E.164 phone normalization included.

Developer

RH Studios

Last modified

17 days ago

Why this Actor?

🎯 Stop guessing who's reachable — per-record isSendable flag + per-domain bounceRiskBucket so Instantly, Smartlead, and Apollo can filter out high-bounce-risk domains before you spend sending credits
📧 5–10× the email coverage per domain — Email Pattern Finder detects first.last / flast / first conventions from existing emails, runs a single SMTP catch-all probe, and generates 2–10 predicted team emails per domain (or 20–200 when paired with Hunter.io)
🔌 Drop-in HTTP API for agents and apps — Standby mode exposes /leads, /leads/{domain}, /stats, /health for AI agents, MCP integrations, and embedded B2B tools
📊 CRM-ready exports — HubSpot / Salesforce / Pipedrive column shapes built in; import without mapping
🤖 Heuristic, not AI — deterministic rules, no LLM cost, no external API keys required (the Hunter.io key is optional), fully auditable
🛡️ No silent failures — per-step error isolation: one bad step never kills the record; every step carries ok / error status + structured {code, message} on failure
⚡ Up to 1,000 URLs per run, ~5s per record, up to 10 concurrent requests

What you get per record

Every input URL produces one record with these fields:

Field	Type	What it tells you
📧 Emails	`string[]` classified	Corporate vs. generic vs. invalid; throwaway domains filtered
📱 Phones	`string[]` E.164	Normalized for 50+ countries
🌐 Socials	object	LinkedIn, Facebook, Instagram, X/Twitter, YouTube (validated, not generic pages)
🏢 Company	object	WHOIS registrant + registration date (opt-in)
📍 Address	object	City, postal code, country extracted from page text
⭐ Quality score	`0-100`	Per-record score with `breakdown` + `missing_fields` array
🏷️ Company type	enum	14 verticals (saas, saas_b2b, agency, ecommerce, legal, medical, consulting, manufacturing, media, nonprofit, education, realestate, finance, other) with confidence
📨 isSendable	boolean	Safe to mail? (see Outreach safety below)
🔍 emailPattern	string	Detected naming convention: `first.last`, `flast`, `first`, etc. (or `null`)
🎯 bounceRiskBucket	`low` / `medium` / `high`	Per-domain deliverability risk
📋 generatedEmails	array	Predicted team emails with provenance tags (`page-discovered`, `pattern-from-page`, `pattern-alternate`, plus `whois-registrant` and `hunter-api` when `searchWhois` or `hunterApiKey` is set)
📞 contactForm	boolean + URL	Same-domain `<form>` on `/contact` etc. (3rd-party form vendors excluded)
⚠️ scrapeError	object | null	Machine-readable failure code on hard errors
🛡️ pipelineData.steps[]	array	Per-step status + duration + error per record

Full schema: docs/NextSteps/EmailPatternFinder.md and .actor/dataset_schema.json.

Cost & performance

Batch size	Compute units (typical)	Wall-clock
100 URLs	~5 CU	~50s
1,000 URLs	~50 CU	~5–8 min

Free every run: heuristic extraction (no API cost). Pay only when you opt in: WHOIS lookups (~1s/URL), proxy bandwidth (DATACENTER ~$2.50/GB, RESIDENTIAL ~$12/GB).
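The figures in the table can be turned into a rough planning estimate. The sketch below is only arithmetic on the numbers quoted above (about 5 CU per 100 URLs, about 5s per record); it assumes the table was measured at 10 concurrent requests, and the function and parameter names are hypothetical, not part of the Actor.

```python
import math


def estimate_run(url_count: int, concurrency: int = 10,
                 seconds_per_record: float = 5.0,
                 cu_per_100_urls: float = 5.0) -> dict:
    """Rough estimate from the quoted figures; real runs will vary."""
    # Records are processed in "waves" of `concurrency` at a time.
    waves = math.ceil(url_count / concurrency)
    return {
        "compute_units": url_count * cu_per_100_urls / 100,
        "wall_clock_s": waves * seconds_per_record,
    }


print(estimate_run(100))
print(estimate_run(1000))
```

At the default maxConcurrency of 5 the same arithmetic roughly doubles the wall-clock estimate, and WHOIS lookups or proxy latency would add to it.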

Outreach safety

Two complementary signals tell you whether to mail a record:

1. Per-record: `isSendable`

isSendable is true only when all of the following hold:

The record has a personal email (not no-reply@, noreply@, or postmaster@)
That email's domain has a valid MX record (or an A-record fallback), checked with a 2s timeout
The domain is not a known throwaway or spam-trap domain (mailinator, tempmail, guerrillamail)

Form-only records (no email, no phone) are flagged with isSendableReason: ["not_contactable"] so outreach tools can route them to a manual follow-up track instead of a campaign. Records with isSendable: true can be mapped straight to a campaign.
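On the consumer side, this routing can be sketched in a few lines of Python. The field names isSendable and isSendableReason come from the text above; the function name, the three route labels, and the "skip" bucket for everything else are assumptions made for illustration.

```python
from typing import Iterable


def route_records(records: Iterable[dict]) -> dict:
    """Split enriched records into campaign, manual follow-up, and skip lists."""
    routes = {"campaign": [], "manual": [], "skip": []}
    for record in records:
        reasons = record.get("isSendableReason") or []
        if record.get("isSendable") is True:
            routes["campaign"].append(record)
        elif "not_contactable" in reasons:
            # Form-only record: no email, no phone.
            routes["manual"].append(record)
        else:
            # Assumption: anything else is held back from sending.
            routes["skip"].append(record)
    return routes


sample = [
    {"domain": "acme.com", "isSendable": True},
    {"domain": "formonly.example", "isSendable": False,
     "isSendableReason": ["not_contactable"]},
    {"domain": "bounce.example", "isSendable": False, "isSendableReason": []},
]
routes = route_records(sample)
print({name: [r["domain"] for r in items] for name, items in routes.items()})
```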

2. Per-domain: `patternAnalysis.bounceRiskBucket`

Bucket	Means
`low`	Domain has MX, server rejects unknown recipients, pattern confidence clears the goal threshold. Safe to send.
`medium`	SMTP probe inconclusive OR catch-all with valid MX OR `quick-outreach` with low confidence. Test before blasting.
`high`	Domain unreachable OR catch-all + no MX. Don't send.

Threshold tuned by the goal input:

`goal`	`bounceRiskBucket: "low"` requires	Outreach strategy
`quick-outreach`	`isCatchAll: false` AND `mxValid` AND `patternConfidence >= 0.9`	`single-shot` — only the primary pattern
`high-deliverability` (default)	`isCatchAll: false` AND `mxValid`	`fallback` — try alternate if primary bounces
`max-coverage`	any reachable domain	`progressive` — start strict, loosen based on response
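Read together, the two tables describe a small decision function. The Python sketch below is one possible reading, not the Actor's code: the order of the checks ("high" first, then the goal-specific "low" rule, otherwise "medium") is an assumption, and the reachable parameter is a hypothetical stand-in for "domain reachable".

```python
from typing import Optional


def bounce_risk_bucket(goal: str, reachable: bool, mx_valid: bool,
                       is_catch_all: Optional[bool],
                       pattern_confidence: float) -> str:
    """Map probe results to low / medium / high for a given goal."""
    # "high": domain unreachable, or catch-all with no MX.
    if not reachable or (is_catch_all is True and not mx_valid):
        return "high"
    # Shared strict condition: probe says "not catch-all" and MX is valid.
    strict = is_catch_all is False and mx_valid
    if goal == "quick-outreach":
        low = strict and pattern_confidence >= 0.9
    elif goal == "high-deliverability":
        low = strict
    elif goal == "max-coverage":
        low = True  # any reachable domain
    else:
        raise ValueError(f"unknown goal: {goal}")
    return "low" if low else "medium"


print(bounce_risk_bucket("high-deliverability", True, True, False, 0.5))
print(bounce_risk_bucket("quick-outreach", True, True, False, 0.5))
```

An inconclusive probe (is_catch_all is None) never satisfies the strict condition, so under the two stricter goals it lands in "medium", which matches the table's "SMTP probe inconclusive" row.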

The patternAnalysis.isCatchAll field is a tri-state (true / false / null, where null means the probe was inconclusive) populated by a single-RCPT-TO SMTP probe on the domain's primary MX. The probe is stampede-cached, so concurrent calls for the same domain share one TCP socket. It uses a 1-second timeout and never blocks the step on unresponsive mail servers.
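The stampede-caching idea can be illustrated with asyncio: concurrent callers for the same domain await one shared in-flight task instead of each opening a connection. This is a generic sketch of the technique, not the Actor's implementation; ProbeCache and fake_probe are hypothetical names, and the probe itself is stubbed out rather than speaking SMTP.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class ProbeCache:
    """Share one in-flight probe per domain among concurrent callers."""

    def __init__(self, probe: Callable[[str], Awaitable[Optional[bool]]],
                 timeout_s: float = 1.0) -> None:
        self._probe = probe
        self._timeout_s = timeout_s
        self._tasks = {}  # domain -> asyncio.Task

    async def _run(self, domain: str) -> Optional[bool]:
        try:
            return await asyncio.wait_for(self._probe(domain), self._timeout_s)
        except (asyncio.TimeoutError, OSError):
            return None  # inconclusive: the third state

    async def is_catch_all(self, domain: str) -> Optional[bool]:
        task = self._tasks.get(domain)
        if task is None:
            # First caller starts the probe; later callers reuse the task.
            task = asyncio.ensure_future(self._run(domain))
            self._tasks[domain] = task
        return await task


async def demo():
    calls = 0

    async def fake_probe(domain: str) -> Optional[bool]:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stands in for the SMTP round trip
        return False

    cache = ProbeCache(fake_probe)
    results = await asyncio.gather(
        *(cache.is_catch_all("acme.com") for _ in range(5)))
    return results, calls


results, calls = asyncio.run(demo())
print(results, calls)
```

Five concurrent lookups produce five answers from a single probe call; a probe that exceeds the timeout resolves to None for every waiting caller.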

See docs/plans/IsSendable-implementation.md and docs/plans/EmailPatternFinder-implementation.md for the full algorithms.

How it works

Submit up to 1,000 URLs per run (bare domains auto-prefixed with https://)
Scrape each site with Cheerio-based HTML extraction (lightweight, no headless browser overhead), rotating user agents, and automatic retry with exponential backoff
Validate & enrich — emails classified, phones normalized, socials verified, WHOIS looked up, email pattern detected, SMTP catch-all probed
Export — one row per URL in the Apify Dataset, or download as a CSV ready for HubSpot, Salesforce, or Pipedrive
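The first step (bare domains auto-prefixed with https://, at most 1,000 URLs per run) could look roughly like the following on the caller's side. This is a sketch under assumptions: the Actor's own normalization rules are not documented here, and normalize_url and prepare_batch are hypothetical helper names.

```python
from urllib.parse import urlsplit


def normalize_url(raw: str) -> str:
    """Prefix bare domains with https:// and reject non-HTTP input."""
    candidate = raw.strip()
    if not candidate:
        raise ValueError("empty URL")
    if "://" not in candidate:
        candidate = "https://" + candidate
    parts = urlsplit(candidate)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"unsupported URL: {raw!r}")
    return candidate


def prepare_batch(urls: list, limit: int = 1000) -> list:
    """Enforce the per-run limit and normalize every entry."""
    if len(urls) > limit:
        raise ValueError(f"{len(urls)} URLs exceeds the limit of {limit}")
    return [normalize_url(u) for u in urls]


batch = prepare_batch(["https://site1.com", "stripe.com", "www.example.org/contact"])
print(batch)
```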

Note on JS-heavy sites: the production pipeline uses Cheerio + Axios only — no headless browser. Sites that render content client-side (React/Vue SPAs) will produce partial results. Pair with the optional proxyConfiguration to bypass anti-bot gates on protected sites. See the full pipeline below.

Pipeline at a glance

flowchart LR
    A[URLs<br/>up to 1,000] --> B[Step 1: Scrape<br/>Cheerio + Axios]
    B --> C[Step 2: Email<br/>Pattern Finder<br/>DNS + SMTP probe]
    C --> D[Classify &<br/>Validate<br/>phones, socials,<br/>company type]
    D --> E[Quality Score<br/>0-100]
    E --> F[Export]
    F --> F1[Apify Dataset<br/>one row per URL]
    F --> F2[CRM-ready CSV<br/>HubSpot / Salesforce / Pipedrive]
    F --> F3[Standby HTTP API<br/>/leads /leads/&#123;domain&#125; /stats]
    F --> F4[KV Store<br/>runSummary]

    classDef input fill:#1f2937,color:#fff,stroke:#0ea5e9,stroke-width:2px
    classDef step fill:#0ea5e9,color:#fff,stroke:#0369a1
    classDef output fill:#10b981,color:#fff,stroke:#047857
    class A input
    class B,C,D,E step
    class F,F1,F2,F3,F4 output

A standalone, color-rendered version of this diagram is live at website-lead-enricher.netlify.app. The source is docs/pipeline-diagram.html — feel free to fork it for your own pipeline pages. Drop the live URL into your Apify Actor long description for a richer preview than plain markdown.

🔌 Live HTTP API (Standby mode)

Run this Actor in Apify Standby mode and it spins up a read-only HTTP API on the standby port — perfect for AI agents, MCP integrations, embedded B2B tools, and Zapier/n8n-style workflows where you want a stable queryable endpoint instead of one-shot batch runs.

Endpoint	Returns
`GET /health`	Liveness probe (`{ status: "ok", uptimeMs }`)
`GET /leads`	Paginated list of enriched leads (max 1000 per page, supports `?limit=` and `?offset=`)
`GET /leads/{domain}`	Single-lead lookup by domain — full record shape (same as one Dataset row)
`GET /stats`	Run-level summary: `stepErrors` per pipeline step, `droppedRecords`, `totalRecords`, `durationMs`

CORS is open by default. The OpenAPI schema lives in .actor/openapi.json — import it into Postman, Insomnia, or any OpenAPI generator to scaffold a client in seconds.

# Get all sendable leads for a campaign import
# (Start the Actor in Standby mode from https://apify.com/operational_zirconia/website-lead-enricher first,
#  then replace <your-standby-host> with the standby URL the Apify Console gives you.)
curl -s "https://<your-standby-host>/leads?limit=500" | \
  jq '.[] | select(.isSendable == true) | {email: .contacts.emails_corporate, domain, company: .company.name}'

The dataset is populated by previous normal (non-standby) runs; the standby server reads from the same Dataset and Key-Value Store. Pair a normal run with standby mode and the API stays queryable as long as the Actor is running.
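For clients that need more than one page, the ?limit= and ?offset= parameters can be walked in a loop. The Python sketch below assumes that GET /leads returns a JSON array (as the jq example suggests) and that a short page marks the end; iter_leads and http_get_json are hypothetical names, and the fetch function is injectable so the loop can be exercised without a live standby host.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url: str):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_leads(base_url: str, page_size: int = 500,
               fetch: Callable[[str], list] = http_get_json) -> Iterator[dict]:
    """Yield every lead by paging through /leads with limit and offset."""
    offset = 0
    while True:
        query = urlencode({"limit": page_size, "offset": offset})
        page = fetch(f"{base_url.rstrip('/')}/leads?{query}")
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # short page: assumed to be the last one
        offset += page_size
```

Usage would be along the lines of `sendable = [lead for lead in iter_leads("https://<your-standby-host>") if lead.get("isSendable")]`, with the placeholder host replaced by the standby URL from the Apify Console.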

CRM-ready export

Set csvMode in the input and get a file formatted exactly for your platform:

Mode	Booleans	Use case
`standard`	`true` / `false`	Generic CSV for custom tooling
`hubspot`	`true` / `false`	HubSpot Contact Import
`salesforce`	`TRUE` / `FALSE`	Salesforce Lead Import Wizard
`pipedrive`	`1` / `0`	Pipedrive Person Import

{
  "urls": ["https://acme.com", "https://stripe.com"],
  "csvMode": "hubspot"
}

Output: OUTPUT_HUBSPOT_CSV in the Key-Value Store tab — import directly, no transformation.
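If you post-process the standard export yourself, the boolean conventions in the table reduce to a lookup. The Python sketch below only restates that table; BOOLEAN_STYLES and format_bool are hypothetical names, and rendering a missing value as an empty cell is an assumption.

```python
# Boolean spellings per csvMode, copied from the table above.
BOOLEAN_STYLES = {
    "standard": ("true", "false"),
    "hubspot": ("true", "false"),
    "salesforce": ("TRUE", "FALSE"),
    "pipedrive": ("1", "0"),
}


def format_bool(value, csv_mode: str = "standard") -> str:
    """Render a boolean the way the given csvMode expects it."""
    if value is None:
        return ""  # assumption: missing values become empty cells
    true_text, false_text = BOOLEAN_STYLES[csv_mode]
    return true_text if value else false_text


for mode in BOOLEAN_STYLES:
    print(mode, format_bool(True, mode), format_bool(False, mode))
```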

Filter by company type

Each input URL is heuristically classified into one of 14 verticals (saas, saas_b2b, agency, ecommerce, legal, medical, consulting, manufacturing, media, nonprofit, education, realestate, finance, or other) using schema.org markup, meta description, and body-text keywords. Set companyTypes to keep only the verticals you care about.

{
  "urls": ["https://acme.com", "https://bobslegal.com", "https://carsforkids.com"],
  "companyTypes": ["saas", "consulting"]
}

Dropped records remain in the Apify Dataset with passedCompanyTypeFilter: false so you can audit them; they are removed from the local CSV/JSON export.

Email Pattern Finder in depth

Step 2 detects the company's email naming convention from the emails Step 1 found on the page, validates the domain with MX + a single SMTP catch-all probe, and emits a generatedEmails[] array plus a patternAnalysis block.

What's emitted

{
  "emailPattern": "first.last",
  "patternConfidence": 0.92,
  "generatedEmails": [
    { "address": "jan.curry@acme.com", "name": "Jan Curry", "source": "page-discovered" },
    { "address": "ada.lovelace@acme.com", "name": "Ada Lovelace", "source": "pattern-from-page" },
    { "address": "curry.jan@acme.com", "name": "Jan Curry", "source": "pattern-alternate" }
  ],
  "patternAnalysis": {
    "mxValid": true,
    "isCatchAll": false,
    "emailCulture": "strict-format",
    "sequenceStrategy": "fallback",
    "bounceRiskBucket": "low"
  }
}

`source` enum values

Source	Meaning
`page-discovered`	Email Step 1 already found on the page that parses to a personal name
`pattern-from-page`	The detected pattern applied to a contact name found on the page
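`pattern-alternate`	A backup pattern applied to the same names (when confidence is low)
`whois-registrant`	WHOIS registrant email, added when `searchWhois` is `true`
`hunter-api`	Email pulled from Hunter.io's domain-search API when `hunterApiKey` is set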

See docs/NextSteps/EmailPatternFinder.md for the full spec, docs/plans/EmailPatternFinder-adr.md for the architecture decisions, and docs/plans/EmailPatternFinder-implementation.md for the build plan.
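To make the idea concrete, here is a minimal Python sketch of detecting a naming convention from one known address and applying it to another name. It covers only the three conventions named in this README (first.last, flast, first), ignores middle names and non-ASCII handling, and is not the Actor's algorithm; all function names are hypothetical.

```python
from typing import Optional

# Hypothetical pattern table: convention name -> local-part builder.
PATTERNS = {
    "first.last": lambda first, last: f"{first}.{last}",
    "flast": lambda first, last: f"{first[0]}{last}",
    "first": lambda first, last: first,
}


def split_name(full_name: str):
    """Return (first, last) in lowercase; middle names are ignored."""
    parts = full_name.lower().split()
    if len(parts) < 2:
        raise ValueError(f"need first and last name: {full_name!r}")
    return parts[0], parts[-1]


def apply_pattern(pattern: str, full_name: str, domain: str) -> str:
    """Build the predicted address for a name under a given convention."""
    first, last = split_name(full_name)
    return f"{PATTERNS[pattern](first, last)}@{domain}"


def detect_pattern(address: str, full_name: str) -> Optional[str]:
    """Return the convention that reproduces the address, or None."""
    local, _, domain = address.lower().partition("@")
    for pattern in PATTERNS:
        if apply_pattern(pattern, full_name, domain) == f"{local}@{domain}":
            return pattern
    return None


pattern = detect_pattern("jan.curry@acme.com", "Jan Curry")
print(pattern, apply_pattern(pattern, "Ada Lovelace", "acme.com"))
```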

Input

{
  "urls": ["https://site1.com", "stripe.com", "www.example.org/contact"],
  "maxConcurrency": 5,
  "includeWhois": false,
  "csvMode": "standard",
  "companyTypes": ["saas", "consulting"],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["DATACENTER"]
  },
  "skipEmailPatternFinder": false,
  "goal": "high-deliverability"
}

Field	Default	Description
`urls`	required	Up to 1,000 URLs or bare domains
`maxConcurrency`	`5`	Parallel requests (1–10). Use 1–2 for large batches
`includeWhois`	`false`	Adds registrant name and registration date (~1s extra per URL)
`csvMode`	`standard`	`standard`, `hubspot`, `salesforce`, or `pipedrive`
`companyTypes`	`[]`	Allow-list of verticals. Empty = include all.
`proxyConfiguration`	`{ useApifyProxy: false }`	Optional. Routes requests through Apify's proxy pool — see Proxy support
`skipEmailPatternFinder`	`false`	Skip Step 2 (Email Pattern Finder) — when true, no DNS / SMTP work is performed
`searchWhois`	`false`	Mine the WHOIS registrant email and add it to `generatedEmails[]` with `source: "whois-registrant"`. No-op when `skipEmailPatternFinder: true`
`goal`	`high-deliverability`	Outreach intent. `quick-outreach` (strict, single-shot), `high-deliverability` (medium, fallback), `max-coverage` (loose, progressive)
`hunterApiKey`	`null`	Optional Hunter.io API key. When set, pulls additional emails from Hunter's domain-search API into `generatedEmails[]` with `source: "hunter-api"`. Free tier works. Failures populate `patternAnalysis.hunterError` without failing the step

Sample output

{
  "url": "https://www.acme.com",
  "domain": "acme.com",
  "scrapedAt": "2026-06-21T10:00:00Z",
  "contacts": {
    "emails": [
      { "address": "jan@acme.com", "type": "corporate" },
      { "address": "contact@acme.com", "type": "generic" }
    ],
    "phones": ["+12125551234"]
  },
  "socials": { "linkedin": "https://linkedin.com/company/acme" },
  "qualityScore": { "total": 85, "breakdown": { "completeness": 80, "emailValidity": 100, "phoneValidity": 100, "socialPresence": 60 } },
  "companyType": "saas",
  "isSendable": true,
  "emailPattern": "first.last",
  "patternConfidence": 0.92,
  "generatedEmails": [
    { "address": "jan.curry@acme.com", "name": "Jan Curry", "source": "page-discovered" }
  ],
  "patternAnalysis": {
    "mxValid": true,
    "isCatchAll": false,
    "bounceRiskBucket": "low",
    "emailCulture": "strict-format"
  },
  "dataQuality": "medium",
  "scrapeError": null
}

Full schema: .actor/dataset_schema.json.

Quality you can trust

No "nan" — null/NaN values become empty fields, never broken cells
UTF-8 BOM — accented company names import cleanly into Excel and every CRM
CSV injection guard (CWE-1236) — formula-triggering values (=, +, -, @) are quoted to prevent execution when the CSV is opened in Excel
Single homepage fetch — company name, socials, and address extracted from the same response; no wasteful re-scraping
WHOIS cache — duplicate domains in one run cost nothing
Graceful errors — failed URLs still appear in the dataset with error context, so nothing is lost silently
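If you build your own CSV from the Dataset, the first three guarantees can be approximated as below. This Python sketch is not the Actor's exporter: it neutralizes formula-triggering cells by prefixing an apostrophe (a commonly recommended mitigation, swapped in here because the exact quoting rule is not documented), and safe_cell and to_csv_bytes are hypothetical names.

```python
import csv
import io
import math

FORMULA_PREFIXES = ("=", "+", "-", "@")


def safe_cell(value) -> str:
    """Empty string for None/NaN; neutralize formula-triggering text."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    text = str(value)
    if text.startswith(FORMULA_PREFIXES):
        return "'" + text  # assumption: apostrophe prefix as the guard
    return text


def to_csv_bytes(header: list, rows: list) -> bytes:
    """Serialize rows to CSV bytes with a UTF-8 BOM for Excel."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    writer.writerow(header)
    for row in rows:
        writer.writerow([safe_cell(cell) for cell in row])
    return buffer.getvalue().encode("utf-8-sig")


payload = to_csv_bytes(["company", "note"], [["Café Noir", "=SUM(A1:A9)"]])
print(payload[:3])
```

One side effect of this simple guard is that E.164 phone numbers, which start with +, are also prefixed; a real exporter would need a rule for that case.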

Proxy support

Hit a Cloudflare block? Scraping EU sites that geo-fence US IPs? Add proxyConfiguration to your input and the actor will route every request through the Apify-managed proxy pool. Default is off — you only pay proxy bandwidth when you opt in.

Tier	Apify cost	Best for	Failure mode
`DATACENTER`	~$2.50/GB	US sites without aggressive anti-bot	Blocked by Cloudflare / Akamai
`RESIDENTIAL`	~$12/GB	Anti-bot sites, EU geo-targeting, compliance-sensitive leads	4–5× the bandwidth cost

EU geo-pinning example:

{
  "urls": ["https://acme.de", "https://example.de"],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyCountry": "DE"
  }
}

Every run that uses proxy prints a summary line at the end so you can track cost. If any URL hits a Cloudflare challenge, you'll see a tip suggesting you enable proxy.

Scrape errors

Every record carries a top-level scrapeError field (object | null). When it is set, its code is one of eight machine-readable categories:

Code	Meaning	Retry?
`timeout`	Request exceeded the timeout budget	✅ Retry
`blocked`	HTTP 403 or Cloudflare / bot-challenge signal	⚠️ With proxy
`dns_error`	DNS lookup failed (`ENOTFOUND` / `EAI_AGAIN`)	❌ Permanent
`tls_error`	Certificate / TLS handshake failed	❌ Permanent
`5xx`	Upstream 5xx response	✅ Retry
`4xx`	Other 4xx response (404, 429, …)	⚠️ Depends
`empty`	Fetch succeeded but no contact data extracted	⚠️ Optional
`unknown`	Unclassified failure	⚠️ Case-by-case

Partial-success rule: if any path in the scrape loop yielded data, scrapeError is cleared to null. A record that got even one email from /contact succeeded.
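A downstream job can turn the Retry? column into a simple dispatch. The Python sketch below maps each code to a follow-up action; the action labels and the next_action helper are hypothetical, and treating the three "depends" rows as "review" is an assumption.

```python
# Follow-up action per scrapeError code, based on the table above.
RETRY_POLICY = {
    "timeout": "retry",
    "5xx": "retry",
    "blocked": "retry_with_proxy",
    "dns_error": "drop",
    "tls_error": "drop",
    "4xx": "review",
    "empty": "review",
    "unknown": "review",
}


def next_action(record: dict) -> str:
    """Decide what to do with a record based on its scrapeError."""
    error = record.get("scrapeError")
    if error is None:
        return "done"  # includes partial successes, which clear the error
    return RETRY_POLICY.get(error.get("code"), "review")


print(next_action({"scrapeError": None}))
print(next_action({"scrapeError": {"code": "blocked", "message": "HTTP 403"}}))
```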

Use cases

Sales prospecting — find decision-maker emails and direct phones for outbound campaigns
Cold outreach prep — build targeted lists with verified corporate emails and bounce-risk per domain
Lead enrichment — append real contact data to existing CRM records
Competitor research — map competitor digital presence at scale
Domain due diligence — WHOIS-backed company name and registration date for vendor research

Technical notes

Cheerio-based HTML extraction (lightweight, no headless browser overhead)
Automatic retry with exponential backoff
Rotating user agents to reduce blocks
Configurable timeout (15s default, 5s WHOIS, 1s SMTP probe)
Optional Apify proxy integration (DATACENTER / RESIDENTIAL / country pinning / custom URLs)
951 tests across 49 test suites

Categories: Lead generation · Data scraping · Sales automation

Tags: email scraper, phone extractor, social media finder, B2B lead enrichment, CRM enrichment, contact discovery, WHOIS lookup, sales automation, proxy support, Cloudflare bypass, residential proxy, datacenter proxy, email pattern finder, bounce risk
