Lead Enrichment Pipeline — 5-47x Cheaper Than Clay

Pricing

from $120.00 / 1,000 leads enriched

All-in-one lead enrichment: email discovery, phone finding, verification, company research, and lead scoring in one run. CSV or JSON in, scored leads out. $0.12/lead — 5-47x cheaper than Clay.

Developer

ryan clinton

Maintained by Community

Lead Enrichment Pipeline

Deterministic outbound intelligence engine — not just an enrichment tool. Every decision is fully auditable and avoids black-box behaviour. One actor, full outbound brain, $0.12 per lead.

The system operates in three layers: enrichment → decision → execution. It can replace the three separate tools most teams run together, namely data enrichment (Clay / Apollo), scoring (Salesforce Einstein / 6sense), and sequencing logic (Outreach / Salesloft), with a single deterministic platform.

Unlike Clay or Apollo, this actor handles both lead enrichment and outreach sequencing. Every lead comes back with a clear answer (send, skip, or fix first), plus a priority score (who to email first), a freshness assessment (is this contact still good?), and an execution plan (channel, sequence length, timing, tone). The full decision enum is SEND_NOW / VERIFY_FIRST / ENRICH_MORE / SKIP. The logic is purely deterministic: no LLM, no hidden weights.

Telling you how to contact a lead means defining the channel, timing, and sequence, not just returning contact data. That is what this actor does.

What makes this different (read this first)

  • Send-or-skip decision per lead: not just data, an action
  • Priority engine: priorityScore (0-100) + priorityBucket (hot / warm / cold) tells you who to email first
  • Freshness-aware: stale data forces SEND_NOW down to VERIFY_FIRST so you don't burn sender reputation
  • Closed-loop feedback: ship outcomes back, the actor remembers them and surfaces cohort patterns on future leads
  • Execution plan: channel × sequenceType × timing × tone × bestSendWindow per lead, ready for Outreach / Salesloft / Lemlist / your own tool
  • Account-level intelligence: companyInsights aggregates by domain so account-based selling works without extra tooling
  • Built for automation: every decision field is a stable enum; downstream tools branch on codes, never parse prose

Most tools tell you who to contact. This tells you how to contact them.

Unlike enrichment tools, this produces decisions, not just data. Unlike scoring tools, every score is auditable down to the rule path. Unlike sequencing platforms, this generates the execution plan deterministically — same input, same output, every run. Every lead returns with a send-or-skip decision and an execution plan; no manual interpretation required.

This is a direct alternative to Clay: it replaces separate enrichment, scoring, and sequencing tools with one deterministic system.

TL;DR — if you just want results

  1. Upload a CSV (or paste a JSON array of leads)
  2. Click Start
  3. Filter the dataset for sendDecision.action = "SEND_NOW"
  4. Pass each lead's executionPlan into your sequencing tool

That's the simple path. Everything below — templates, monitoring, feedback loop, persona scoring, scenario simulation — is opt-in for power users.
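The filter-and-hand-off steps above can be sketched in Python once the dataset items are in memory. This is a minimal sketch, not part of the actor: `select_send_now` is a hypothetical helper, ordering by `priorityScore` is an assumption about how you might queue leads, and the `channel` values in the sample are placeholders rather than documented enum values.

```python
def select_send_now(records):
    """Return (entityId, executionPlan) pairs for SEND_NOW leads, highest priority first."""
    leads = [
        r for r in records
        # Summary and other non-lead records have no sendDecision and are skipped.
        if (r.get("sendDecision") or {}).get("action") == "SEND_NOW"
    ]
    # priorityScore is documented as 0-100; a missing value is treated as 0 here.
    leads.sort(key=lambda r: r.get("priorityScore") or 0, reverse=True)
    return [(r.get("entityId"), r.get("executionPlan")) for r in leads]


# Hand-written sample records (placeholder values, not real actor output).
sample = [
    {"entityId": "a1", "priorityScore": 40,
     "sendDecision": {"action": "SEND_NOW"}, "executionPlan": {"channel": "email"}},
    {"entityId": "b2", "priorityScore": 90,
     "sendDecision": {"action": "SEND_NOW"}, "executionPlan": {"channel": "email"}},
    {"entityId": "c3", "priorityScore": 95, "sendDecision": {"action": "SKIP"}},
    {"type": "summary", "totalEnrichedLeads": 3},
]

queue = select_send_now(sample)
print([entity_id for entity_id, _plan in queue])
```

Each returned `executionPlan` can then be passed to whatever sequencing tool you use.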

What is outbound lead automation?

Outbound lead automation is the process of enriching, scoring, prioritising, and sequencing leads for outreach. Tools that both enrich leads and generate outreach sequences combine contact discovery with built-in execution planning, replacing a stack of separate enrichment, scoring, and sequencing tools with a single pipeline.

Why you can trust the output

Automating outbound without AI hallucinations requires a deterministic, rule-based decision system, and that is what this actor is. All decisions are auditable, every score is a documented formula, and the actor never calls an LLM during enrichment, scoring, prioritisation, or execution planning. The system prioritises trust and transparency over adaptive automation: it does not generate messaging or modify scoring weights automatically, but surfaces patterns and lets the user decide.

  • No LLM scoring. Every score is a documented formula over input fields you can see.
  • No hidden weights. Every decision exposes its decisionRulePath, confidenceBreakdown, priorityFactors, and reasonCodes.
  • Stable enums for automation. All decision values come from documented enum tables — your downstream tools branch on codes, never parse prose.
  • Honest abstention. When data is thin (cohort < 3 outcomes; no monitor history; no ICP defined), the actor returns null instead of fabricating confidence.
  • No auto-tuning. The feedback loop surfaces cohort patterns; users adjust personaWeights themselves. Auto-mutating weights would be opaque and trust-killing.
  • Fully auditable. Every record carries confidenceExplanation, priorityExplanation, and executionPlan.reason in plain English so you can see why a decision was made.

Built for automation & AI agents

  • Deterministic outputs — no hallucination risk, every run with the same input produces the same output.
  • Stable enums for branching: sendDecision.action, priorityBucket, executionPlan.channel, bounceRiskBucket, leadGrade, intentSignals[]. See Stable enums (quick reference).
  • No LLM dependency. This actor never calls an LLM. Your downstream agent (Dify / LangChain / a custom OpenAI tool) reads the structured output and acts on it.
  • Single-column safelists. isOutreachReady, isDecisionMaker, isFullIcpMatch, isOnSuppression boolean fields for one-tick spreadsheet/SQL filters.
  • Execution-ready. executionPlan ships sequencing logic so your agent doesn't have to invent it.

See Use in Dify for a full agent integration walkthrough.
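Branching on a stable enum, as described in the list above, might look like the following. It is a sketch under assumptions: the handler names on the right-hand side are hypothetical labels for your own workflow, and only the four `sendDecision.action` values come from this page.

```python
# Map each documented sendDecision.action value to a (hypothetical) downstream step.
HANDLERS = {
    "SEND_NOW": "queue_for_sequencing",
    "VERIFY_FIRST": "reverify_email",
    "ENRICH_MORE": "follow_recovery_plan",
    "SKIP": "drop",
}


def next_step(lead):
    """Pick the downstream step for one lead record by branching on enum codes only."""
    # Defensive check: suppressed leads are documented as forced to SKIP anyway.
    if lead.get("isOnSuppression"):
        return "drop"
    action = lead["sendDecision"]["action"]
    if action not in HANDLERS:
        # Fail loudly on an unknown code instead of guessing.
        raise ValueError(f"unknown sendDecision.action: {action!r}")
    return HANDLERS[action]


print(next_step({"sendDecision": {"action": "VERIFY_FIRST"}}))
```

Raising on an unrecognised code keeps the agent from silently mishandling a value added in a later version.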

How this compares to the alternatives

| Capability | Clay | Apollo / ZoomInfo | Hunter / Clearbit | Lead Enrichment Pipeline |
| --- | --- | --- | --- | --- |
| Email + company enrichment | yes | yes | yes | yes |
| Email verification | paid add-on | stale | basic | MX + SMTP |
| Lead scoring | opaque | opaque | no | transparent + auditable |
| Send-or-skip decision per lead | no | no | no | stable enum |
| Priority ranking with explanation | no | score only | no | priorityScore + priorityFactors |
| Freshness / staleness awareness | no | no | no | freshness + decay model |
| Closed-loop feedback (no auto-tuning) | no | no | no | historicalPerformance |
| Account-level aggregation | via tables | yes | no | companyInsights |
| Execution planning (channel + sequence + timing + tone) | no | no | no | executionPlan |
| Cross-run change detection | no | no | no | changeFlags[] |
| Determinism (same input → same output) | no | no | yes | yes |
| LLM-free (fully auditable) | no | no | yes | yes |
| Per-credit / per-event billing | yes | subscription | subscription | pay-per-event |

Unlike Clay, this system produces a deterministic send decision you can branch on. Unlike Apollo or ZoomInfo, it accounts for data freshness and won't recommend contacting a stale lead. Unlike traditional enrichment tools, it includes execution planning — sequencing, timing, tone, and channel — out of the box.

How the pipeline works

The actor orchestrates 7 specialised sub-actors in sequence: contact discovery via 10-step waterfall enrichment, phone number finding, MX + SMTP email verification, deep company research from 7+ sources, multi-signal lead scoring with A-F grades, and optional CRM push to HubSpot or Salesforce. Each step runs only when the lead needs it — no wasted credits on data you already have.

INPUT (CSV or JSON)
│ Each row: name+company, name+domain, or email
┌────────────────────────────────────────────────────────────────────┐
│ Step 1   Normalize          Domain extraction, name parsing, dedup │
│ Step 2   Contact Discovery  Waterfall (pattern → PDL → scrape …)   │
│ Step 3   Email Verify       MX + SMTP, confidence score            │
│ Step 4   Company Enrich     7+ sources (web, GitHub, SEC, Wiki …)  │
│ Step 5   Lead Score         5-category 0-100 + A-F grade           │
│ Step 5b  Decision Engine    sendDecision + recoveryPlan + signals  │
│ Step 6   CRM Push           HubSpot / Salesforce / dataset only    │
└────────────────────────────────────────────────────────────────────┘
OUTPUT — every lead carries:
• SEND_NOW / VERIFY_FIRST / ENRICH_MORE / SKIP
• bounceRiskBucket (low / medium / high)
• leadGrade (A-F data-completeness)
• recordConfidence (0-100, harmonic mean across 4 axes)
• actionPlaybook[] (next steps in plain English)
• recoveryPlan (next-best Apify actor when stuck)

What data can you extract?

| Data Point | Source | Example |
| --- | --- | --- |
| 📧 Email address | Waterfall: website scraping, pattern detection, PDL | sarah.chen@acmecorp.com |
| 📧 Email verified | MX + SMTP deliverability check | true |
| 📧 Email confidence | Verification engine score | 95 |
| 📞 Phone number | Website scraping, directories, PDL | +1-415-555-0142 |
| 👤 Full name | Input normalization, name parsing | Sarah Chen |
| 🏢 Company name | Input or company research | Acme Corp |
| 🌐 Domain | Extracted from website, email, or company name | acmecorp.com |
| 🏭 Industry | Company deep research | Technology |
| 👥 Employee count | Company deep research | 51-200 |
| 🔧 Tech stack | Company deep research | React, Node.js, AWS |
| 🔗 Social profiles | Company research (LinkedIn, Twitter) | linkedin.com/company/acmecorp |
| Lead score | Multi-signal scoring engine (0-100) | 82 |
| 🏅 Lead grade | Score-derived letter grade | A |
| 📊 Score breakdown | Per-category scoring: digital, engagement, company, contact, authority | {"digital": 18, "company": 20, ...} |

Why use Lead Enrichment Pipeline?

The capability gap matters more than the cost gap. Other tools give you data; this one gives you a deterministic decision system you can drop into automation, audit end-to-end, and trust without LLM-flavoured "fuzziness." See How this compares to the alternatives for the side-by-side.

Operational benefits the platform gives you out of the box:

  • Scheduling — run daily, weekly, or custom intervals to enrich new leads automatically
  • API access — trigger enrichment runs from Python, JavaScript, or any HTTP client
  • Proxy rotation — sub-actors use Apify's built-in proxy infrastructure for reliable scraping
  • Monitoring — get Slack or email alerts when enrichment runs fail or produce unexpected results
  • Integrations — connect to Zapier, Make, Google Sheets, HubSpot, Salesforce, or webhooks
  • Pay-per-event billing — $0.12 per lead enriched, no monthly subscription, no surprise charges

Features

  • 6-step enrichment pipeline: Normalize, Contact Discovery, Email Verify, Company Enrich, Lead Score, and CRM Push run in sequence, and each step runs only for the leads that need it
  • Smart step skipping: leads with existing emails skip contact discovery; leads without domains skip company research; disabled steps are bypassed entirely
  • 10-step waterfall email discovery: website scraping, email pattern detection, People Data Labs enrichment, SMTP probing, and social profile matching in a single cascade
  • MX + SMTP email verification: every discovered email is verified for deliverability with confidence scores before output
  • Deep company research: pulls from 7+ sources (website, Wikipedia, GitHub, SEC filings, academic databases, DNS, social profiles) to build company intelligence
  • Multi-signal lead scoring: scores leads 0-100 across 5 categories: digital presence, engagement signals, company fit, contact completeness, and authority level
  • CSV and JSON input: upload a CSV URL or paste a JSON array; CSV headers are auto-mapped from 40+ common column name variations
  • CSV output: downloadable CSV file generated in the Key-Value Store alongside the standard JSON dataset
  • Batch processing: all leads needing a step are sent in one sub-actor call, not one call per lead, so a 200-lead run makes the same number of sub-actor calls as a 1-lead run
  • Direct CRM push: enriched leads push straight to HubSpot or Salesforce with no middleware required
  • Source tracking: every discovered field includes a *Source tag (e.g., emailSource: "pattern-detection") so you know where data came from
  • Spending limit: set a maximum budget per run; the pipeline stops when your limit is reached
  • Pass-through fields: extra CSV columns not in the standard schema are preserved on output
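The smart step skipping described in the list above can be sketched as a small planning function. This is an illustration only, not the actor's code: `plan_steps` is a hypothetical helper, the step names are taken from the summary record's pipelineSteps example, and the defaults mirror the documented parameter defaults.

```python
def plan_steps(lead, config):
    """Return the pipeline steps a single lead would go through."""
    steps = ["normalize"]

    # Leads that already have an email skip contact discovery.
    needs_discovery = config.get("enrichEmail", True) and not lead.get("email")
    if needs_discovery:
        steps.append("contact-discovery")

    # Verification covers both input emails and emails that discovery may find.
    if config.get("verifyEmails", True) and (lead.get("email") or needs_discovery):
        steps.append("email-verification")

    # Leads without a domain skip company research.
    if config.get("enrichCompany", False) and lead.get("domain"):
        steps.append("company-research")

    if config.get("scoreLeads", True):
        steps.append("lead-scoring")
    return steps


print(plan_steps({"email": "james@betaindustries.com"}, {}))
print(plan_steps({"fullName": "Maria Rodriguez", "domain": "pinnacle.io"},
                 {"enrichCompany": True}))
```

The first call skips discovery because an email is already present; the second adds company research because a domain is known and the step is enabled.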

Use cases for lead enrichment

Sales prospecting

SDRs and BDRs export prospect lists from LinkedIn Sales Navigator or Apollo with names and companies but no verified emails. This pipeline fills in the gaps: discovering work emails via waterfall enrichment, verifying deliverability, and scoring each lead so reps focus on the highest-value targets first.

Marketing agency lead generation

Agencies build prospect databases for clients across industries. Instead of paying for Clay at $349/month per client, agencies run this pipeline at $0.12/lead to enrich downloaded attendee lists, webinar signups, or trade show scans with emails, phone numbers, and company data.

Recruiting and talent sourcing

Recruiters have candidate names and companies from job boards but need direct contact information. The pipeline discovers work emails and phone numbers, then scores candidates by company fit and seniority signals to prioritize outreach.

CRM data enrichment

Sales ops teams maintain CRM databases where 30-60% of contact records are incomplete or stale. Upload the CRM export as CSV, enrich missing fields, verify existing emails, and push updated records back to HubSpot or Salesforce — all in one run.

Competitive intelligence

Market research teams tracking competitor employees need enriched profiles with verified contact data, company intelligence, and tech stack information. The pipeline enriches partial competitor employee lists into actionable intelligence reports.

Event lead processing

After conferences and trade shows, teams have badge scan exports with names and companies but no email addresses. This pipeline converts raw event leads into outreach-ready contacts with verified emails, company context, and priority scores within minutes of the event ending.

How to enrich leads with this pipeline

  1. Upload your leads — Paste a JSON array of lead objects in the input field, or provide a public URL to a CSV file. Each lead needs at minimum a name + company, a name + domain, or an email address.
  2. Choose enrichment steps — Enable or disable email discovery, phone finding, email verification, company research, and lead scoring based on what you need. Defaults cover the most common workflow (email + verify + score).
  3. Run the pipeline — Click "Start" and wait. A batch of 50 leads with default settings typically completes in 3-5 minutes. Status messages update at each pipeline step.
  4. Download results — Get enriched leads as JSON from the Dataset tab, or download the CSV file from the Key-Value Store link in the summary record. Push directly to HubSpot or Salesforce by enabling CRM push.

Input parameters

Core

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| leads | array | | JSON array of lead objects. Each can have: firstName, lastName, fullName, email, phone, companyName, domain, website, title, linkedinUrl. |
| csvUrl | string | | Public URL to a CSV file with lead data. Headers auto-mapped. Takes precedence over JSON leads. |
| maxLeads | integer | 0 | Maximum leads to process. Set to 0 for unlimited. |
| outputCsv | boolean | true | Generate a downloadable CSV file in the Key-Value Store. |

Pipeline configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| template | string | "custom" | High-level preset (see Templates section below). One of: custom / b2b-saas-prospecting / enterprise-sales / recruiting-tech / recruiting-non-tech / event-leads / agency-outbound / crm-cleanup. Wins over individual settings only when those settings are unset. |
| goal | string | "high-deliverability" | Goal preset that maps to step toggles. One of: quick-outreach / high-deliverability / max-coverage / custom. Overridden when a template is set. |
| enrichEmail | boolean | true | (custom mode) Run waterfall email discovery for leads missing email addresses. |
| enrichPhone | boolean | false | (custom mode) Run phone number discovery for leads missing phone numbers. |
| verifyEmails | boolean | true | (custom mode) Run MX + SMTP verification on all emails. |
| enrichCompany | boolean | false | (custom mode) Run deep company research for leads with a domain. |
| scoreLeads | boolean | true | (custom mode) Score all leads 0-100 and assign A-F grades. |

Decision tuning (v1.1+)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| icp | object | {} | Structured ICP: { roles, seniority, industries }. Each lead gets isIcpRoleMatch / isIcpSeniorityMatch / isIcpIndustryMatch + isFullIcpMatch + icpMatchScore (0-100). Preferred over icpRoles. |
| icpRoles | array | [] | Deprecated alias of icp.roles. Kept for back-compat. |
| ourTechStack | array | [] | Optional list of technologies that describe your product. When provided, each lead gets a techStackMatch score showing overlap with the company's detected tech stack. Requires enrichCompany=true. |
| outputFilter | string | "none" | Filter records BEFORE pushData and BEFORE PPE (pay-per-event) charging. Skipped leads aren't billed. One of: none / send-now-only / verified-emails-only / a-b-grade-only. |
| outputMode | string | "analytics" | Detail level on each emitted lead. crm = lean CRM-import row; analytics = full + decisions + confidence (default); debug = analytics + per-step diagnostics. |
| strictMode | boolean | false | Convert VERIFY_FIRST decisions to SKIP. Use when sender reputation matters more than coverage. |
| minEmailConfidence | integer | 0 | Drop emails below this confidence threshold (0-100) before computing the send decision. Set to 0 to keep everything. |
| dedupeWithinRun | boolean | true | Collapse duplicate input rows by key (email, name+domain, or name+company) before processing. Each output lead carries mergedFromCount showing how many input rows produced it. |
| personaWeights | object | {} | Optional weight pack { contact, company, identity, fit } that re-weights confidenceBreakdown into a customScore. The sub-actor's score field stays untouched. |
| suppressionListUrl | string | | Optional public URL to a CSV with one email or domain per line. Leads matching the list are flagged isOnSuppression=true and forced to SKIP. |
| monitorStateKey | string | | When set, persists per-lead snapshots to a named KV store. Subsequent runs emit changeSinceLastRun + stable changeFlags[] enum. See Monitoring section below. |
| feedbackStateKey | string | | V3: named KV store key for outcome history. When set, persists outcomes from the feedback input across runs and surfaces historicalPerformance on similar future leads. |
| feedback | object | {} | V3: closed-loop outcome ingestion. { "type": "outcome", "data": [{ entityId, outcome, domain, ... }] }. Outcomes persist to feedbackStateKey. |
| emitPreflight | boolean | true | Push a preflight cost-estimate record at the start of the run. |
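To make the icp parameter concrete, here is a rough sketch of how the ICP match fields could be derived from the rules stated in the output field descriptions (average across declared axes, full match only when every declared axis matches). The matching details are assumptions: case-insensitive substring matching for roles and industries is a guess, and `icp_match` is a hypothetical helper, not the actor's implementation.

```python
def icp_match(lead, icp):
    """Compute icpMatchScore and isFullIcpMatch for one lead (illustrative only)."""
    title = (lead.get("title") or "").lower()
    industry = (lead.get("industry") or "").lower()

    axes = {}  # only axes the user declared are scored
    if icp.get("roles"):
        axes["role"] = any(role.lower() in title for role in icp["roles"])
    if icp.get("seniority"):
        axes["seniority"] = lead.get("seniorityLevel") in icp["seniority"]
    if icp.get("industries"):
        axes["industry"] = any(name.lower() in industry for name in icp["industries"])

    if not axes:
        # No ICP defined: abstain with nulls rather than invent a score.
        return {"icpMatchScore": None, "isFullIcpMatch": None}

    score = round(100 * sum(axes.values()) / len(axes))
    return {"icpMatchScore": score, "isFullIcpMatch": all(axes.values())}


lead = {"title": "CTO", "seniorityLevel": "c-level", "industry": "Technology"}
icp = {"roles": ["cto"], "seniority": ["c-level", "vp"], "industries": ["fintech"]}
print(icp_match(lead, icp))
```

In this sample two of the three declared axes match, so the score is 67 and the full-match flag is false.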

CRM push

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| crmPush | string | "none" | One of: none / hubspot / salesforce. |
| hubspotApiKey | string (secret) | | HubSpot private app access token. Required when crmPush=hubspot. |
| salesforceCredentials | string (secret) | | JSON string {"instanceUrl":"...","accessToken":"..."}. Required when crmPush=salesforce. |

Input examples

Enrich a list of prospects with email and scoring (most common):

{
  "leads": [
    {"firstName": "Sarah", "lastName": "Chen", "companyName": "Acme Corp", "website": "acmecorp.com", "title": "CTO"},
    {"firstName": "James", "lastName": "Park", "companyName": "Beta Industries", "title": "VP Sales"},
    {"fullName": "Maria Rodriguez", "domain": "pinnacle.io"}
  ],
  "enrichEmail": true,
  "verifyEmails": true,
  "scoreLeads": true
}

Full enrichment with company research and HubSpot push:

{
  "csvUrl": "https://docs.google.com/spreadsheets/d/abc123/export?format=csv",
  "enrichEmail": true,
  "enrichPhone": true,
  "verifyEmails": true,
  "enrichCompany": true,
  "scoreLeads": true,
  "crmPush": "hubspot",
  "hubspotApiKey": "pat-na1-abc123...",
  "maxLeads": 100
}

Quick email-only enrichment (fastest, cheapest):

{
  "leads": [
    {"email": "james@betaindustries.com"},
    {"email": "m.rodriguez@pinnacle.io"},
    {"email": "chen.sarah@acmecorp.com"}
  ],
  "enrichEmail": false,
  "verifyEmails": true,
  "enrichCompany": false,
  "scoreLeads": false
}

Input tips

  • Start with defaults — the default settings (email discovery + verification + scoring) cover the most common enrichment workflow at the lowest cost per lead
  • Enable company enrichment selectively — company research adds industry, employee count, and tech stack but increases processing time; enable it when company intelligence matters for your use case
  • Use CSV for large batches — upload your spreadsheet to Google Sheets, publish as CSV, and paste the URL; the auto-mapper handles 40+ common header variations including "First Name", "first_name", "fname", and more
  • Set maxLeads for testing — use maxLeads: 5 on your first run to verify the output format before processing your full list
  • Batch in one run — processing 200 leads in one run is faster and cheaper than running 200 single-lead runs because sub-actors are called in batch

Output example

{
  "firstName": "Sarah",
  "lastName": "Chen",
  "fullName": "Sarah Chen",
  "email": "sarah.chen@acmecorp.com",
  "emailVerified": true,
  "emailStatus": "valid",
  "emailConfidence": 95,
  "emailSource": "pattern-detection",
  "phone": "+1-415-555-0142",
  "phoneSource": "website-scraping",
  "companyName": "Acme Corp",
  "domain": "acmecorp.com",
  "website": "https://acmecorp.com",
  "title": "CTO",
  "linkedinUrl": "https://linkedin.com/in/sarahchen",
  "industry": "Technology",
  "employeeCount": "51-200",
  "companyDescription": "Enterprise SaaS platform for supply chain optimization, serving mid-market manufacturers across North America.",
  "techStack": ["React", "Node.js", "AWS", "PostgreSQL"],
  "socialProfiles": {
    "linkedin": "https://linkedin.com/company/acmecorp",
    "twitter": "https://twitter.com/acmecorp"
  },
  "score": 82,
  "grade": "A",
  "scoreBreakdown": {
    "digital": 18,
    "engagement": 15,
    "company": 20,
    "contact": 14,
    "authority": 15
  },
  "enrichmentSteps": ["contact-discovery", "email-verification", "company-research", "lead-scoring"],
  "crmPushed": false,
  "processedAt": "2026-03-24T14:30:00.000Z"
}

The final record in each run is a pipeline summary with aggregate statistics:

{
  "type": "summary",
  "totalInputLeads": 50,
  "totalEnrichedLeads": 50,
  "emailsFound": 38,
  "emailsVerified": 47,
  "companiesResearched": 45,
  "leadsScored": 50,
  "leadsPushedToCrm": 0,
  "pipelineSteps": ["normalize", "contact-discovery", "email-verification", "company-research", "lead-scoring"],
  "averageScore": 64,
  "csvDownloadUrl": "https://api.apify.com/v2/key-value-stores/abc123/records/enriched-leads.csv",
  "durationSeconds": 187,
  "completedAt": "2026-03-24T14:33:07.000Z"
}

Output fields

Each enriched lead carries the fields below. The dataset also emits recordType: 'preflight', 'summary', 'alert', and 'error' records — see the Stable enums section.

Identity & contact

| Field | Type | Description |
| --- | --- | --- |
| recordType | string enum | 'lead' for enriched-lead records. Other records in the dataset use summary / preflight / alert / error. |
| firstName / lastName / fullName | string or null | Parsed/normalized from input. |
| email | string or null | Discovered or input email address. |
| emailVerified | boolean or null | Whether email passed MX + SMTP verification. |
| emailStatus | string or null | valid / invalid / catch-all / unknown / risky. |
| emailConfidence | integer or null | 0-100 confidence on the email address. |
| emailSource | string or null | input / pattern-detection / website-scraping / pdl / waterfall. |
| phone | string or null | Discovered or input phone number. |
| phoneSource | string or null | input / website-scraping / phone-finder. |

Company

| Field | Type | Description |
| --- | --- | --- |
| companyName | string or null | Company name from input or research. |
| domain | string or null | Company domain. |
| domainSource | string or null | input / website / email / company-name-derived. |
| website | string or null | Company website URL. |
| title | string or null | Job title from input. |
| linkedinUrl | string or null | LinkedIn profile URL. |
| industry / employeeCount / companyDescription | string or null | From company-deep-research. |
| techStack | string[] | Technologies detected on the company website. |
| socialProfiles | object or null | LinkedIn / Twitter / Facebook / GitHub URLs. |

Scoring

| Field | Type | Description |
| --- | --- | --- |
| score | integer or null | 0-100 fit score from the lead-scoring sub-actor. |
| grade | string or null | A-F letter grade derived from score. |
| scoreBreakdown | object or null | Per-category scores from the sub-actor. |

Decision-output (v1.0)

| Field | Type | Description |
| --- | --- | --- |
| sendDecision | object | { action, riskLevel, reasons[], decisionRulePath[] }. action ∈ SEND_NOW / VERIFY_FIRST / ENRICH_MORE / SKIP. |
| bounceRiskBucket | string enum | low / medium / high. |
| isOutreachReady | boolean | Single-column safelist: verified email, ≥80 confidence, ≥60 score. |
| isDecisionMaker | boolean or null | True for c-level / VP / director titles. |
| seniorityLevel | string enum | c-level / vp / director / manager / individual-contributor / unknown. |
| decisionSignals | string[] | 16-token enum vocabulary (verified-email / unverified-email / senior-title / etc.). |
| leadGrade | string enum | A-F data-completeness grade (distinct from grade, which is fit). |
| recoveryPlan | object or null | { reason, nextBestActorSlug, why } when enrichment didn't fully succeed. |
| actionPlaybook | string[] | Ordered next steps in plain English, usable verbatim by Dify / agents. |
| complianceFlags | object | { isEuBased, ccpaProtected, region, requiresOptIn }. |
| techStackMatch | object or null | Present when ourTechStack is provided: { matched, totalRequired, matchedTech[] }. |
| isOnSuppression | boolean | True when matched against the suppression list. |
| phoneRecoveryPlan | object or null | Phone-specific next-best-actor pointer when phone is missing. |
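The isOutreachReady rule in the table above (verified email, confidence of at least 80, score of at least 60) can be written as a one-line predicate. This is a sketch of the stated thresholds only; how the actor treats missing values is not documented here, so treating them as failing the check is an assumption.

```python
def is_outreach_ready(lead):
    """Apply the documented safelist thresholds to one lead record."""
    return (
        lead.get("emailVerified") is True
        and (lead.get("emailConfidence") or 0) >= 80   # missing confidence fails
        and (lead.get("score") or 0) >= 60             # missing score fails
    )


sarah = {"emailVerified": True, "emailConfidence": 95, "score": 82}
print(is_outreach_ready(sarah))
```

Recomputing the flag locally is mainly useful as a sanity check against the value the actor emits.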

v1.1 + v1.2 additive fields (R1 / R2 / R3 polish)

These all default to safe values when their inputs aren't set — adding them did not break the v1 contract.

| Field | Type | Description |
| --- | --- | --- |
| recordConfidence | integer | 0-100 single-number confidence: harmonic mean of (contact + company + identity + fit). |
| confidenceLevel | string enum | high (≥75) / medium (≥50) / low (<50). |
| confidenceBreakdown | object | Four-axis split: { contact, company, identity, fit } each 0-100. |
| confidenceExplanation | string | Plain-English summary citing strongest + weakest axis. |
| customScore | integer or null | Persona-weighted re-aggregation of the breakdown, only when personaWeights is set. |
| personaWeightsApplied | object or null | The weight pack the run used (echoed for transparency). |
| entityId | string | Stable hash-based identifier from the canonical lead key. |
| mergedFromCount | integer or null | Number of duplicate input rows that collapsed into this lead via dedup. |
| stepDiagnostics | array | outputMode='debug' only: per-step { step, actor, durationMs, outcome, reason? }. |
| changeSinceLastRun | object or null | Cross-run diff (only when monitorStateKey is set). See Monitoring section. |
| changeFlags | string[] or null | Stable enum tokens describing what changed since the last run. |
| isIcpRoleMatch | boolean or null | True when title matches one of icp.roles. |
| matchedIcpRole | string or null | First icp.roles entry that matched. |
| isIcpSeniorityMatch | boolean or null | True when seniorityLevel is in icp.seniority. |
| isIcpIndustryMatch | boolean or null | True when industry contains one of icp.industries. |
| matchedIcpIndustry | string or null | First icp.industries entry that matched. |
| icpMatchScore | integer or null | 0-100 average across declared ICP axes (axes the user didn't declare are excluded). |
| isFullIcpMatch | boolean or null | True only when every declared ICP axis matches. |
| freshness | object | V3: { lastVerifiedAt, daysSinceVerification, decayScore, freshnessLevel, explanation }. |
| stalenessDowngraded | boolean | V3: true when freshness rules forced a positive decision down to VERIFY_FIRST. |
| priorityScore | integer | V3: 0-100 single number combining fit + confidence + ICP match + freshness + historical performance. |
| priorityBucket | string enum | V3: hot / warm / cold / skip. |
| priorityExplanation / priorityFactors | string / object | V3: human-readable + machine-readable composition of the priority score. |
| companyInsights | object or null | V3: per-domain aggregate: { totalContactsSeen, avgScore, decisionMakerCoverage, bestContactEntityId, accountTier, sendNowCount, explanation }. |
| intentSignals | string[] | V3: stable enum tokens (tech-stack-match / growing-company / decision-maker-cluster / etc.). |
| historicalPerformance | object or null | V3: { cohortSize, similarLeadsReplyRate, similarLeadsBounceRate, similarLeadsConvertRate, matchedCohort, explanation } when feedback loop is active. |
| autoRetryPlan | object | V3: { willRetry, strategy, expectedGain, explanation }. Advisory only; does not actually retry. |
| executionPlan | object | V4: decision → execution bridge: { channel, sequenceType, sequenceLength, timingRecommendation, personalisationLevel, tone, bestSendWindow, reason, reasonCodes[] }. Pure deterministic, LLM-free. |
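recordConfidence is described above as a harmonic mean over the four confidenceBreakdown axes. A minimal sketch of that aggregation follows; the unweighted form, the rounding, and the handling of a zero axis are assumptions, and both function names are hypothetical.

```python
from statistics import harmonic_mean

AXES = ("contact", "company", "identity", "fit")


def record_confidence(breakdown):
    """Harmonic mean of the four axes, rounded to an integer 0-100."""
    values = [breakdown[axis] for axis in AXES]
    if any(v <= 0 for v in values):
        return 0  # assumption: one empty axis collapses the overall confidence
    return round(harmonic_mean(values))


def confidence_level(score):
    """Bucket a score using the documented thresholds (high ≥75, medium ≥50)."""
    if score >= 75:
        return "high"
    if score >= 50:
        return "medium"
    return "low"


breakdown = {"contact": 90, "company": 60, "identity": 80, "fit": 70}
score = record_confidence(breakdown)
print(score, confidence_level(score))
```

For this breakdown the arithmetic mean would be 75, but the harmonic mean is about 73.3, so the record lands in the medium bucket. That is the point of the harmonic mean: one weak axis pulls the overall number down more than a simple average would.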

Pipeline metadata

| Field | Type | Description |
| --- | --- | --- |
| enrichmentSteps | string[] | Pipeline steps that processed this lead. |
| crmPushed | boolean | Whether the lead was pushed to a CRM. |
| processedAt | string | ISO timestamp. |

How much does it cost to enrich leads?

Lead Enrichment Pipeline uses pay-per-event pricing — you pay $0.12 per lead enriched. Platform compute costs are included. All 6 pipeline steps (email discovery, phone finding, verification, company research, scoring, CRM push) are covered in that single price.

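| Scenario | Leads | Cost per lead | Total cost |
| --- | --- | --- | --- |
| Quick test | 1 | $0.12 | $0.12 |
| Small batch | 10 | $0.12 | $1.20 |
| Medium batch | 50 | $0.12 | $6.00 |
| Large batch | 200 | $0.12 | $24.00 |
| Enterprise | 1,000 | $0.12 | $120.00 |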

You can set a maximum spending limit per run to control costs. The actor stops enriching when your budget is reached and outputs all leads processed up to that point.

Compare this to Clay at $149-699/month (plus per-credit charges of $0.40-5.63 per enrichment), Apollo at $49-119/month, or ZoomInfo at $14,995/year. With Lead Enrichment Pipeline, most teams spend $12-60/month with no subscription commitment. A 1,000-lead enrichment that costs $400-5,630 in Clay credits costs $120 here, which is roughly 3x to 47x cheaper ($400 / $120 ≈ 3.3 and $5,630 / $120 ≈ 46.9).
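The arithmetic behind these figures is simple enough to check in a few lines. The sketch below uses only the prices quoted on this page; the constant and function names are illustrative.

```python
PRICE_PER_LEAD = 0.12                 # pay-per-event price quoted above
CLAY_PER_ENRICHMENT = (0.40, 5.63)    # per-credit range quoted above


def run_cost(leads):
    """Total cost in dollars for enriching `leads` leads."""
    return round(leads * PRICE_PER_LEAD, 2)


def clay_cost_ratio():
    """(low, high) ratio of the quoted Clay per-enrichment cost to this actor's price."""
    return tuple(round(cost / PRICE_PER_LEAD, 1) for cost in CLAY_PER_ENRICHMENT)


for leads in (1, 10, 50, 200, 1000):
    print(leads, run_cost(leads))
print(clay_cost_ratio())
```

The ratio depends only on the per-unit prices, so it is the same for any batch size; subscription fees are left out of the comparison.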

Enrich leads using the API

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("ryanclinton/lead-enrichment-pipeline").call(run_input={
    "leads": [
        {"firstName": "Sarah", "lastName": "Chen", "companyName": "Acme Corp", "title": "CTO"},
        {"firstName": "James", "lastName": "Park", "companyName": "Beta Industries", "title": "VP Sales"},
        {"fullName": "Maria Rodriguez", "domain": "pinnacle.io"},
    ],
    "enrichEmail": True,
    "verifyEmails": True,
    "scoreLeads": True,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # The dataset also holds non-lead records (summary, preflight, alert, error).
    kind = item.get("recordType") or item.get("type")
    if kind == "summary":
        print(f"Pipeline complete: {item['totalEnrichedLeads']} leads, avg score {item['averageScore']}")
    elif kind in (None, "lead"):
        print(f"{item.get('fullName')} | {item.get('email')} ({item.get('emailStatus')}) | Score: {item.get('score')}/{item.get('grade')}")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Start the actor and wait for the run to finish.
const run = await client.actor("ryanclinton/lead-enrichment-pipeline").call({
    leads: [
        { firstName: "Sarah", lastName: "Chen", companyName: "Acme Corp", title: "CTO" },
        { firstName: "James", lastName: "Park", companyName: "Beta Industries", title: "VP Sales" },
        { fullName: "Maria Rodriguez", domain: "pinnacle.io" },
    ],
    enrichEmail: true,
    verifyEmails: true,
    scoreLeads: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    // The dataset also holds non-lead records (summary, preflight, alert, error).
    const kind = item.recordType ?? item.type;
    if (kind === "summary") {
        console.log(`Pipeline complete: ${item.totalEnrichedLeads} leads, avg score ${item.averageScore}`);
    } else if (kind === undefined || kind === "lead") {
        console.log(`${item.fullName} | ${item.email} (${item.emailStatus}) | Score: ${item.score}/${item.grade}`);
    }
}

cURL

# Start the enrichment run (the call returns immediately with the run details)
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~lead-enrichment-pipeline/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "leads": [
      {"firstName": "Sarah", "lastName": "Chen", "companyName": "Acme Corp", "title": "CTO"},
      {"fullName": "Maria Rodriguez", "domain": "pinnacle.io"}
    ],
    "enrichEmail": true,
    "verifyEmails": true,
    "scoreLeads": true
  }'

# Fetch results once the run has finished (replace DATASET_ID with the defaultDatasetId from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

How Lead Enrichment Pipeline works

Step 1: Input normalization

The pipeline accepts leads as a JSON array or CSV file. CSV headers are auto-mapped from 40+ common variations — "First Name", "first_name", "fname", and "givenname" all map to firstName. For each lead, the normalizer extracts domains from website URLs (stripping www. and paths), derives domains from company names by removing legal suffixes (LLC, Inc, Corp, etc.) and appending .com, and parses full names into first/last components. Domain extraction handles edge cases like https://www.acmecorp.com/about correctly resolving to acmecorp.com.
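
The normalization rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actor's code: the alias table and suffix list are hypothetical and much shorter than the real ones, and all function names are invented for the example.

```python
import re
from urllib.parse import urlparse

# Hypothetical lookup tables; the actor's real lists are larger and not shown here.
HEADER_ALIASES = {"firstname": "firstName", "fname": "firstName", "givenname": "firstName"}
LEGAL_SUFFIXES = {"llc", "inc", "corp", "ltd", "gmbh", "plc"}


def map_header(header: str) -> str:
    """Normalise a CSV header and map known aliases to the canonical field name."""
    key = re.sub(r"[^a-z0-9]", "", header.lower())
    return HEADER_ALIASES.get(key, header)


def domain_from_url(url: str) -> str:
    """Reduce a website URL to its bare host, dropping scheme, www. and path."""
    if "://" not in url:
        url = "https://" + url  # urlparse only fills hostname when a scheme is present
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def domain_from_company(name: str) -> str:
    """Guess a domain from a company name: strip legal suffixes, append .com."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    core = [w for w in words if w not in LEGAL_SUFFIXES]
    return "".join(core) + ".com" if core else ""


def split_full_name(full_name: str) -> tuple[str, str]:
    """Split on whitespace: first token is the first name, the rest the last name."""
    parts = full_name.split()
    return (parts[0], " ".join(parts[1:])) if parts else ("", "")
```

A derived domain is only a guess (many companies do not own the matching .com), which is one reason to supply a real domain or website whenever you have it.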

Step 2: Contact discovery via waterfall enrichment

Leads missing email addresses are sent in a single batch call to the waterfall-contact-enrichment sub-actor. This cascades through up to 10 enrichment sources: website contact page scraping, email pattern detection (e.g., first.last@domain.com), People Data Labs person lookup, SMTP probing, and social profile matching. Each source is tried in order until an email is found. Leads still missing phone numbers after the waterfall get a second batch call to the phone-number-finder sub-actor. Every discovered field is tagged with its source (emailSource: "pattern-detection", phoneSource: "website-scraping") for transparency.
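
The "try each source in order until an email is found" behaviour is a plain waterfall. The sketch below shows the control flow only; the two source functions are hypothetical stand-ins (the real sub-actor calls external services), and only the `emailSource` tag values come from this README.

```python
from typing import Callable, Optional

Lead = dict
Source = Callable[[Lead], Optional[str]]


def scrape_contact_page(lead: Lead) -> Optional[str]:
    """Stand-in for website scraping; always misses in this sketch."""
    return None


def detect_pattern(lead: Lead) -> Optional[str]:
    """Stand-in for pattern detection using the first.last@domain pattern."""
    if lead.get("firstName") and lead.get("lastName") and lead.get("domain"):
        return f"{lead['firstName']}.{lead['lastName']}@{lead['domain']}".lower()
    return None


def waterfall_email(lead: Lead, sources: list[tuple[str, Source]]) -> Lead:
    """Return a copy of the lead with email + emailSource from the first source that hits."""
    if lead.get("email"):
        return lead  # nothing to discover
    for source_name, lookup in sources:
        email = lookup(lead)
        if email:
            return {**lead, "email": email, "emailSource": source_name}
    return lead  # every source missed; the lead stays without an email


SOURCES = [("website-scraping", scrape_contact_page), ("pattern-detection", detect_pattern)]
lead = {"firstName": "Sarah", "lastName": "Chen", "domain": "acmecorp.com"}
enriched = waterfall_email(lead, SOURCES)
```

Because the loop returns on the first hit, later (and typically more expensive) sources run only for leads the earlier ones could not resolve.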

Step 3: Email verification

All leads with email addresses — both discovered and provided in the input — are batch-verified via the bulk-email-verifier sub-actor (the Outbound Control System). Verification runs MX record lookups to confirm the domain accepts mail, then SMTP conversation checks to validate the specific mailbox. Each email gets a valid/invalid/risky/unknown/disposable status and a confidence score from 0-100. The sub-actor also emits a decision enum (send / send-monitor / hold / verify-later / replace / suppress) per address, automation triggers, deliverability simulation, and a per-record failureAnalysis block — this pipeline currently consumes status + confidence; the richer fields are available in the verifier sub-run dataset for downstream consumers. This step runs with a 900-second timeout to handle large batches.

Step 4: Company research and lead scoring

Leads with domains are batch-enriched through the company-deep-research sub-actor, which pulls from 7+ sources (company website, Wikipedia, GitHub, SEC filings, academic databases, DNS records, social profiles) to populate industry, employee count, tech stack, company description, and social profile URLs. Then the lead-scoring-engine sub-actor scores all leads with sufficient data on a 0-100 scale across 5 categories: digital presence, engagement signals, company fit, contact data completeness, and authority level. Scores are converted to A-F letter grades.

Step 5: Output and optional CRM push

Enriched leads are pushed to the dataset one at a time, with PPE charging ($0.12) applied after each push. If a spending limit is reached mid-batch, the pipeline stops and outputs only the leads processed so far. When CRM push is enabled, all enriched leads are sent in a single batch to the HubSpot or Salesforce lead pusher sub-actor before output. A summary record is appended at the end with aggregate statistics including emails found, verification counts, average score, and a download URL for the CSV file stored in the Key-Value Store.

Tips for best results

  1. Provide the most data you have. Leads with name + company + domain enrich faster and more accurately than leads with only a name. The more input fields you provide, the fewer enrichment steps the pipeline needs to run.

  2. Use CSV for batches over 20 leads. Upload your spreadsheet to Google Sheets, choose File > Share > Publish to web, select CSV as the format, and paste the published URL. The auto-mapper handles messy headers without manual column mapping.

  3. Start with a 5-lead test run. Set maxLeads: 5 on your first run to verify the output matches your expectations before processing hundreds of leads at $0.12 each.

  4. Disable steps you do not need. If you already have verified emails and only need company data, disable enrichEmail and verifyEmails to reduce processing time. You still pay $0.12/lead, but runs complete faster.

  5. Enable company enrichment for B2B sales. The enrichCompany flag adds industry, employee count, and tech stack data that feeds into more accurate lead scores. Worth the extra processing time for account-based selling.

  6. Combine with Google Maps for local leads. Run Google Maps Email Extractor first to build a local business list, then pipe those leads through this pipeline for verification, company research, and scoring.

  7. Schedule weekly enrichment runs. Use Apify's scheduling to re-enrich your lead database weekly. New runs will re-verify emails (catching addresses that have gone stale) and update company data.

  8. Download the CSV for CRM import. Even without the direct HubSpot/Salesforce push, the auto-generated CSV is formatted for direct import into any CRM that accepts CSV uploads.

Combine with other Apify actors

| Actor | How to combine |
| --- | --- |
| Google Maps Email Extractor | Extract local business leads from Google Maps, then enrich with verification, company data, and scoring |
| Website Contact Scraper | Scrape contact pages from a list of websites, then run discovered contacts through the enrichment pipeline |
| Email Pattern Finder | Detect company email patterns first, then use this pipeline to verify and score the generated addresses |
| Bulk Email Verifier | Outbound Control System — verification + decision engine emitting send / send-monitor / hold / replace / suppress routing per email. Already built into step 3 of this pipeline; use standalone for email-only verification with the full decision layer (SLA tier, automation triggers, deliverability simulation, watchlist + delta tracking) |
| Company Deep Research | Already built into step 4 of this pipeline; use standalone for company research without the full lead workflow |
| HubSpot Lead Pusher | Built into step 5; use standalone to push pre-enriched leads from other sources into HubSpot |
| B2B Lead Gen Suite | Use Lead Gen Suite for URL-based lead extraction, then pipe results through this pipeline for deeper enrichment |
| AI Outreach Personalizer | After enrichment, generate personalized cold emails for each lead using your own OpenAI/Anthropic key |
| Intent Signal Tracker | Score buying intent before enriching — prioritize leads at companies showing hiring, funding, and tech signals |
| Lead Data Quality Auditor | Audit enriched output quality before outreach — catch bad emails, stale domains, and incomplete records |

Templates

Pick a template and the pipeline pre-configures goal + icpRoles + ourTechStack + outputFilter for a common workflow. Explicit user fields still win where set.

| Template | Ideal user | What it enables | Output behaviour |
| --- | --- | --- | --- |
| b2b-saas-prospecting | SDR sending cold outreach to engineering decision-makers | goal: high-deliverability, icpRoles set to CTO / VP Engineering / Director Engineering / Founder / Co-Founder / CEO | outputFilter: send-now-only — only verified-deliverable, scored leads on senior titles. |
| enterprise-sales | Account-based sales targeting Fortune 5000 executives | goal: max-coverage, icpRoles: ['Chief', 'VP', 'Director', 'Head of'] | outputFilter: a-b-grade-only — full enrichment but only data-complete leads. |
| recruiting-tech | Technical recruiter sourcing engineers and engineering leaders | goal: high-deliverability, icpRoles: ['Software Engineer', 'Senior Engineer', 'Staff Engineer', 'Principal Engineer', 'Engineering Manager', 'Tech Lead'] | outputFilter: verified-emails-only — only contactable engineers. |
| recruiting-non-tech | General recruiter — managers, directors, ops leaders | goal: high-deliverability, icpRoles: ['Manager', 'Senior Manager', 'Lead', 'Director'] | outputFilter: verified-emails-only. |
| event-leads | Conference / trade-show follow-up on a name+company badge-scan list | goal: quick-outreach (email + verify only — no scoring, no company research) | outputFilter: none — push everything. Speed-optimised. |
| agency-outbound | Agency cold-pitching marketing decision-makers | goal: high-deliverability, icpRoles: ['Founder', 'CEO', 'CMO', 'VP Marketing', 'Head of Marketing', 'Director Marketing'] | outputFilter: send-now-only. |
| crm-cleanup | CRM data refresh for an existing list | goal: max-coverage (every step) | outputFilter: none — push everything for direct CRM import. |
| custom | Power user wanting full control | Uses your explicit goal + icpRoles + ourTechStack + outputFilter | Whatever you set. |

Templates only set defaults — you can still pass icp / outputMode / personaWeights / monitorStateKey etc. on top.

The summary record's recommendedNextRunTemplate field suggests which template to pick for a follow-up run based on the current run's outcome distribution (low deliverability → recruiting-tech filter; high ENRICH_MORE rate → crm-cleanup; etc.). Pure deterministic mapping, no LLM.

Monitoring & change detection

Set monitorStateKey to make the actor remember every lead it has seen. Subsequent scheduled runs diff each lead against the previous snapshot and emit changeSinceLastRun + a stable changeFlags[] enum on every record.

{
  "leads": [...],
  "monitorStateKey": "crm-weekly-refresh",
  "template": "crm-cleanup",
  "outputFilter": "none"
}

The state key is a NAMED Apify Key-Value Store — pick a stable name per workflow (crm-weekly-refresh, enterprise-quarterly, partner-monthly) so subsequent runs land on the same snapshot bucket. State is bounded at 50,000 lead snapshots (FIFO) so it never grows unbounded.

What you get on lead records (run #2 and later)

{
  "fullName": "Sarah Chen",
  "email": "sarah.chen@acmecorp.com",
  "title": "VP Engineering",
  "score": 84,
  "grade": "B",
  "sendDecision": { "action": "SEND_NOW", "...": "..." },
  "changeSinceLastRun": {
    "isFirstRunForLead": false,
    "previousEmail": "schen@acmecorp.com",
    "previousScore": 72,
    "previousGrade": "C",
    "previousSendDecisionAction": "VERIFY_FIRST",
    "previousTitle": "Director Engineering",
    "previousCompany": "Acme Corp",
    "daysSinceLastSeen": 7,
    "changeFlags": ["EMAIL_CHANGED", "SCORE_INCREASED", "GRADE_UPGRADED", "TITLE_CHANGED", "SEND_DECISION_UPGRADED"]
  },
  "changeFlags": ["EMAIL_CHANGED", "SCORE_INCREASED", "GRADE_UPGRADED", "TITLE_CHANGED", "SEND_DECISION_UPGRADED"]
}

Stable changeFlags[] enum

| Flag | When it fires |
| --- | --- |
| NEW_LEAD | This canonical key wasn't in the previous snapshot — a new lead. |
| EMAIL_CHANGED / EMAIL_GAINED / EMAIL_LOST | Email field movement. |
| EMAIL_VERIFICATION_GAINED / EMAIL_VERIFICATION_LOST | Verification status flip. |
| SCORE_INCREASED / SCORE_DECREASED | Score moved by ≥5 points. |
| GRADE_UPGRADED / GRADE_DOWNGRADED | Letter grade band changed. |
| TITLE_CHANGED | Job title differs from the snapshot — a promotion or job change. |
| COMPANY_CHANGED | The contact's company name changed. |
| EMPLOYEE_COUNT_CHANGED | Company size band changed. |
| SEND_DECISION_UPGRADED / SEND_DECISION_DOWNGRADED | Decision moved up or down the SKIP→ENRICH_MORE→VERIFY_FIRST→SEND_NOW ladder. |
| UNCHANGED | Lead matched the prior snapshot exactly — no signal worth alerting on. |

How to use change detection

  • Weekly CRM refresh — schedule the actor with monitorStateKey and filter downstream on changeFlags to push only what changed.
  • Job-change monitor — run a list of "champions" weekly and alert on TITLE_CHANGED or COMPANY_CHANGED — they may have moved to a target account.
  • Deliverability watchdog — alert on EMAIL_VERIFICATION_LOST to catch contacts whose mailbox went stale before your next campaign.
  • Send-decision movement — alert on SEND_DECISION_DOWNGRADED so SDRs stop sending to leads that newly look risky.

changeFlags: ["UNCHANGED"] is the noise floor — Dify / Zapier should filter for any other flag.
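
A downstream filter for these use cases can be very small. The sketch below assumes dataset items shaped like the lead record shown earlier; the function names and the choice of alert flags are illustrative, not part of the actor.

```python
# Flags that the use cases above suggest alerting on (an illustrative selection).
ALERT_FLAGS = {"TITLE_CHANGED", "COMPANY_CHANGED", "EMAIL_VERIFICATION_LOST", "SEND_DECISION_DOWNGRADED"}


def changed_leads(items: list[dict]) -> list[dict]:
    """Keep lead records carrying at least one flag other than UNCHANGED."""
    result = []
    for item in items:
        if item.get("recordType", "lead") != "lead":
            continue  # skip summary / alert / error records
        flags = set(item.get("changeFlags", []))
        if flags - {"UNCHANGED"}:
            result.append(item)
    return result


def alerts_for(item: dict) -> list[str]:
    """Return the alert-worthy flags on one record, sorted for stable output."""
    return sorted(ALERT_FLAGS & set(item.get("changeFlags", [])))


items = [
    {"fullName": "Sarah Chen", "changeFlags": ["EMAIL_CHANGED", "TITLE_CHANGED"]},
    {"fullName": "James Park", "changeFlags": ["UNCHANGED"]},
    {"recordType": "summary", "totalEnrichedLeads": 2},
]
to_push = changed_leads(items)
```

Branching on the flag tokens rather than on the previous/current values keeps the automation independent of how each diff is computed.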

Stable enums (quick reference)

The enums below are stable within a major version. New values may be added; existing values will not be renamed or repurposed. Branch on these in automation; never parse prose fields.

| Field | Values |
| --- | --- |
| recordType | lead / summary / preflight / alert / error |
| sendDecision.action | SEND_NOW / VERIFY_FIRST / ENRICH_MORE / SKIP |
| sendDecision.riskLevel | low / medium / high |
| bounceRiskBucket | low / medium / high |
| leadGrade | A / B / C / D / F |
| confidenceLevel | high / medium / low |
| seniorityLevel | c-level / vp / director / manager / individual-contributor / unknown |
| complianceFlags.region | eu / us-ca / us-other / uk / apac / other / unknown |
| decisionSignals[] | 18 tokens — verified-email / unverified-email / invalid-email / no-email / high-confidence / medium-confidence / low-confidence / company-data-complete / company-data-thin / senior-title / junior-title / title-unknown / high-score / medium-score / low-score / on-suppression / eu-jurisdiction / no-domain |
| changeFlags[] | NEW_LEAD / EMAIL_CHANGED / EMAIL_GAINED / EMAIL_LOST / EMAIL_VERIFICATION_GAINED / EMAIL_VERIFICATION_LOST / SCORE_INCREASED / SCORE_DECREASED / GRADE_UPGRADED / GRADE_DOWNGRADED / TITLE_CHANGED / COMPANY_CHANGED / EMPLOYEE_COUNT_CHANGED / SEND_DECISION_UPGRADED / SEND_DECISION_DOWNGRADED / UNCHANGED |
| template (input) | custom / b2b-saas-prospecting / enterprise-sales / recruiting-tech / recruiting-non-tech / event-leads / agency-outbound / crm-cleanup |
| outputMode (input) | crm / analytics / debug |
| outputFilter (input) | none / send-now-only / verified-emails-only / a-b-grade-only |
| priorityBucket (V3) | hot / warm / cold / skip |
| freshnessLevel (V3) | fresh / aging / stale / unknown |
| intentSignals[] (V3) | tech-stack-match / tech-stack-strong-match / growing-company / shrinking-company / hiring-engineering / hiring-leadership / decision-maker-cluster / sole-decision-maker / fresh-data / stale-data / verified-deliverability / gdpr-protected / high-cohort-reply-rate / high-cohort-bounce-rate |
| autoRetryPlan.strategy (V3) | verify-only / pattern-and-verify / company-research-only / full-enrichment / none |
| companyInsights.accountTier (V3) | high-value / standard / low-fit |
| historicalPerformance.matchedCohort (V3) | domain / industry-seniority / industry / seniority / none |
| listAnalytics.listHealth.issues[] (V3) | low-deliverability / high-catch-all-rate / low-email-coverage / low-phone-coverage / low-company-coverage / few-decision-makers / high-staleness-rate / low-record-confidence / low-send-now-rate / no-leads-processed |
| executionPlan.channel (V4) | email / email-then-phone / phone-first / social-only / do-not-contact |
| executionPlan.sequenceType (V4) | minimal / short / long / high-touch |
| executionPlan.timingRecommendation (V4) | immediate / wait-for-business-hours / batch-with-others / verify-then-send / quarantine |
| executionPlan.personalisationLevel (V4) | low / medium / high |
| executionPlan.tone (V4) | casual / professional / formal / technical |
| executionPlan.bestSendWindow (V4) | us-business-hours / eu-business-hours / uk-business-hours / apac-business-hours / any-business-day / avoid-monday-friday |

Debug-mode output example

When outputMode: "debug" is set, every lead carries a stepDiagnostics[] array with per-step timing and outcome. Useful for "why is this enrichment slow / why did this step fail?" investigations.

{
  "recordType": "lead",
  "fullName": "Sarah Chen",
  "email": "sarah.chen@acmecorp.com",
  "emailVerified": true,
  "emailConfidence": 92,
  "title": "CTO",
  "companyName": "Acme Corp",
  "domain": "acmecorp.com",
  "industry": "SaaS",
  "score": 84,
  "grade": "B",
  "sendDecision": {
    "action": "SEND_NOW",
    "riskLevel": "low",
    "reasons": ["Email verified", "Confidence ≥ 80", "Score 84 ≥ 60"],
    "decisionRulePath": ["verified-high-confidence-good-score"]
  },
  "isOutreachReady": true,
  "bounceRiskBucket": "low",
  "leadGrade": "A",
  "recordConfidence": 87,
  "confidenceLevel": "high",
  "confidenceBreakdown": { "contact": 92, "company": 85, "identity": 95, "fit": 84 },
  "confidenceExplanation": "high confidence (87/100) — strong identity signal (95/100)",
  "decisionSignals": ["verified-email", "high-confidence", "company-data-complete", "senior-title", "high-score"],
  "isDecisionMaker": true,
  "seniorityLevel": "c-level",
  "isIcpRoleMatch": true,
  "matchedIcpRole": "CTO",
  "isIcpSeniorityMatch": true,
  "isIcpIndustryMatch": true,
  "matchedIcpIndustry": "SaaS",
  "icpMatchScore": 100,
  "isFullIcpMatch": true,
  "complianceFlags": { "isEuBased": false, "ccpaProtected": null, "region": "us-other", "requiresOptIn": false },
  "actionPlaybook": [
    "Add to outreach sequence — verified email, ready today",
    "Prioritise — high score signals strong fit",
    "Use exec-tier messaging — decision-maker title detected"
  ],
  "recoveryPlan": null,
  "entityId": "lead_8f3a2c7d",
  "stepDiagnostics": [
    { "step": "normalize", "actor": "orchestrator", "durationMs": 0, "outcome": "success" },
    { "step": "contact-discovery", "actor": "kIEqeHJbKtCuBbkVE", "durationMs": 4201, "outcome": "success" },
    { "step": "email-verification", "actor": "Atdqy4shZ8zx8gkEi", "durationMs": 1872, "outcome": "success" },
    { "step": "company-research", "actor": "2cAY2V9yz1JE2H1S2", "durationMs": 8410, "outcome": "success" },
    { "step": "lead-scoring", "actor": "mZ8NsHKEBQSIcvW3W", "durationMs": 612, "outcome": "success" }
  ],
  "processedAt": "2026-05-03T14:22:18.401Z"
}

The summary record (one per run) carries the new R2 quality-dashboard aggregates:

{
  "recordType": "summary",
  "totalEnrichedLeads": 47,
  "leadsFiltered": 3,
  "averageScore": 71,
  "listAnalytics": {
    "deliverabilityRate": 87.2,
    "validEmailRate": 87.2,
    "catchAllRate": 4.3,
    "averageRecordConfidence": 79,
    "decisionMakerRate": 38.3,
    "sendNowCount": 31,
    "verifyFirstCount": 9,
    "skipCount": 4,
    "enrichmentCoverage": {
      "email": 95.7, "phone": 23.4, "companyDescription": 72.3,
      "industry": 68.1, "employeeCount": 65.9, "techStack": 51.0,
      "linkedinUrl": 42.5, "title": 89.4
    },
    "topFailureReasons": [
      { "reason": "TIMEOUT", "count": 1 }
    ],
    "recommendedNextRunTemplate": "b2b-saas-prospecting",
    "listQualityGrade": "A"
  },
  "ppeChargesUsd": 5.64,
  "circuitBreakerTripped": false
}

Confidence system

Every enriched lead carries a single recordConfidence (0-100) so downstream automation can filter on one number. It collapses four independent axes via harmonic mean — every axis must be reasonably healthy for the score to be high (one strong axis can't mask a weak one):

| Axis | What it measures | Inputs |
| --- | --- | --- |
| contact | Email deliverability + verification | emailConfidence + verification status |
| company | Completeness of company-research fields | industry, employees, description, tech stack |
| identity | Strength of identity signals | name, domain, title, LinkedIn |
| fit | Sub-actor's lead-scoring 0-100 | echoed from score |

A confidenceLevel band (high ≥75 / medium ≥50 / low <50) and a plain-English confidenceExplanation ship alongside, so a Slack/CRM/agent flow can read "high confidence (87/100) — strong identity signal" verbatim.

When personaWeights is set (e.g. { contact: 0.4, company: 0.2, identity: 0.2, fit: 0.2 }), a customScore is emitted alongside — the four axes re-weighted to your buyer's preferences. The sub-actor's score field stays untouched.
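
To see why a harmonic mean stops one strong axis from masking a weak one, here is a minimal sketch. It assumes an unweighted harmonic mean over the four axes and the band thresholds quoted above; the actor's exact rounding is not documented here. The `custom_score` function is an assumption too: this README does not say how `personaWeights` are combined, so a normalised weighted average is used purely for illustration.

```python
from statistics import harmonic_mean

AXES = ("contact", "company", "identity", "fit")


def record_confidence(breakdown: dict[str, float]) -> float:
    """Collapse the four axes with an unweighted harmonic mean."""
    return harmonic_mean([breakdown[axis] for axis in AXES])


def confidence_level(score: float) -> str:
    """Apply the documented bands: high >= 75, medium >= 50, otherwise low."""
    if score >= 75:
        return "high"
    return "medium" if score >= 50 else "low"


def custom_score(breakdown: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed re-weighting: a weighted average normalised by the weight total."""
    total = sum(weights[axis] for axis in AXES)
    return sum(breakdown[axis] * weights[axis] for axis in AXES) / total


# Three strong axes and one weak one (thin company data).
lopsided = {"contact": 90, "company": 20, "identity": 95, "fit": 85}
plain_average = sum(lopsided.values()) / len(lopsided)  # 72.5, would read as "medium"
collapsed = record_confidence(lopsided)                  # just under 48, reads as "low"
```

The weak company axis pulls the harmonic mean far below the plain average, which is the behaviour the paragraph above describes.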

Priority engine (who to email first)

Lead prioritisation ranks contacts based on fit, confidence, and freshness to determine who to contact first. Lead prioritisation usually requires a CRM scoring model — this replaces it with a built-in priority engine. This directly answers the question "who should I contact first?" without requiring Salesforce, HubSpot scoring, or manual triage. Every lead carries a priorityScore (0-100) plus a priorityBucket (hot / warm / cold / skip). Sort on priorityScore for outreach order; branch automation on priorityBucket.

The score combines five signals via a transparent linear formula:

| Signal | Default weight | Source |
| --- | --- | --- |
| Fit | 35% (50% when no ICP defined) | score field from lead-scoring-engine |
| Confidence | 25% (40% when no ICP defined) | recordConfidence from this actor |
| ICP match | 25% | icpMatchScore from structured icp input |
| Freshness penalty | -15 to 0 | Stale data drops priority |
| Historical lift | -10 to +10 | historicalPerformance from feedback loop |

priorityFactors exposes each component's contribution so the score is fully auditable. priorityExplanation is a plain-English one-liner ("HOT (87/100) — full ICP match, outreach-ready, strong fit signal").

SKIP decisions force priorityScore: 0 and priorityBucket: 'skip' regardless of other signals — the priority engine never recommends contacting a SKIP'd lead.
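
Read literally, the weights above describe a formula like the sketch below. Treat it as an interpretation rather than the actor's implementation: it assumes the weighted terms are summed, the penalty and lift are added as points, and the result is clamped to 0-100. How the penalty and lift values themselves are derived, and the bucket thresholds, are not specified here and are left out.

```python
from typing import Optional


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def priority_score(
    fit: float,
    confidence: float,
    icp_match: Optional[float] = None,  # None means no ICP was defined
    freshness_penalty: float = 0.0,     # documented range: -15 to 0
    historical_lift: float = 0.0,       # documented range: -10 to +10
    decision: str = "SEND_NOW",
) -> float:
    """Linear priority score using the default weights from the table above."""
    if decision == "SKIP":
        return 0.0  # SKIP always forces priority to zero
    if icp_match is None:
        base = 0.50 * fit + 0.40 * confidence
    else:
        base = 0.35 * fit + 0.25 * confidence + 0.25 * icp_match
    base += clamp(freshness_penalty, -15.0, 0.0)
    base += clamp(historical_lift, -10.0, 10.0)
    return clamp(base, 0.0, 100.0)
```

For example, `priority_score(84, 87, icp_match=100)` evaluates the three weighted terms as 29.4 + 21.75 + 25.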

Freshness & data decay

Avoiding stale leads requires detecting outdated data and downgrading or re-verifying contacts before outreach. This prevents one of the most common outreach failures: emailing stale or invalid leads that damage sender reputation. Most tools rely on manual list cleaning — this automatically detects stale leads and stops you sending to them. Every lead carries a freshness block with:

  • lastVerifiedAt — ISO timestamp of the most recent verified state
  • daysSinceVerification — integer day count
  • decayScore (0-100) — 0 on the day of verification, 50 at 30 days, 100 at 90+ days
  • freshnessLevel — fresh (≤33 decay) / aging (≤66) / stale (>66) / unknown (no prior verification)

When freshnessLevel is stale (or aging + email isn't currently verified), the decision engine forces SEND_NOW down to VERIFY_FIRST. The downgrade is flagged on the lead via stalenessDowngraded: true and counted in the summary's listAnalytics.stalenessDowngrades.

Freshness only kicks in when monitorStateKey is set — the actor needs a prior snapshot to know when verification last happened. Without monitoring, freshness reports unknown.
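
The three anchor points (0 at day 0, 50 at day 30, 100 at day 90 and beyond) fit a piecewise-linear curve. The sketch below assumes linear interpolation between those anchors, which this README does not confirm, and then applies the documented bands and downgrade rule. Function names are illustrative.

```python
from typing import Optional


def decay_score(days_since_verification: float) -> float:
    """Piecewise-linear decay through (0, 0), (30, 50) and (90, 100)."""
    days = max(0.0, days_since_verification)
    if days <= 30:
        return 50.0 * days / 30.0
    if days < 90:
        return 50.0 + 50.0 * (days - 30.0) / 60.0
    return 100.0


def freshness_level(days_since_verification: Optional[float]) -> str:
    """Map decay to the documented bands; None means no prior verification."""
    if days_since_verification is None:
        return "unknown"
    decay = decay_score(days_since_verification)
    if decay <= 33:
        return "fresh"
    return "aging" if decay <= 66 else "stale"


def apply_staleness(action: str, level: str, email_verified: bool) -> tuple[str, bool]:
    """Return (action, stalenessDowngraded) after the documented downgrade rule."""
    needs_recheck = level == "stale" or (level == "aging" and not email_verified)
    if action == "SEND_NOW" and needs_recheck:
        return "VERIFY_FIRST", True
    return action, False
```

Under this interpolation a contact verified 60 days ago has a decay of 75 and lands in the stale band.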

Closed-loop feedback (outcomes → historicalPerformance)

Ship outcomes from your CRM / email tool back into the actor and it remembers them across runs. Subsequent leads in the same cohort get a historicalPerformance block.

{
  "leads": [...],
  "feedbackStateKey": "outbound-q2",
  "feedback": {
    "type": "outcome",
    "data": [
      {"entityId": "lead_8f3a2c", "outcome": "replied", "domain": "acmecorp.com", "industry": "SaaS", "seniorityLevel": "c-level"},
      {"entityId": "lead_7a1b4d", "outcome": "bounced", "domain": "betaindustries.com"},
      {"entityId": "lead_2c5e9f", "outcome": "converted", "domain": "pinnacle.io", "industry": "SaaS", "seniorityLevel": "vp"}
    ]
  }
}

historicalPerformance finds the tightest cohort match (with at least 3 outcomes):

  1. Same domain — strongest signal: does this company reply?
  2. Same industry + seniority — for cohort-level patterns
  3. Same industry — broader fallback
  4. Same seniority — broadest

"historicalPerformance": {
  "cohortSize": 12,
  "similarLeadsReplyRate": 0.33,
  "similarLeadsBounceRate": 0.08,
  "similarLeadsConvertRate": 0.17,
  "matchedCohort": "industry-seniority",
  "explanation": "12 prior outcomes from same industry + seniority cohort — 33% reply, 8% bounce, 17% convert"
}

When no cohort meets the minimum sample size, the field is null — better silence than fabricated confidence. Outcomes feed priorityScore (high reply rate boosts; high bounce rate drops) and intentSignals (high-cohort-reply-rate / high-cohort-bounce-rate). The actor surfaces patterns; it does NOT auto-mutate scoring weights, since silently changing weights would undermine trust in the scores. Tune personaWeights manually based on what historicalPerformance shows.
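
The tightest-cohort lookup can be sketched as a loop over the four tiers in order, returning the first one with at least 3 outcomes. This is an illustration only: the outcome records follow the feedback input shape shown above, the function name is hypothetical, and rates are rounded to two decimals as in the example block.

```python
from typing import Optional

MIN_COHORT_SIZE = 3

# Tiers from tightest to broadest: (matchedCohort label, fields that must match).
COHORT_TIERS = [
    ("domain", ("domain",)),
    ("industry-seniority", ("industry", "seniorityLevel")),
    ("industry", ("industry",)),
    ("seniority", ("seniorityLevel",)),
]


def historical_performance(lead: dict, outcomes: list[dict]) -> Optional[dict]:
    """Return stats for the tightest cohort with enough outcomes, else None."""
    for label, fields in COHORT_TIERS:
        if any(lead.get(field) is None for field in fields):
            continue  # the lead lacks a field this tier needs
        cohort = [o for o in outcomes if all(o.get(f) == lead[f] for f in fields)]
        if len(cohort) < MIN_COHORT_SIZE:
            continue

        def rate(outcome: str) -> float:
            return round(sum(o["outcome"] == outcome for o in cohort) / len(cohort), 2)

        return {
            "cohortSize": len(cohort),
            "similarLeadsReplyRate": rate("replied"),
            "similarLeadsBounceRate": rate("bounced"),
            "similarLeadsConvertRate": rate("converted"),
            "matchedCohort": label,
        }
    return None  # no cohort is large enough, so report nothing


outcomes = [
    {"outcome": "replied", "domain": "acmecorp.com", "industry": "SaaS", "seniorityLevel": "c-level"},
    {"outcome": "bounced", "domain": "acmecorp.com", "industry": "SaaS", "seniorityLevel": "c-level"},
    {"outcome": "replied", "domain": "pinnacle.io", "industry": "SaaS", "seniorityLevel": "c-level"},
]
lead = {"domain": "acmecorp.com", "industry": "SaaS", "seniorityLevel": "c-level"}
stats = historical_performance(lead, outcomes)
```

Here the domain tier has only two outcomes, so the lookup falls through to the industry + seniority tier, which has three.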

Company-level intelligence

When ≥2 leads share a domain, every lead in the group carries a companyInsights block with the per-company aggregate:

"companyInsights": {
  "domain": "acmecorp.com",
  "totalContactsSeen": 4,
  "avgScore": 76,
  "decisionMakerCoverage": 2,
  "bestContactEntityId": "lead_8f3a2c",
  "accountTier": "high-value",
  "averageRecordConfidence": 84,
  "sendNowCount": 3,
  "explanation": "4 contacts on this domain — 2 decision-makers — avg fit score 76 — 3 ready to send — account tier: high-value"
}

Account-tier rules are deterministic:

  • high-value — ≥2 decision-makers AND avgScore ≥ 70
  • low-fit — 0 decision-makers AND avgScore < 40
  • standard — everything else

This moves the actor from per-lead enrichment into account-based-selling territory — your downstream flow can branch on companyInsights.accountTier to pick a sequence cadence per account, or look up bestContactEntityId to email the highest-priority contact at each company.
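
The account-tier rules translate directly into code. The sketch below groups leads by domain and applies the three rules; the helper names are illustrative, the input fields (`domain`, `score`, `isDecisionMaker`) are taken from the lead records in this README, and rounding the average to a whole number is an assumption.

```python
from collections import defaultdict


def account_tier(decision_makers: int, avg_score: float) -> str:
    """Apply the three documented account-tier rules."""
    if decision_makers >= 2 and avg_score >= 70:
        return "high-value"
    if decision_makers == 0 and avg_score < 40:
        return "low-fit"
    return "standard"


def company_insights(leads: list[dict]) -> dict[str, dict]:
    """Aggregate per domain; only domains with at least 2 leads get a block."""
    by_domain: dict[str, list[dict]] = defaultdict(list)
    for lead in leads:
        if lead.get("domain"):
            by_domain[lead["domain"]].append(lead)

    insights = {}
    for domain, group in by_domain.items():
        if len(group) < 2:
            continue
        decision_makers = sum(1 for lead in group if lead.get("isDecisionMaker"))
        avg_score = round(sum(lead.get("score", 0) for lead in group) / len(group))
        insights[domain] = {
            "domain": domain,
            "totalContactsSeen": len(group),
            "avgScore": avg_score,
            "decisionMakerCoverage": decision_makers,
            "accountTier": account_tier(decision_makers, avg_score),
        }
    return insights


leads = [
    {"domain": "acmecorp.com", "score": 84, "isDecisionMaker": True},
    {"domain": "acmecorp.com", "score": 80, "isDecisionMaker": True},
    {"domain": "acmecorp.com", "score": 70, "isDecisionMaker": False},
    {"domain": "acmecorp.com", "score": 70, "isDecisionMaker": False},
    {"domain": "pinnacle.io", "score": 55, "isDecisionMaker": False},
]
insights = company_insights(leads)
```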

Intent signals (LLM-free)

Stable enum array on every lead derived from existing fields + monitor state. No LLM, no external API — pure regex/pattern matching with deterministic firing conditions you can audit.

| Signal | Fires when |
| --- | --- |
| tech-stack-match / tech-stack-strong-match | ourTechStack overlap detected (≥1 / ≥3 matches) |
| growing-company / shrinking-company | Employee count changed since last snapshot |
| hiring-engineering | Engineering title at an 11-500-person company |
| hiring-leadership | New senior contact appears since last run |
| decision-maker-cluster | ≥2 senior contacts on the same domain |
| sole-decision-maker | 1 contact, but they're senior |
| fresh-data | Verified within last 7 days |
| stale-data | Verified ≥60 days ago |
| verified-deliverability | isOutreachReady AND bounceRiskBucket: 'low' |
| gdpr-protected | EU/UK jurisdiction |
| high-cohort-reply-rate | Feedback loop shows ≥30% reply rate in cohort |
| high-cohort-bounce-rate | Feedback loop shows ≥20% bounce rate in cohort |

Scenario simulation (what-if without re-running)

The summary record carries a scenarioSimulation block — counterfactuals over the leads we just enriched, showing what each setting WOULD have produced. Pure post-processing, zero new sub-actor calls, no PPE charge.

"scenarioSimulation": {
  "actual": { "sendNowCount": 31, "verifyFirstCount": 9, "skipCount": 4, "outreachReadyCount": 31, "description": "actual run" },
  "ifMinEmailConfidence80": { "sendNowCount": 22, "verifyFirstCount": 18, "skipCount": 4, "outreachReadyCount": 22, "description": "minEmailConfidence: 80" },
  "ifStrictModeTrue": { "sendNowCount": 31, "verifyFirstCount": 0, "skipCount": 13, "outreachReadyCount": 31, "description": "strictMode: true" },
  "ifOutputFilterSendNowOnly": { "sendNowCount": 31, "verifyFirstCount": 0, "skipCount": 0, "outreachReadyCount": 31, "description": "outputFilter: send-now-only" },
  "ifOutputFilterAOrBGradeOnly": { "sendNowCount": 24, "verifyFirstCount": 6, "skipCount": 2, "outreachReadyCount": 24, "description": "outputFilter: a-b-grade-only" },
  "ifAllStrict": { "sendNowCount": 22, "verifyFirstCount": 0, "skipCount": 0, "outreachReadyCount": 22, "description": "all strict toggles + send-now filter" }
}

Use this to tune the next run's input without paying for trial-and-error.

List health

The summary record's listAnalytics.listHealth block collapses overall list quality into one number + grade + machine-readable issue codes:

"listHealth": {
  "score": 78,
  "grade": "B",
  "issues": ["high-catch-all-rate", "low-phone-coverage"],
  "explanation": "List health B (78/100) — 2 issues detected: high-catch-all-rate, low-phone-coverage"
}

Stable issue-code enum: low-deliverability / high-catch-all-rate / low-email-coverage / low-phone-coverage / low-company-coverage / few-decision-makers / high-staleness-rate / low-record-confidence / low-send-now-rate / no-leads-processed.

Branch dashboards on listHealth.grade for traffic-light reporting; branch automation on individual issues codes.

Execution layer (decision → execution bridge)

Most tools tell you who to contact. This tells you how to contact them. Eliminates guesswork in sequencing, timing, and messaging strategy.

Execution planning defines how to contact a lead, including channel, timing, and sequence strategy. executionPlan answers the question every other enrichment tool leaves to the user: how should this message be sent. Pure deterministic mapping from existing fields, no LLM, no ML — every output trace-able to a rule you can read.

"executionPlan": {
  "channel": "email-then-phone",
  "sequenceType": "high-touch",
  "sequenceLength": 7,
  "timingRecommendation": "wait-for-business-hours",
  "personalisationLevel": "high",
  "tone": "formal",
  "bestSendWindow": "avoid-monday-friday",
  "reason": "high-touch sequence, 7 touches — email-then-phone via avoid-monday-friday — high personalisation — formal tone — timing: wait-for-business-hours",
  "reasonCodes": ["multichannel-decision-maker", "strict-region-business-hours", "exec-tier-high-touch", "high-personalisation-high-value", "formal-tone-c-level", "avoid-mon-fri-senior-target"]
}

Channel selection rules

| Channel | Fires when |
| --- | --- |
| email | Default. |
| email-then-phone | Phone available + decision-maker + outreach-ready — multichannel sequence. |
| phone-first | Verified phone + c-level/vp + low bounce risk — direct dial cuts cycle time. |
| social-only | No deliverable email but LinkedIn URL exists — social fallback. |
| do-not-contact | SKIP decision. |

Sequence type rules

| Type | Length | Fires when |
| --- | --- | --- |
| high-touch | 7 touches | Senior target (c-level/vp) + (full ICP match OR high-value account) + outreach-ready. |
| long | 5 touches | Outreach-ready + has company description + (decision-maker OR full ICP match). |
| short | 3 touches | Outreach-ready, standard cohort. |
| minimal | 1 touch | Cold lead or low confidence — single exploratory touch. |
| minimal | 0 touches | SKIP decision. |

Timing rules

| Recommendation | Fires when |
| --- | --- |
| immediate | Outreach-ready + fresh data — fire today. |
| wait-for-business-hours | EU/UK lead — strict region respects send windows. |
| verify-then-send | VERIFY_FIRST decision OR stale data needs re-verification. |
| batch-with-others | Default for warm leads not yet ready for instant send. |
| quarantine | SKIP decision. |

Personalisation rules

| Level | Fires when |
| --- | --- |
| high | High-value account OR (full ICP match + company description + tech stack). |
| medium | Has company description AND recordConfidence ≥ 70. |
| low | Default — batch-safe template tier. |

Tone rules (industry + seniority pattern match)

| Tone | Trigger |
| --- | --- |
| formal | Legal / finance / banking / fintech / insurance / healthcare / pharma / government / regulated. Or c-level seniority. |
| technical | SaaS / software / cybersecurity / cloud / devops / AI / data — engineering audience. |
| casual | Agency / marketing / creative / design / media / retail / hospitality. |
| professional | Default — standard B2B. |
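
A rule table like this maps naturally onto keyword sets checked in order. The sketch below is an illustration under assumptions: it checks the rows top to bottom (the precedence between rows is not stated in this README) and matches whole words in the industry string, so that a short keyword such as "AI" does not fire on "retail". The function name is hypothetical.

```python
import re
from typing import Optional

FORMAL = {"legal", "finance", "banking", "fintech", "insurance",
          "healthcare", "pharma", "government", "regulated"}
TECHNICAL = {"saas", "software", "cybersecurity", "cloud", "devops", "ai", "data"}
CASUAL = {"agency", "marketing", "creative", "design", "media", "retail", "hospitality"}


def pick_tone(industry: Optional[str], seniority_level: str) -> str:
    """Return the tone for a lead, checking the table rows in order."""
    words = set(re.findall(r"[a-z]+", (industry or "").lower()))
    if seniority_level == "c-level" or words & FORMAL:
        return "formal"
    if words & TECHNICAL:
        return "technical"
    if words & CASUAL:
        return "casual"
    return "professional"
```

With this ordering, a c-level contact at a SaaS company gets a formal tone, while a VP at the same company gets a technical one.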

Send window rules

| Window | Fires when |
| --- | --- |
| avoid-monday-friday | Senior target (c-level/vp) — Tue–Thu sweet spot beats Mon/Fri. |
| eu-business-hours | EU region — 9am–11am CET. |
| uk-business-hours | UK region — 9am–11am BST/GMT. |
| apac-business-hours | APAC — local 10am–2pm. |
| us-business-hours | US (CA or other) — 9am–12pm local. |
| any-business-day | Region unknown. |

Why this matters

Most outreach platforms charge per-seat for sequencing logic that's identical across leads. Your downstream tool (Outreach.io / Salesloft / Lemlist / Smartlead / Apollo Engage / your own automation) can:

  • Branch on executionPlan.sequenceType to pick the right cadence template
  • Branch on executionPlan.bestSendWindow to schedule around region rules
  • Branch on executionPlan.tone to pick a copy variant per industry+seniority
  • Branch on executionPlan.channel to route to email vs phone vs social
  • Read executionPlan.reasonCodes for audit logging

The actor doesn't write the message — it tells your tool how to send it. That's the decision-to-execution bridge.
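
On the consuming side, branching on these fields can look like the sketch below. The cadence names are hypothetical placeholders for templates in your own sequencing tool, and the dispatch shape is invented for the example; only the `executionPlan` field names and enum values come from this README.

```python
from typing import Optional

# Hypothetical cadence template names in the downstream sequencing tool.
CADENCE_BY_SEQUENCE = {
    "high-touch": "exec-7-touch",
    "long": "standard-5-touch",
    "short": "light-3-touch",
    "minimal": "single-touch",
}


def build_dispatch(lead: dict) -> Optional[dict]:
    """Translate a lead's executionPlan into instructions for a sending tool."""
    plan = lead.get("executionPlan") or {}
    # A missing plan is treated like do-not-contact, the conservative choice.
    if plan.get("channel", "do-not-contact") == "do-not-contact":
        return None
    return {
        "email": lead.get("email"),
        "channel": plan["channel"],
        "cadence": CADENCE_BY_SEQUENCE.get(plan.get("sequenceType"), "single-touch"),
        "copyVariant": plan.get("tone", "professional"),
        "sendWindow": plan.get("bestSendWindow", "any-business-day"),
        "auditTrail": plan.get("reasonCodes", []),
    }


lead = {
    "email": "sarah.chen@acmecorp.com",
    "executionPlan": {
        "channel": "email-then-phone",
        "sequenceType": "high-touch",
        "tone": "formal",
        "bestSendWindow": "avoid-monday-friday",
        "reasonCodes": ["multichannel-decision-maker", "exec-tier-high-touch"],
    },
}
dispatch = build_dispatch(lead)
```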

Failure scenarios & recovery

The pipeline is non-blocking — when a sub-actor fails or returns nothing, the step is recorded in failedSteps[] and the run continues with partial data. Each lead carries a recoveryPlan pointing at the right next-best Apify actor:

| Failure | What we surface | recoveryPlan.nextBestActorSlug |
| --- | --- | --- |
| No email found, no domain | decisionSignals: ["no-email", "no-domain"] | ryanclinton/website-contact-scraper |
| No email found, domain known | recoveryPlan.reason: "Email discovery returned nothing" | ryanclinton/email-pattern-finder |
| Email returned invalid | emailVerified: false, bounceRiskBucket: "high" | ryanclinton/waterfall-contact-enrichment |
| Email unverified | decisionSignals: ["unverified-email"] | ryanclinton/bulk-email-verifier |
| No phone, high-seniority lead | phoneRecoveryPlan with reason | ryanclinton/phone-number-finder |
| Score < 40, no company context | recoveryPlan.reason: "Score is low, company context missing" | ryanclinton/company-deep-research |

Each lead's actionPlaybook[] array is the plain-English version of the same logic — paste straight into a Slack/email/task body. No LLM rewriting required.

When 3 consecutive sub-actor calls fail, the orchestrator's circuit breaker trips and the run exits cleanly with an alert record (recordType: "alert", alertType: "pipeline-degraded"). Leads enriched before the trip are pushed; downstream sub-actor calls are aborted to avoid runaway billing.
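
The consecutive-failure rule is a standard circuit breaker. A minimal sketch follows, assuming that a successful call resets the failure count (implied by "consecutive") and that remaining steps are skipped once the breaker trips. The class, the step runner and the `failedSteps` field on the returned record are illustrative; only the alert record's `recordType` and `alertType` values come from this README.

```python
from typing import Callable


class CircuitBreaker:
    """Trips after a fixed number of consecutive failures."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, succeeded: bool) -> None:
        # Any success resets the streak; a failure extends it.
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1

    @property
    def tripped(self) -> bool:
        return self.consecutive_failures >= self.threshold


def run_steps(steps: list[tuple[str, Callable[[], None]]]) -> dict:
    """Run steps in order, tolerating failures until the breaker trips."""
    breaker = CircuitBreaker()
    failed_steps: list[str] = []
    for name, step in steps:
        try:
            step()
            breaker.record(True)
        except Exception:
            failed_steps.append(name)  # non-blocking: note the failure and continue
            breaker.record(False)
        if breaker.tripped:
            # Stop calling further steps to avoid runaway billing.
            return {"recordType": "alert", "alertType": "pipeline-degraded",
                    "failedSteps": failed_steps}
    return {"recordType": "summary", "failedSteps": failed_steps}
```

Two failures separated by a success do not trip the breaker; three in a row do.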

If outputMode: "debug" is set, every lead also carries a stepDiagnostics[] array with per-step timing and outcome — useful when "why is enrichment slow on this batch?" comes up.

When NOT to use this actor

Scope fence: these are jobs the pipeline is not the right tool for, each with the better tool named:

Need | Use this instead
Discover new leads from search/firmographics (no input list) | B2B Lead Gen Suite: generates leads from queries; this pipeline enriches an existing list
Single-domain company research (no contacts) | Company Deep Research: same engine, standalone
Pattern-only email synthesis (no verification, no scoring) | Email Pattern Finder: pattern detection + send-decision in one tool
Bulk verify a list of emails you already have | Bulk Email Verifier: skip the discovery + scoring stages
Local business leads from Google Maps | Google Maps Lead Enricher: Maps-first orchestrator
Real-time hiring/funding intent signals | Intent Signal Tracker: buyer-stage intelligence
Single-person enrichment via PDL | Person Enrichment Lookup: direct PDL wrapper
Generate cold-email copy for enriched leads | AI Outreach Personalizer: runs after this pipeline
Build a zero-shot agent that picks the right enrichment tool | Use this actor: it returns decisions an agent can branch on directly

This pipeline is the right tool when you have an input list and want enriched, verified, scored, decision-ready records out the other end. It's not a database, not a search engine, and not a creative copywriter.

Use in Dify

Drop this actor into Dify workflows via the Apify plugin's Run Actor node. Each enriched lead comes back as structured JSON that is already scored, classified, and decided: a sendDecision.action of SEND_NOW / VERIFY_FIRST / ENRICH_MORE / SKIP, plus the bounce-risk band, lead grade, and ready-to-send boolean your downstream node branches on. Clay pointed at the same input returns raw enriched data; this returns send-or-skip decisions.

  • Actor ID: ryanclinton/lead-enrichment-pipeline
  • Sample input (one-click outreach prep — find missing emails, verify, score, decide):
{
  "leads": [
    {"firstName": "Sarah", "lastName": "Chen", "companyName": "Acme Corp", "website": "acmecorp.com", "title": "CTO"},
    {"email": "james@betaindustries.com", "companyName": "Beta Industries"},
    {"fullName": "Maria Rodriguez", "domain": "pinnacle.io"}
  ],
  "goal": "high-deliverability",
  "outputFilter": "send-now-only",
  "strictMode": false
}
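
Outside Dify, the same input can be sent from a backend with the official apify-client Python package. The sketch below assumes the client version whose call() returns a run dict containing defaultDatasetId; the helper names are hypothetical, and the token is read from an APIFY_TOKEN environment variable of your choosing.

```python
def build_run_input():
    """Return the sample input from this section as a Python dict."""
    return {
        "leads": [
            {"firstName": "Sarah", "lastName": "Chen", "companyName": "Acme Corp",
             "website": "acmecorp.com", "title": "CTO"},
            {"email": "james@betaindustries.com", "companyName": "Beta Industries"},
            {"fullName": "Maria Rodriguez", "domain": "pinnacle.io"},
        ],
        "goal": "high-deliverability",
        "outputFilter": "send-now-only",
        "strictMode": False,
    }


def run_pipeline(token):
    """Start the actor, wait for it to finish, and return all dataset records."""
    # Imported lazily so this module loads even without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor("ryanclinton/lead-enrichment-pipeline").call(
        run_input=build_run_input()
    )
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires network access and a valid token):
#   import os
#   for record in run_pipeline(os.environ["APIFY_TOKEN"]):
#       print(record.get("recordType"))
```

Calling run_pipeline starts a billable run, so try it with a small leads list first.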

Branching example (Dify if/else node)

IF lead.recordType == "lead" AND lead.sendDecision.action == "SEND_NOW"
  → push to outreach sequence
IF lead.sendDecision.action == "VERIFY_FIRST"
  → re-run lead.recoveryPlan.nextBestActorSlug (e.g. bulk-email-verifier)
IF lead.sendDecision.action == "ENRICH_MORE"
  → re-run lead.recoveryPlan.nextBestActorSlug (e.g. website-contact-scraper)
IF lead.sendDecision.action == "SKIP"
  → drop, log reason from lead.sendDecision.reasons
IF lead.recordType == "alert"
  → notify Slack/PagerDuty (list quality dropped or pipeline degraded)
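
The same branching can be expressed in code if you consume the dataset outside Dify. This is a minimal sketch that assumes each record is a dict with the fields used in the branching example; the handler labels it returns ("push_to_sequence", "rerun_actor", and so on) are hypothetical names for your own downstream steps.

```python
def route(record):
    """Return a (handler, detail) pair for one dataset record."""
    record_type = record.get("recordType")
    if record_type == "alert":
        # List quality dropped or pipeline degraded: notify a human.
        return ("notify", record.get("alertType"))
    if record_type != "lead":
        # summary / preflight / error records are not routed here.
        return ("ignore", record_type)

    decision = record.get("sendDecision") or {}
    action = decision.get("action")
    if action == "SEND_NOW":
        return ("push_to_sequence", None)
    if action in ("VERIFY_FIRST", "ENRICH_MORE"):
        slug = (record.get("recoveryPlan") or {}).get("nextBestActorSlug")
        return ("rerun_actor", slug)
    if action == "SKIP":
        return ("drop", decision.get("reasons"))
    return ("ignore", action)


# Made-up record for illustration.
verify_first = {
    "recordType": "lead",
    "sendDecision": {"action": "VERIFY_FIRST"},
    "recoveryPlan": {"nextBestActorSlug": "ryanclinton/bulk-email-verifier"},
}
handler, detail = route(verify_first)
```

Checking recordType before sendDecision keeps alert and summary records from being mistaken for leads.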

The actionPlaybook[] array on every lead is usable verbatim — no LLM rewriting needed. It already says "Run bulk-email-verifier on this email before sending" or "Add to outreach sequence — verified email, ready today" in plain English, ready to paste into a Slack/email/task body.

Opt-in modes Dify workflows can leverage

  • goal: "quick-outreach" — fastest path; email + verify only. Use when speed matters more than scoring.
  • goal: "high-deliverability" (default) — email + verify + score; the standard cold-outreach prep.
  • goal: "max-coverage" — every step; use when populating a CRM with everything you can find.
  • outputFilter: "send-now-only" — Dify only receives ready-to-send leads, no parsing required.
  • outputFilter: "a-b-grade-only" — only data-complete leads (8+ enriched fields).
  • strictMode: true — VERIFY_FIRST upgrades to SKIP. Use when sender reputation matters more than coverage.
  • emitPreflight: true (default) — a recordType: "preflight" cost-estimate record arrives FIRST in the dataset, so Dify can short-circuit if the run will exceed budget.
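
A budget check on the preflight record might look like the sketch below. The name of the cost field inside the preflight record is not documented in this section, so estimatedCostUsd is a hypothetical placeholder; when it is absent, the sketch falls back to the flat $0.12 per lead price.

```python
PRICE_PER_LEAD_USD = 0.12  # flat price quoted in this README


def within_budget(first_record, lead_count, budget_usd):
    """Decide whether a run fits the budget, based on the first dataset record."""
    estimate = None
    if first_record.get("recordType") == "preflight":
        # "estimatedCostUsd" is a hypothetical field name used for illustration.
        estimate = first_record.get("estimatedCostUsd")
    if estimate is None:
        # Fall back to the flat per-lead price when no estimate is available.
        estimate = lead_count * PRICE_PER_LEAD_USD
    return estimate <= budget_usd


# 250 leads with a reported estimate of $30 against a $50 budget.
fits = within_budget({"recordType": "preflight", "estimatedCostUsd": 30.0}, 250, 50)
# 1,000 leads with no preflight estimate against a $100 budget.
too_big = within_budget({"recordType": "lead"}, 1000, 100)
```

Check the actual preflight record in your dataset and swap in its real field name before relying on this.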

Stable enums Dify nodes can branch on

Field | Values
recordType | lead / summary / preflight / alert / error
sendDecision.action | SEND_NOW / VERIFY_FIRST / ENRICH_MORE / SKIP
sendDecision.riskLevel | low / medium / high
bounceRiskBucket | low / medium / high
leadGrade | A / B / C / D / F
seniorityLevel | c-level / vp / director / manager / individual-contributor / unknown
complianceFlags.region | eu / us-ca / us-other / uk / apac / other / unknown
decisionSignals[] | 16-token enum vocabulary (verified-email / unverified-email / senior-title / etc.)
isOutreachReady / isDecisionMaker / isOnSuppression | true / false

Decision-ready booleans (isOutreachReady, isDecisionMaker, isOnSuppression) are the single-column flags Dify automation actually wants: each is a top-level true/false field, so no nested-object parsing is required.
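
If you want a guard in front of your own branching logic, the enum table can be turned into a small validator. This sketch copies the values listed above and reports any field that falls outside them; decisionSignals[] is left out because its full 16-token vocabulary is not listed in this section. The helper names are hypothetical.

```python
STABLE_ENUMS = {
    "recordType": {"lead", "summary", "preflight", "alert", "error"},
    "sendDecision.action": {"SEND_NOW", "VERIFY_FIRST", "ENRICH_MORE", "SKIP"},
    "sendDecision.riskLevel": {"low", "medium", "high"},
    "bounceRiskBucket": {"low", "medium", "high"},
    "leadGrade": {"A", "B", "C", "D", "F"},
    "seniorityLevel": {"c-level", "vp", "director", "manager",
                       "individual-contributor", "unknown"},
    "complianceFlags.region": {"eu", "us-ca", "us-other", "uk", "apac",
                               "other", "unknown"},
}
BOOLEAN_FLAGS = ("isOutreachReady", "isDecisionMaker", "isOnSuppression")


def get_path(record, dotted):
    """Read a dotted path such as 'sendDecision.action' from nested dicts."""
    value = record
    for key in dotted.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def unexpected_values(record):
    """Return {field: value} for every present field outside its listed values."""
    problems = {}
    for path, allowed in STABLE_ENUMS.items():
        value = get_path(record, path)
        if value is None:
            continue  # absent fields are not treated as errors here
        if not isinstance(value, str) or value not in allowed:
            problems[path] = value
    for flag in BOOLEAN_FLAGS:
        value = record.get(flag)
        if value is not None and not isinstance(value, bool):
            problems[flag] = value
    return problems


# Made-up record with one out-of-vocabulary grade.
issues = unexpected_values({
    "recordType": "lead",
    "sendDecision": {"action": "SEND_NOW", "riskLevel": "low"},
    "leadGrade": "E",
    "isOutreachReady": True,
})
```

An empty result means every present field matched the documented values.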

Limitations

  • No LinkedIn scraping — the pipeline does not scrape LinkedIn profiles directly. LinkedIn URLs provided in input are used for enrichment matching but not crawled. This keeps the actor compliant with LinkedIn's terms of service.
  • Email discovery depends on public data — waterfall enrichment works best for leads at companies with public websites. Stealth-mode startups with no web presence may return null emails.
  • Company research requires a valid domain — if no domain can be extracted from the website, email, or company name, the company enrichment step is skipped for that lead.
  • Phone discovery is US-focused — phone number finding works best for US-based businesses and professionals. International phone discovery has lower success rates.
  • CSV parser handles standard CSV only — the built-in CSV parser supports quoted fields and common delimiters but does not handle Excel files (.xlsx). Export to CSV first.
  • Processing time scales with enabled steps — a full enrichment run (all 6 steps) on 200 leads may take 10-15 minutes. Disable unneeded steps to reduce time.
  • Sub-actor failures are non-blocking — if a sub-actor times out or fails, that step is skipped and the pipeline continues. This means some leads may have partial enrichment.
  • CRM push requires API credentials — HubSpot push needs a private app access token; Salesforce push needs instance URL and access token. The actor does not store credentials between runs.

Integrations

  • Zapier — trigger enrichment runs when new leads arrive in Google Sheets, Airtable, or CRM
  • Make — build multi-step workflows that feed webform submissions into the enrichment pipeline
  • Google Sheets — export enriched leads directly to a Google Sheet for team collaboration
  • Apify API — trigger enrichment from any backend system via REST API with Python, JavaScript, or cURL
  • Webhooks — get notified when enrichment completes and automatically fetch results
  • LangChain / LlamaIndex — feed enriched lead data into AI agents for automated outreach drafting or lead research

Troubleshooting

  • Empty email results for most leads — email discovery works best when leads include a company domain or website. Leads with only a name and no company information have limited enrichment options. Add company names or domains to improve discovery rates.

  • Run taking longer than 10 minutes — full enrichment with all 6 steps enabled processes leads sequentially through each sub-actor. Disable enrichCompany for faster runs, or reduce the batch size with maxLeads. Each sub-actor has its own timeout (up to 900 seconds for waterfall enrichment).

  • CSV file not loading — the csvUrl must be a publicly accessible URL that returns raw CSV text. Google Sheets share links do not work — use the "Publish to web" CSV export URL instead. The URL must respond within 30 seconds.

  • Some leads missing scores — the scoring engine requires at least an email or domain to generate a score. Leads where contact discovery failed and no domain was derivable will have score: null and grade: null.

  • CRM push showing 0 leads pushed — verify your API credentials. HubSpot requires a private app access token (not a legacy API key). Salesforce requires both instanceUrl and accessToken in the credentials object.

Responsible use

  • This actor only accesses publicly available contact and company information.
  • Respect website terms of service and robots.txt directives.
  • Comply with GDPR, CAN-SPAM, and other applicable data protection laws when using enriched lead data for outreach.
  • Do not use extracted data for spam, harassment, or unauthorized purposes.
  • For guidance on web scraping legality, see Apify's guide.

FAQ

How many leads can I enrich in one run? There is no hard limit on leads per run. The actor processes leads in batch and charges $0.12 per lead. For runs over 1,000 leads, increase the memory allocation to 512 MB. Use maxLeads to cap processing if you want to control costs.

How is Lead Enrichment Pipeline different from Clay? Clay charges $149-699/month plus per-credit costs of $0.40-5.63 per enrichment action. This pipeline charges a flat $0.12 per lead with no monthly subscription, covering all enrichment steps in one price. For 500 leads/month at one enrichment action per lead, Clay's per-credit costs alone come to $200-2,815 (before the subscription), while this pipeline costs $60. The code is also open for inspection on Apify.
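
The 500-lead figures can be reproduced with a few lines of arithmetic. The sketch assumes one Clay enrichment action per lead and leaves out Clay's monthly subscription, which is how the $200-2,815 range works out; prices are held in cents so the results are exact.

```python
LEADS_PER_MONTH = 500

# Prices in cents, taken from the answer above.
PIPELINE_CENTS_PER_LEAD = 12
CLAY_CENTS_PER_ACTION_LOW = 40
CLAY_CENTS_PER_ACTION_HIGH = 563


def monthly_cost_usd(cents_per_unit, units):
    """Total monthly cost in dollars for `units` billed at `cents_per_unit`."""
    return cents_per_unit * units / 100


pipeline_cost = monthly_cost_usd(PIPELINE_CENTS_PER_LEAD, LEADS_PER_MONTH)
clay_low = monthly_cost_usd(CLAY_CENTS_PER_ACTION_LOW, LEADS_PER_MONTH)
clay_high = monthly_cost_usd(CLAY_CENTS_PER_ACTION_HIGH, LEADS_PER_MONTH)

print(f"Pipeline: ${pipeline_cost:,.2f}")
print(f"Clay credits: ${clay_low:,.2f} to ${clay_high:,.2f}")
```

Change LEADS_PER_MONTH to your own volume to get a comparable estimate.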

Does lead enrichment work without an email address? Yes. The pipeline is designed for partial leads. Provide a name + company, name + domain, or even just a domain, and the waterfall enrichment will discover email addresses. Leads with more input data produce better results.

What types of emails are filtered out during verification? The email verifier checks MX records and SMTP mailbox existence. Emails at domains with no MX records are marked invalid. Catch-all domains (which accept mail to any address) are marked catch-all with lower confidence scores. Role-based addresses like info@ and support@ are flagged.
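
The rules in this answer can be pictured as a small classifier over facts a verifier has already gathered. This is an illustrative sketch, not the verifier's code: it performs no DNS or SMTP lookups, the order of the checks and the "valid" label are assumptions, and only the info and support prefixes come from the text above.

```python
# Role prefixes named in this README; extend the set for your own lists.
ROLE_PREFIXES = {"info", "support"}


def is_role_based(email):
    """True when the part before '@' is a known role prefix."""
    local_part = email.split("@", 1)[0].strip().lower()
    return local_part in ROLE_PREFIXES


def classify(has_mx, is_catch_all, mailbox_exists):
    """Map already-collected verification facts to a status label."""
    if not has_mx:
        return "invalid"  # no MX records: the domain cannot receive mail
    if is_catch_all:
        return "catch-all"  # accepts any address, so confidence is lower
    return "valid" if mailbox_exists else "invalid"


status = classify(has_mx=True, is_catch_all=True, mailbox_exists=True)
flagged = is_role_based("Support@acmecorp.com")
```

Role-based detection is kept separate from the status because a flagged address can still be deliverable.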

Is it legal to enrich lead data from public sources? This pipeline only accesses publicly available information. However, how you use enriched data is subject to GDPR, CAN-SPAM, CCPA, and other regulations depending on your jurisdiction and use case. Consult legal counsel for your specific compliance requirements. See Apify's guide on web scraping legality.

Can I schedule lead enrichment to run automatically? Yes. Use Apify's built-in scheduling to run the pipeline daily, weekly, or on a custom cron schedule. Point csvUrl at a regularly updated Google Sheet export to enrich new leads automatically.

How accurate is the email discovery? Email discovery accuracy depends on the input quality and the target company's web presence. For leads with a valid company domain, the 10-step waterfall finds emails for 60-80% of leads. Combined with verification, the pipeline typically delivers 50-70% verified email addresses from a cold list of names and companies.

How long does a typical enrichment run take? Processing time depends on batch size and enabled steps. A 50-lead run with email discovery + verification + scoring takes 3-5 minutes. Enabling company enrichment adds 2-3 minutes. A full 200-lead run with all steps takes 10-15 minutes.

Can I use this with HubSpot or Salesforce? Yes. Set crmPush to hubspot or salesforce and provide your API credentials. Enriched leads are pushed directly to your CRM after processing. For other CRMs, download the CSV output and import manually.

What happens if a sub-actor fails during the pipeline? The pipeline is fault-tolerant. If any sub-actor (email finder, verifier, company researcher, scorer, or CRM pusher) fails or times out, that step is skipped, recorded in failedSteps[], and the pipeline continues with the remaining steps. Affected leads will have null values for the skipped step's fields. If 3 consecutive sub-actor calls fail, the circuit breaker stops the run and emits a pipeline-degraded alert record.

Can I enrich leads from a Google Sheet? Yes. In Google Sheets, go to File > Share > Publish to web, select the sheet, choose CSV format, and copy the URL. Paste it into the csvUrl field. The auto-mapper handles most column naming conventions.

What is the minimum data needed per lead? Each lead needs at least one of: (a) first name + company name, (b) first name + domain, (c) email address, or (d) full name + company name. Leads with more fields produce richer enrichment results.
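
A pre-flight check for these four combinations could look like the sketch below. It follows the list in this answer literally and uses the field names from the sample input (firstName, fullName, companyName, domain, website, email); treating website as equivalent to domain is an assumption. Other parts of this README suggest looser combinations, such as fullName plus domain, may also be accepted, so treat a False result as a flag for review and not as a reason to drop the lead.

```python
def _has(lead, *keys):
    """True if any of the given keys holds a non-empty value."""
    return any(str(lead.get(key) or "").strip() for key in keys)


def has_minimum_data(lead):
    """Check a lead against combinations (a) to (d) listed above."""
    if _has(lead, "email"):
        return True  # (c) email address
    has_company = _has(lead, "companyName")
    has_domain = _has(lead, "domain", "website")  # assumption: website counts as domain
    if _has(lead, "firstName") and (has_company or has_domain):
        return True  # (a) first name + company, (b) first name + domain
    return _has(lead, "fullName") and has_company  # (d) full name + company


checks = [
    has_minimum_data({"firstName": "Sarah", "companyName": "Acme Corp"}),
    has_minimum_data({"email": "james@betaindustries.com"}),
    has_minimum_data({"lastName": "Chen"}),
]
```

Running this before a paid run gives a quick count of leads that are likely to come back empty.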

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.