Pricing

from $120.00 / 1,000 leads enriched

Lead Enrichment Pipeline — 5-47x Cheaper Than Clay

All-in-one lead enrichment: email discovery, phone finding, verification, company research, and lead scoring in one run. CSV or JSON in, scored leads out. $0.12/lead — 5-47x cheaper than Clay.

Rating

1.0

(1)

Developer

Ryan Clinton

Last modified: a month ago

Lead Enrichment Pipeline

Deterministic outbound intelligence engine — not just an enrichment tool. Every decision is fully auditable and avoids black-box behaviour. One actor, full outbound brain, $0.12 per lead.

This actor converts raw signals into deterministic, automation-ready outbound decisions.

Apify GTM Pipeline: Scrape → Enrich → Verify → Score → Research → Push to CRM

Role of this actor: Orchestration + enrichment layer (the whole pipeline).

This replaces multiple outbound tools with one system. Three layers in one actor: enrichment → decision → execution — covering what most teams stitch together from data enrichment (Clay / Apollo), scoring (Salesforce Einstein / 6sense), and sequencing logic (Outreach / Salesloft).

Every lead comes back with a clear answer — send, skip, or fix first — plus a priority score (who to email first), a freshness assessment (is this contact still good?), and an execution plan (channel, sequence length, timing, tone). The full decision enum is SEND_NOW / VERIFY_FIRST / ENRICH_MORE / SKIP. No LLM, no hidden weights.

What makes this different (read this first)

Send-or-skip decision per lead — not just data, an action
Priority engine — priorityScore (0-100) + priorityBucket (hot / warm / cold) tells you who to email first
Freshness-aware — stale data forces SEND_NOW down to VERIFY_FIRST so you don't burn sender reputation
Closed-loop feedback — ship outcomes back, the actor remembers them and surfaces cohort patterns on future leads
Execution plan — channel × sequenceType × timing × tone × bestSendWindow per lead, ready for Outreach / Salesloft / Lemlist / your own tool
Account-level intelligence — companyInsights aggregates by domain so account-based selling works without extra tooling
Built for automation — every output field is a stable enum; downstream tools branch on codes, never parse prose

Most tools tell you who to contact. This tells you how to contact them.

Every score is auditable down to the rule path. The execution plan is generated deterministically — same input, same output, every run. No manual interpretation required between this actor and your sequencing tool.

TL;DR — if you just want results

Upload a CSV (or paste a JSON array of leads)
Click Start
Filter the dataset for sendDecision.action = "SEND_NOW"
Pass each lead's executionPlan into your sequencing tool

That's the simple path. Everything below — templates, monitoring, feedback loop, persona scoring, scenario simulation — is opt-in for power users.
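
As a rough illustration of steps 3 and 4, the sketch below filters already-downloaded dataset items for `sendDecision.action == "SEND_NOW"` and pairs each lead's email with its `executionPlan`. The helper names, the sample records, and the `"channel": "email"` value are placeholders for illustration, not documented output.

```python
from typing import Any


def select_send_now(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only records whose sendDecision.action is SEND_NOW."""
    selected = []
    for item in items:
        decision = item.get("sendDecision") or {}  # summary records have no decision
        if decision.get("action") == "SEND_NOW":
            selected.append(item)
    return selected


def to_sequencer_payloads(leads: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Pair each lead's email with its executionPlan, ready to hand to a sequencing tool."""
    return [{"email": lead.get("email"), "plan": lead.get("executionPlan")} for lead in leads]


# Illustrative records only; the executionPlan content is a placeholder.
dataset_items = [
    {"email": "sarah.chen@acmecorp.com",
     "sendDecision": {"action": "SEND_NOW"},
     "executionPlan": {"channel": "email"}},
    {"email": "james@betaindustries.com",
     "sendDecision": {"action": "VERIFY_FIRST"},
     "executionPlan": None},
    {"type": "summary", "totalInputLeads": 2},
]

payloads = to_sequencer_payloads(select_send_now(dataset_items))
print(payloads)
```

Filtering on the enum code rather than on any explanatory text keeps the step stable across runs.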

What is outbound lead automation?

Outbound lead automation is the process of enriching, scoring, prioritising, and sequencing leads for outreach. Tools that both enrich leads and generate outreach sequences combine contact discovery with built-in execution planning, replacing a stack of separate enrichment, scoring, and sequencing tools with a single pipeline.

Why you can trust the output

Automating outbound without AI hallucinations requires deterministic, rule-based decision systems. The actor never calls an LLM during enrichment, scoring, prioritisation, or execution planning — every score is a documented formula, every decision is reproducible, and trust is prioritised over adaptive automation. The actor surfaces patterns and lets the user decide rather than auto-mutating scoring weights.

No LLM scoring. Every score is a documented formula over input fields you can see.
No hidden weights. Every decision exposes its decisionRulePath, confidenceBreakdown, priorityFactors, and reasonCodes.
Stable enums for automation. All decision values come from documented enum tables — your downstream tools branch on codes, never parse prose.
Honest abstention. When data is thin (cohort < 3 outcomes; no monitor history; no ICP defined), the actor returns null instead of fabricating confidence.
No auto-tuning. The feedback loop surfaces cohort patterns; users adjust personaWeights themselves. Auto-mutating weights would be opaque and trust-killing.
Fully auditable. Every record carries confidenceExplanation, priorityExplanation, and executionPlan.reason in plain English so you can see why a decision was made.

Built for automation & AI agents

Deterministic outputs — no hallucination risk, every run with the same input produces the same output.
Stable enums for branching: sendDecision.action, priorityBucket, executionPlan.channel, bounceRiskBucket, leadGrade, intentSignals[]. See Stable enums (quick reference).
No LLM dependency. This actor never calls an LLM. Your downstream agent (Dify / LangChain / a custom OpenAI tool) reads the structured output and acts on it.
Single-column filters. isOutreachReady, isDecisionMaker, isFullIcpMatch, and isOnSuppression are boolean fields, so a spreadsheet or SQL filter needs only one column.
Execution-ready. executionPlan ships sequencing logic so your agent doesn't have to invent it.

See Use in Dify for a full agent integration walkthrough.
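
A minimal sketch of what "branch on codes" can look like in a downstream agent, assuming each lead record carries `sendDecision.action` with one of the four documented values. The `route_leads` helper and the queue structure are hypothetical; they are not part of the actor.

```python
# The four documented values of sendDecision.action.
ACTIONS = ("SEND_NOW", "VERIFY_FIRST", "ENRICH_MORE", "SKIP")


def route_leads(leads):
    """Group leads into one queue per sendDecision.action code.

    Unknown codes raise instead of being guessed at, so a new enum value
    is noticed rather than silently mishandled.
    """
    queues = {action: [] for action in ACTIONS}
    for lead in leads:
        action = (lead.get("sendDecision") or {}).get("action")
        if action not in queues:
            raise ValueError(f"Unexpected sendDecision.action: {action!r}")
        queues[action].append(lead)
    return queues


queues = route_leads([
    {"email": "a@example.com", "sendDecision": {"action": "SEND_NOW"}},
    {"email": "b@example.com", "sendDecision": {"action": "SKIP"}},
    {"email": "c@example.com", "sendDecision": {"action": "SEND_NOW"}},
])
print({action: len(items) for action, items in queues.items()})
```

Each queue can then be wired to its own handler (send, re-verify, re-enrich, drop) without parsing any prose fields.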

How this compares to the alternatives

Capability	Clay	Apollo / ZoomInfo	Hunter / Clearbit	Lead Enrichment Pipeline
Email + company enrichment	yes	yes	yes	yes
Email verification	paid add-on	stale	basic	MX + SMTP
Lead scoring	opaque	opaque	no	transparent + auditable
Send-or-skip decision per lead	no	no	no	stable enum
Priority ranking with explanation	no	score only	no	`priorityScore` + `priorityFactors`
Freshness / staleness awareness	no	no	no	`freshness` + decay model
Closed-loop feedback (no auto-tuning)	no	no	no	`historicalPerformance`
Account-level aggregation	via tables	yes	no	`companyInsights`
Execution planning (channel + sequence + timing + tone)	no	no	no	`executionPlan`
Cross-run change detection	no	no	no	`changeFlags[]`
Determinism (same input → same output)	no	no	yes	yes
LLM-free (fully auditable)	no	no	yes	yes
Per-credit / per-event billing	yes	subscription	subscription	pay-per-event

Unlike Clay, this system produces a deterministic send decision you can branch on. Unlike Apollo or ZoomInfo, it accounts for data freshness and won't recommend contacting a stale lead. Unlike traditional enrichment tools, it includes execution planning — sequencing, timing, tone, and channel — out of the box.

How the pipeline works

The actor orchestrates 7 specialised sub-actors in sequence: contact discovery via 10-step waterfall enrichment, phone number finding, MX + SMTP email verification, deep company research from 7+ sources, multi-signal lead scoring with A-F grades, and optional CRM push to HubSpot or Salesforce. Each step runs only when the lead needs it — no wasted credits on data you already have.

INPUT (CSV or JSON)
    │  Each row: name+company, name+domain, or email
    ▼
┌─────────────────────────────────────────────────────────────────────┐
│ Step 1 — Normalize          Domain extraction, name parsing, dedup  │
│ Step 2 — Contact Discovery  Waterfall (pattern → PDL → scrape …)   │
│ Step 3 — Email Verify       MX + SMTP, confidence score             │
│ Step 4 — Company Enrich     7+ sources (web, GitHub, SEC, Wiki …)  │
│ Step 5 — Lead Score         5-category 0-100 + A-F grade            │
│ Step 5b — Decision Engine   sendDecision + recoveryPlan + signals   │
│ Step 6 — CRM Push           HubSpot / Salesforce / dataset only     │
└─────────────────────────────────────────────────────────────────────┘
    │
    ▼
OUTPUT — every lead carries:
    • SEND_NOW / VERIFY_FIRST / ENRICH_MORE / SKIP
    • bounceRiskBucket  (low / medium / high)
    • leadGrade         (A-F data-completeness)
    • recordConfidence  (0-100, harmonic mean across 4 axes)
    • actionPlaybook[]  (next steps in plain English)
    • recoveryPlan      (next-best Apify actor when stuck)
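
The output block describes `recordConfidence` as a harmonic mean across 4 axes. The sketch below shows how a harmonic mean behaves on four 0-100 values. The axis names (`contact`, `company`, `identity`, `fit`) are borrowed from the `personaWeights` description elsewhere in this README, and treating them as the four axes is an assumption, as are the sample numbers.

```python
def harmonic_mean(values):
    """Harmonic mean of non-negative numbers; returns 0.0 if any value is 0."""
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    if any(v == 0 for v in values):
        return 0.0  # a single zero axis collapses the whole score
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical per-axis confidences (0-100); axis names are assumed.
breakdown = {"contact": 90, "company": 80, "identity": 85, "fit": 20}

record_confidence = harmonic_mean(list(breakdown.values()))
arithmetic = sum(breakdown.values()) / len(breakdown)
print(round(record_confidence, 1), round(arithmetic, 1))
```

A harmonic mean is pulled toward the weakest axis, so one poorly supported dimension lowers the combined score far more than it would under a plain average.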

What data can you extract?

Data Point	Source	Example
📧 Email address	Waterfall: website scraping, pattern detection, PDL	`sarah.chen@acmecorp.com`
📧 Email verified	MX + SMTP deliverability check	`true`
📧 Email confidence	Verification engine score	`95`
📞 Phone number	Website scraping, directories, PDL	`+1-415-555-0142`
👤 Full name	Input normalization, name parsing	`Sarah Chen`
🏢 Company name	Input or company research	`Acme Corp`
🌐 Domain	Extracted from website, email, or company name	`acmecorp.com`
🏭 Industry	Company deep research	`Technology`
👥 Employee count	Company deep research	`51-200`
🔧 Tech stack	Company deep research	`React, Node.js, AWS`
🔗 Social profiles	Company research (LinkedIn, Twitter)	`linkedin.com/company/acmecorp`
⭐ Lead score	Multi-signal scoring engine (0-100)	`82`
🏅 Lead grade	Score-derived letter grade	`A`
📊 Score breakdown	Per-category scoring: digital, engagement, company, contact, authority	`{"digital": 18, "company": 20, ...}`

Why use Lead Enrichment Pipeline?

The capability gap matters more than the cost gap. Other tools give you data; this one gives you a deterministic decision system you can drop into automation, audit end-to-end, and trust without LLM-flavoured "fuzziness." See How this compares to the alternatives for the side-by-side.

Operational benefits the platform gives you out of the box:

Scheduling — run daily, weekly, or custom intervals to enrich new leads automatically
API access — trigger enrichment runs from Python, JavaScript, or any HTTP client
Proxy rotation — sub-actors use Apify's built-in proxy infrastructure for reliable scraping
Monitoring — get Slack or email alerts when enrichment runs fail or produce unexpected results
Integrations — connect to Zapier, Make, Google Sheets, HubSpot, Salesforce, or webhooks
Pay-per-event billing — $0.12 per lead enriched, no monthly subscription, no surprise charges

Features

6-step enrichment pipeline: Normalize, Contact Discovery, Email Verify, Company Enrich, Lead Score, and CRM Push run in sequence, and any step a lead does not need is skipped
Smart step skipping — leads with existing emails skip contact discovery; leads without domains skip company research; disabled steps are bypassed entirely
10-step waterfall email discovery — website scraping, email pattern detection, People Data Labs enrichment, SMTP probing, and social profile matching in a single cascade
15 email pattern candidates per name — generated from 15 B2B naming conventions (first.last, firstlast, flast, f.last, first_last, first-last, firstl, last.first, plus 7 more) ranked by industry prevalence so the most-likely pattern is tried first
International name transliteration — 40+ accented characters (umlauts, diacritics, cedillas, eszett) auto-converted to ASCII before pattern generation, so François Müller and Jiří Řehák generate clean candidate patterns instead of broken ones
MX + SMTP email verification — every discovered email is verified for deliverability with confidence scores. SMTP probing performs an EHLO → MAIL FROM → RCPT TO handshake and disconnects before DATA — no email is ever sent during verification
Catch-all domain detection — domains accepting all addresses are flagged automatically (a random nonexistent address is tested as a control); confidence is capped on catch-all domains so SDR teams can quarantine high-bounce-risk leads before sending
Phone number extraction from company websites — phone numbers scraped from contact, about, and team pages during waterfall enrichment, with phoneSource attribution showing where each came from
Social profile discovery — LinkedIn URLs found on websites or generated as fallback; the suite then layers in Twitter/Facebook/GitHub URLs from company-deep-research
Deep company research — pulls from 7+ sources (website, Wikipedia, GitHub, SEC filings, academic databases, DNS, social profiles) to build company intelligence
Multi-signal lead scoring — scores leads 0-100 across 5 categories: digital presence, engagement signals, company fit, contact completeness, and authority level
CSV and JSON input — upload a CSV URL or paste a JSON array; CSV headers are auto-mapped from 40+ common column name variations
CSV output — downloadable CSV file generated in the Key-Value Store alongside the standard JSON dataset
Batch processing with per-domain caching — all leads needing a step are sent in one sub-actor call, not one call per lead; the waterfall caches website scraping and pattern detection per domain, so 50 leads at one company cost the same sub-actor compute as 1
Direct CRM push — enriched leads push straight to HubSpot or Salesforce with no middleware required
Source tracking — every discovered field includes a *Source tag (e.g., emailSource: "pattern-detection") so you know where data came from
Spending limit — set a maximum budget per run; the pipeline stops when your limit is reached
Pass-through fields — extra CSV columns not in the standard schema are preserved on output
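
The SMTP probing and catch-all detection described in the list can be sketched with Python's standard `smtplib`. This is a minimal illustration under stated assumptions, not the actor's implementation: the sender address, the accepted reply codes, and the mapping to `valid` / `invalid` / `catch-all` / `unknown` are choices made for the example, and looking up the domain's MX host is left out because it needs a DNS library.

```python
import secrets
import smtplib

ACCEPTED = (250, 251)  # RCPT TO reply codes treated as "mailbox accepted"


def rcpt_code(conn, address, sender="probe@example.com"):
    """MAIL FROM -> RCPT TO, then reset. DATA is never issued, so nothing is sent."""
    conn.mail(sender)
    code, _ = conn.rcpt(address)
    conn.rset()  # clear the transaction so the connection can be reused
    return code


def classify(conn, address):
    """Probe the target address, then a random control address on the same domain."""
    conn.ehlo()
    domain = address.rsplit("@", 1)[1]
    control = f"{secrets.token_hex(8)}@{domain}"  # almost certainly nonexistent
    target = rcpt_code(conn, address)
    if 400 <= target < 500:
        return "unknown"  # temporary failure, e.g. greylisting
    if target not in ACCEPTED:
        return "invalid"
    # If the server also accepts a random mailbox, acceptance proves nothing.
    return "catch-all" if rcpt_code(conn, control) in ACCEPTED else "valid"


def verify(address, mx_host):
    """Open a connection to an already-resolved MX host and classify the address."""
    with smtplib.SMTP(mx_host, 25, timeout=10) as conn:
        return classify(conn, address)
```

`verify` performs a real network connection and many networks block outbound port 25, so treat it as something to run from a suitable host rather than a laptop.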

Use cases for lead enrichment

Sales prospecting

SDRs and BDRs export prospect lists from LinkedIn Sales Navigator or Apollo with names and companies but no verified emails. This pipeline fills in the gaps: discovering work emails via waterfall enrichment, verifying deliverability, and scoring each lead so reps focus on the highest-value targets first.

Marketing agency lead generation

Agencies build prospect databases for clients across industries. Instead of paying for a subscription-priced platform for each client, agencies run this pipeline at $0.12/lead to enrich downloaded attendee lists, webinar signups, or trade show scans with emails, phone numbers, and company data.

Recruiting and talent sourcing

Recruiters have candidate names and companies from job boards but need direct contact information. The pipeline discovers work emails and phone numbers, then scores candidates by company fit and seniority signals to prioritize outreach.

CRM data enrichment

Sales ops teams maintain CRM databases where 30-60% of contact records are incomplete or stale. Upload the CRM export as CSV, enrich missing fields, verify existing emails, and push updated records back to HubSpot or Salesforce — all in one run.

Competitive intelligence

Market research teams tracking competitor employees need enriched profiles with verified contact data, company intelligence, and tech stack information. The pipeline enriches partial competitor employee lists into actionable intelligence reports.

Event lead processing

After conferences and trade shows, teams have badge scan exports with names and companies but no email addresses. This pipeline converts raw event leads into outreach-ready contacts with verified emails, company context, and priority scores within minutes of the event ending.

How to enrich leads with this pipeline

Upload your leads — Paste a JSON array of lead objects in the input field, or provide a public URL to a CSV file. Each lead needs at minimum a name + company, a name + domain, or an email address.
Choose enrichment steps — Enable or disable email discovery, phone finding, email verification, company research, and lead scoring based on what you need. Defaults cover the most common workflow (email + verify + score).
Run the pipeline — Click "Start" and wait. A batch of 50 leads with default settings typically completes in 3-5 minutes. Status messages update at each pipeline step.
Download results — Get enriched leads as JSON from the Dataset tab, or download the CSV file from the Key-Value Store link in the summary record. Push directly to HubSpot or Salesforce by enabling CRM push.

Input parameters

Core

Parameter	Type	Default	Description
`leads`	array	—	JSON array of lead objects. Each can have: firstName, lastName, fullName, email, phone, companyName, domain, website, title, linkedinUrl.
`csvUrl`	string	—	Public URL to a CSV file with lead data. Headers auto-mapped. Takes precedence over JSON leads.
`maxLeads`	integer	`0`	Maximum leads to process. Set to 0 for unlimited.
`outputCsv`	boolean	`true`	Generate a downloadable CSV file in the Key-Value Store.

Pipeline configuration

Parameter	Type	Default	Description
`template`	string	`"custom"`	High-level preset (see Templates section below). One of: `custom` / `b2b-saas-prospecting` / `enterprise-sales` / `recruiting-tech` / `recruiting-non-tech` / `event-leads` / `agency-outbound` / `crm-cleanup`. Individual settings you set explicitly take precedence; the template only fills in settings you leave unset.
`goal`	string	`"high-deliverability"`	Goal preset that maps to step toggles. One of: `quick-outreach` / `high-deliverability` / `max-coverage` / `custom`. Overridden when a template is set.
`enrichEmail`	boolean	`true`	(custom mode) Run waterfall email discovery for leads missing email addresses.
`enrichPhone`	boolean	`false`	(custom mode) Run phone number discovery for leads missing phone numbers.
`verifyEmails`	boolean	`true`	(custom mode) Run MX + SMTP verification on all emails.
`enrichCompany`	boolean	`false`	(custom mode) Run deep company research for leads with a domain.
`scoreLeads`	boolean	`true`	(custom mode) Score all leads 0-100 and assign A-F grades.

Decision tuning (v1.1+)

Parameter	Type	Default	Description
`icp`	object	`{}`	Structured ICP — `{ roles, seniority, industries }`. Each lead gets `isIcpRoleMatch` / `isIcpSeniorityMatch` / `isIcpIndustryMatch` + `isFullIcpMatch` + `icpMatchScore` (0-100). Preferred over `icpRoles`.
`icpRoles`	array	`[]`	Deprecated alias of `icp.roles`. Kept for back-compat.
`ourTechStack`	array	`[]`	Optional list of technologies that describe your product. When provided, each lead gets a `techStackMatch` score showing overlap with the company's detected tech stack. Requires `enrichCompany=true`.
`outputFilter`	string	`"none"`	Filters records before they are pushed to the dataset and before pay-per-event (PPE) charging, so filtered-out leads aren't billed. One of: `none` / `send-now-only` / `verified-emails-only` / `a-b-grade-only`.
`outputMode`	string	`"analytics"`	Detail level on each emitted lead. `crm` = lean CRM-import row; `analytics` = full + decisions + confidence (default); `debug` = analytics + per-step diagnostics.
`strictMode`	boolean	`false`	Convert `VERIFY_FIRST` decisions to `SKIP`. Use when sender reputation matters more than coverage.
`minEmailConfidence`	integer	`0`	Drop emails below this confidence threshold (0-100) before computing the send decision. Set to 0 to keep everything.
`dedupeWithinRun`	boolean	`true`	Collapse duplicate input rows by (email \| name+domain \| name+company) before processing. Each output lead carries `mergedFromCount` showing how many input rows produced it.
`personaWeights`	object	`{}`	Optional weight pack `{ contact, company, identity, fit }` that re-weights `confidenceBreakdown` into a `customScore`. Sub-actor's `score` field stays untouched.
`suppressionListUrl`	string	—	Optional public URL to a CSV with one email or domain per line. Leads matching the list are flagged `isOnSuppression=true` and forced to `SKIP`.
`monitorStateKey`	string	—	When set, persists per-lead snapshots to a named KV store. Subsequent runs emit `changeSinceLastRun` + stable `changeFlags[]` enum. See Monitoring section below.
`feedbackStateKey`	string	—	V3 — named KV store key for outcome history. When set, persists outcomes from the `feedback` input across runs and surfaces `historicalPerformance` on similar future leads.
`feedback`	object	`{}`	V3 — closed-loop outcome ingestion. `{ "type": "outcome", "data": [{ entityId, outcome, domain, ... }] }`. Outcomes persist to `feedbackStateKey`.
`emitPreflight`	boolean	`true`	Push a `preflight` cost-estimate record at the start of the run.

CRM push

Parameter	Type	Default	Description
`crmPush`	string	`"none"`	One of: `none` / `hubspot` / `salesforce`.
`hubspotApiKey`	string (secret)	—	HubSpot private app access token. Required when `crmPush=hubspot`.
`salesforceCredentials`	string (secret)	—	JSON string `{"instanceUrl":"...","accessToken":"..."}`. Required when `crmPush=salesforce`.

Input examples

Enrich a list of prospects with email and scoring (most common):

{
    "leads": [
        {"firstName": "Sarah", "lastName": "Chen", "companyName": "Acme Corp", "website": "acmecorp.com", "title": "CTO"},
        {"firstName": "James", "lastName": "Park", "companyName": "Beta Industries", "title": "VP Sales"},
        {"fullName": "Maria Rodriguez", "domain": "pinnacle.io"}
    ],
    "enrichEmail": true,
    "verifyEmails": true,
    "scoreLeads": true
}

Full enrichment with company research and HubSpot push:

{
    "csvUrl": "https://docs.google.com/spreadsheets/d/abc123/export?format=csv",
    "enrichEmail": true,
    "enrichPhone": true,
    "verifyEmails": true,
    "enrichCompany": true,
    "scoreLeads": true,
    "crmPush": "hubspot",
    "hubspotApiKey": "pat-na1-abc123...",
    "maxLeads": 100
}

Quick email-only enrichment (fastest, cheapest):

{
    "leads": [
        {"email": "james@betaindustries.com"},
        {"email": "m.rodriguez@pinnacle.io"},
        {"email": "chen.sarah@acmecorp.com"}
    ],
    "enrichEmail": false,
    "verifyEmails": true,
    "enrichCompany": false,
    "scoreLeads": false
}
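
Any of these inputs can also be submitted programmatically. The sketch below builds a request for the Apify API's run-actor endpoint using only the standard library. The actor ID `USERNAME~ACTOR-NAME` and the token are placeholders (this README does not state the actor's ID), and the network call itself is left commented out because it starts a billed run.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that starts an actor run with the given input."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


run_input = {
    "leads": [{"email": "james@betaindustries.com"}],
    "enrichEmail": False,
    "verifyEmails": True,
    "enrichCompany": False,
    "scoreLeads": False,
    "maxLeads": 5,  # keep the first test run small
}

request = build_run_request("USERNAME~ACTOR-NAME", "YOUR_APIFY_TOKEN", run_input)
print(request.get_method(), request.full_url)

# To actually start the run (network call, billed per lead):
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)["data"]
```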

Input tips

Start with defaults — the default settings (email discovery + verification + scoring) cover the most common enrichment workflow at the lowest cost per lead
Enable company enrichment selectively — company research adds industry, employee count, and tech stack but increases processing time; enable it when company intelligence matters for your use case
Use CSV for large batches — upload your spreadsheet to Google Sheets, publish as CSV, and paste the URL; the auto-mapper handles 40+ common header variations including "First Name", "first_name", "fname", and more
Set maxLeads for testing — use maxLeads: 5 on your first run to verify the output format before processing your full list
Batch in one run — processing 200 leads in one run is faster and cheaper than running 200 single-lead runs because sub-actors are called in batch
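
Header auto-mapping of the kind described above can be approximated by normalising each CSV header and looking it up in an alias table. The alias table below is a small hypothetical subset, not the actor's 40+ variations, and the function names are illustrative. Unmapped columns are kept under their original header, in the spirit of the pass-through behaviour.

```python
import csv
import io
import re

# Hypothetical subset of aliases, keyed by the pipeline's input field names.
ALIASES = {
    "firstName": {"firstname", "fname", "givenname"},
    "lastName": {"lastname", "lname", "surname"},
    "companyName": {"company", "companyname", "organization"},
    "email": {"email", "emailaddress"},
}
LOOKUP = {alias: field for field, aliases in ALIASES.items() for alias in aliases}


def normalize_header(header: str) -> str:
    """'First Name', 'first_name' and 'FIRSTNAME' all become 'firstname'."""
    return re.sub(r"[^a-z0-9]", "", header.lower())


def read_leads(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into lead dicts with standard field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    leads = []
    for row in reader:
        lead = {}
        for header, value in row.items():
            if header is None:  # extra cells beyond the header row
                continue
            field = LOOKUP.get(normalize_header(header), header)
            lead[field] = value
        leads.append(lead)
    return leads


sample = "First Name,last_name,Company,Booth\nSarah,Chen,Acme Corp,B12\n"
print(read_leads(sample))
```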

Output example

{
    "firstName": "Sarah",
    "lastName": "Chen",
    "fullName": "Sarah Chen",
    "email": "sarah.chen@acmecorp.com",
    "emailVerified": true,
    "emailStatus": "valid",
    "emailConfidence": 95,
    "emailSource": "pattern-detection",
    "phone": "+1-415-555-0142",
    "phoneSource": "website-scraping",
    "companyName": "Acme Corp",
    "domain": "acmecorp.com",
    "website": "https://acmecorp.com",
    "title": "CTO",
    "linkedinUrl": "https://linkedin.com/in/sarahchen",
    "industry": "Technology",
    "employeeCount": "51-200",
    "companyDescription": "Enterprise SaaS platform for supply chain optimization, serving mid-market manufacturers across North America.",
    "techStack": ["React", "Node.js", "AWS", "PostgreSQL"],
    "socialProfiles": {
        "linkedin": "https://linkedin.com/company/acmecorp",
        "twitter": "https://twitter.com/acmecorp"
    },
    "score": 82,
    "grade": "A",
    "scoreBreakdown": {
        "digital": 18,
        "engagement": 15,
        "company": 20,
        "contact": 14,
        "authority": 15
    },
    "enrichmentSteps": ["contact-discovery", "email-verification", "company-research", "lead-scoring"],
    "crmPushed": false,
    "processedAt": "2026-03-24T14:30:00.000Z"
}

The final record in each run is a pipeline summary with aggregate statistics:

{
    "type": "summary",
    "totalInputLeads": 50,
    "totalEnrichedLeads": 50,
    "emailsFound": 38,
    "emailsVerified": 47,
    "companiesResearched": 45,
    "leadsScored": 50,
    "leadsPushedToCrm": 0,
    "pipelineSteps": ["normalize", "contact-discovery", "email-verification", "company-research", "lead-scoring"],
    "averageScore": 64,
    "csvDownloadUrl": "https://api.apify.com/v2/key-value-stores/abc123/records/enriched-leads.csv",
    "durationSeconds": 187,
    "completedAt": "2026-03-24T14:33:07.000Z"
}
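
Because lead records and the summary share one dataset, a consumer usually separates them first. The sketch below does that for a list of already-downloaded items. It checks both `recordType` (named in the Output fields section) and `type` (used in the summary example above), and treats a record with neither key as a lead; that fallback is an assumption made so the example records in this README parse.

```python
def split_records(items):
    """Separate lead records from summary / preflight / alert / error records."""
    leads, others = [], {}
    for item in items:
        kind = item.get("recordType") or item.get("type") or "lead"
        if kind == "lead":
            leads.append(item)
        else:
            others.setdefault(kind, []).append(item)
    return leads, others


items = [
    {"email": "sarah.chen@acmecorp.com", "score": 82,
     "scoreBreakdown": {"digital": 18, "engagement": 15, "company": 20,
                        "contact": 14, "authority": 15}},
    {"type": "summary", "totalInputLeads": 50, "averageScore": 64},
]

leads, others = split_records(items)
print(len(leads), sorted(others))
```

In the lead example above, the five `scoreBreakdown` values add up to the `score` of 82; whether that holds for every record is not stated in this README.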

Output fields

Each enriched lead carries the fields below. The dataset also emits recordType: 'preflight', 'summary', 'alert', and 'error' records — see the Stable enums section.

Identity & contact

Field	Type	Description
`recordType`	string enum	`'lead'` for enriched-lead records; the other record kinds use `summary` / `preflight` / `alert` / `error`.
`firstName` / `lastName` / `fullName`	string \| null	Parsed/normalized from input.
`email`	string \| null	Discovered or input email address.
`emailVerified`	boolean \| null	Whether email passed MX + SMTP verification.
`emailStatus`	string \| null	`valid` / `invalid` / `catch-all` / `unknown` / `risky`.
`emailConfidence`	integer \| null	0-100 confidence on the email address.
`emailSource`	string \| null	`input` / `pattern-detection` / `website-scraping` / `pdl` / `waterfall`.
`emailDecision`	string \| null	From bulk-email-verifier: `send` / `send-monitor` / `hold` / `verify-later` / `replace` / `suppress`. The email-level routing scalar.
`emailRecommendedAction`	object \| null	From bulk-email-verifier: `{ actionId, label, owner, eta, riskTier, reason, targetActorSlug? }`.
`emailFailureAnalysis`	object \| null	From bulk-email-verifier: per-record root-cause attribution when verification fails.
`emailDeliverabilitySimulation`	object \| null	From bulk-email-verifier: deterministic simulation of bounce / inbox / spam outcome bands.
`emailDomainInsights`	object \| null	From bulk-email-verifier: per-domain SPF/DKIM/DMARC posture + catch-all status + role-address ratio.
`emailDelta`	object \| null	From bulk-email-verifier: cross-run change block when `monitorStateKey` is set (new / changed / unchanged / recovered / degraded).
`emailStrategy`	object \| null	From bulk-email-verifier: multi-day playbook sequence — what to do tomorrow if today's send blocks.
`emailDecisionSnapshot`	object \| null	From bulk-email-verifier: replayable audit bundle (inputsHash + rulesApplied + profileSnapshot). Reproduce a decision deterministically.
`emailSignalIndependence`	object \| null	From bulk-email-verifier: `{ score, distinctSourceCount, totalComponentCount, interpretation, warning? }`. Aligned with `contactSignalIndependence`, `phoneSignalIndependence`, and `companySignalIndependence`.
`emailCounterfactual`	object \| null	From bulk-email-verifier: drops the highest-weight confidence component and recomputes — tells you whether the send-decision is load-bearing on a single signal.
`emailDecisionMemory`	object \| null	From bulk-email-verifier: outcome inference when `lastAction` was passed. `{ outcome, daysSinceAction, confidence }`. Only confidence-score movement is observable — direct send / reply / bounce outcomes are not.
`emailEntityId`	string \| null	Cross-suite join key from bulk-email-verifier — same value as the suite-level `entityId` for joining back to the standalone verifier dataset.
`phone`	string \| null	Discovered or input phone number.
`phoneSource`	string \| null	`input` / `website-scraping` / `phone-finder`.
`phoneDecision`	string \| null	From phone-number-finder: `call-now` / `call-later` / `enrich-first` / `skip`. The phone-level routing scalar.
`phoneIsContactable`	boolean \| null	From phone-number-finder: at least one usable phone number exists.
`phoneIsCallable`	boolean \| null	From phone-number-finder: a phone is contactable AND not gated by compliance / risk.
`phoneReachability`	object \| null	From phone-number-finder: deterministic reachability tier (`direct-line` / `gatekeeper` / `cold-cell` / `unreachable`) plus call-success probability.
`phoneCallOutcomePrediction`	object \| null	From phone-number-finder: deterministic prediction of `connected` / `voicemail` / `no-answer` / `gatekeeper-block` for the top-ranked number.
`phoneSlaTier`	string \| null	From phone-number-finder: `P1` / `P2` / `P3` / `P4` urgency tier driving the call queue ordering.
`phoneSignalIndependence`	object \| null	From phone-number-finder: `{ score, distinctSourceCount, totalComponentCount, interpretation, warning? }`. Aligned with `contactSignalIndependence` and `companySignalIndependence`.
`phoneCounterfactual`	object \| null	From phone-number-finder: drops the highest-weight confidence component and recomputes — tells you whether the call decision is load-bearing on a single signal.
`phoneDecisionMemory`	object \| null	From phone-number-finder: outcome inference when `lastAction` was passed. `{ outcome, daysSinceAction, confidence }`.
`phoneEntityId`	string \| null	Cross-suite join key from phone-number-finder — same value as the suite-level `entityId` for joining back to the standalone phone-finder dataset.
`contactCandidates`	object[] \| null	Top 10 ranked email candidates from waterfall enrichment with confidence + source attribution: `[{ email, pattern, confidence, sources: ['pattern_generation' \| 'website' \| 'pattern_detection' \| 'smtp'] }]`. Lets your downstream tool pick a different candidate when the primary scores low.
`contactSignalIndependence`	object \| null	From waterfall: `{ score: 0–1, distinctSourceCount, totalComponentCount, interpretation, warning? }`. Catches the "looks like 4 corroborating signals but really 1 echoed 4 times" trap on the discovered email.
`contactDecisionRisk`	object \| null	From waterfall: `{ falsePositiveCost, falseNegativeCost, reversibility, asymmetry, actEvenIfUnsure, explanation }`. FP cost = sender-rep damage from acting on a wrong email; FN cost = missed prospect. `actEvenIfUnsure` = true means bias toward action even at lower confidence.
`contactCounterfactual`	object \| null	From waterfall: drops the highest-weight confidence component and recomputes — tells you whether the email recommendation is load-bearing on a single signal or diversified.
`contactDecisionMemory`	object \| null	From waterfall: closes the feedback loop when `lastAction` was passed in. `{ outcome: 'engaged' \| 'no-response' \| 'no-change' \| 'resolved' \| 'too-soon-to-tell', daysSinceAction, confidence }`.
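
One way a downstream tool might use `contactCandidates` is to fall back to the next-best candidate when the primary email scores low. The sketch below assumes candidates shaped like `{ email, confidence, ... }` as in the table; the `pick_email` helper and the threshold of 70 are hypothetical choices, not documented behaviour.

```python
def pick_email(lead, min_confidence=70):
    """Return the primary email if it is confident enough, else the best alternative."""
    primary = lead.get("email")
    if primary and (lead.get("emailConfidence") or 0) >= min_confidence:
        return primary
    candidates = lead.get("contactCandidates") or []
    ranked = sorted(candidates, key=lambda c: c.get("confidence", 0), reverse=True)
    for candidate in ranked:
        if candidate.get("email") != primary and candidate.get("confidence", 0) >= min_confidence:
            return candidate["email"]
    return None  # nothing trustworthy enough; better to abstain than to guess


lead = {
    "email": "schen@acmecorp.com",
    "emailConfidence": 40,
    "contactCandidates": [
        {"email": "schen@acmecorp.com", "confidence": 40, "sources": ["pattern_generation"]},
        {"email": "sarah.chen@acmecorp.com", "confidence": 88, "sources": ["website", "smtp"]},
    ],
}
print(pick_email(lead))
```

Returning `None` when no candidate clears the threshold follows the same abstention principle the README describes for thin data.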

Company

Field	Type	Description
`companyName`	string \| null	Company name from input or research.
`domain`	string \| null	Company domain.
`domainSource`	string \| null	`input` / `website` / `email` / `company-name-derived`.
`website`	string \| null	Company website URL.
`title`	string \| null	Job title from input.
`linkedinUrl`	string \| null	LinkedIn profile URL.
`industry` / `employeeCount` / `companyDescription`	string \| null	From company-deep-research.
`techStack`	string[]	Technologies detected on the company website.
`socialProfiles`	object \| null	LinkedIn / Twitter / Facebook / GitHub URLs.
`companyArchetype`	string \| null	From company-deep-research: `developer-platform` / `saas` / `marketplace` / `fintech` / `ecommerce` / `media` / `agency` / `enterprise-software` / `open-source-foundation` / `consumer-app` / `other`. Useful for ICP segmentation.
`companyType`	string \| null	From company-deep-research: `startup` / `scaleup` / `public` / `enterprise` / `private` / `unknown`.
`companyLifecycle`	string \| null	From company-deep-research: `nascent` / `growing` / `scaling` / `mature` / `declining` / `dormant` / `unknown`. Filter out `dormant` / `declining` to avoid wasted outreach.
`companyTrajectory`	string \| null	From company-deep-research: `accelerating` / `steady-growth` / `stable` / `decelerating` / `declining`.
`companyWhyNow`	object \| null	From company-deep-research: `{ trigger, change, importance, severity }` — what changed at this company that makes the run worth acting on. Returns null when nothing notable triggered.
`companyPriority`	object \| null	From company-deep-research's `priorities[0]` — the canonical company-level recommended action. `{ type, severity, headline, recommendedAction, evidence[], timeToImpact }`.
`companySignalIndependence`	object \| null	From company-deep-research: `{ score: 0–1, signalCount, warning? }`. Catches "3 signals or 1 echoed 3 times" at the company level.
`companyDecisionRisk`	object \| null	From company-deep-research: `{ falsePositiveCost, falseNegativeCost, reversibility, asymmetry, actEvenIfUnsure }` for the company-level priority.
`companyCounterfactual`	object \| null	From company-deep-research: drops the top company-level signal and recomputes — load-bearing-signal check.
`companyDecisionMemory`	object \| null	From company-deep-research: outcome inference when `lastAction` was passed. `{ outcome, effectivenessScore, pattern, daysSinceAction }`.
`companyEntityId`	string \| null	Company-level cross-suite join key from company-deep-research.

Scoring

Field	Type	Description
`score`	integer \| null	0-100 fit score from the lead-scoring sub-actor.
`grade`	string \| null	A-F letter grade derived from score.
`scoreBreakdown`	object \| null	Per-category scores from the sub-actor.
`scoreDecision`	string \| null	From lead-scoring-engine: `qualify` / `hold` / `disqualify` / `nurture` / `re-engage`. The score-level routing scalar.
`scoreRecommendedAction`	object \| null	From lead-scoring-engine: `{ actionId, label, owner, eta, riskTier, reason }`.
`scoreExpectedValue`	object \| null	From lead-scoring-engine: deterministic `{ dealSizeUsd, costToActUsd, expectedRevenueUsd, roi }` per lead — only when `enableEconomics=true`.
`scorePriorityRoi`	number \| null	From lead-scoring-engine: ROI-weighted priority score for ranking under budget constraints.
`scoreAllocationDecision`	object \| null	From lead-scoring-engine: when `constraints` are set — whether this lead falls inside the budget / cap envelope and why.
`scoreSimulation`	object \| null	From lead-scoring-engine: counterfactual scoring under alternate weight packs without re-running.
`scoreDataHygiene`	object \| null	From lead-scoring-engine: per-record hygiene flags (duplicates, role-address, missing fields).
`scoreSalesTrust`	object \| null	From lead-scoring-engine: trust-tier classification (`tier-1` / `tier-2` / `risk` / `block`).
`scoreTemporalSignals`	object \| null	From lead-scoring-engine: cross-run trend / momentum / re-engage flag — only when `monitorStateKey` is set.
`scoreSignalIndependence`	object \| null	From lead-scoring-engine: `{ score, distinctSourceCount, totalComponentCount, interpretation, warning? }`. Aligned with the rest of the suite.
`scoreCounterfactual`	object \| null	From lead-scoring-engine: drops the highest-weight ICP factor and recomputes — tells you whether the lead's grade is load-bearing on a single factor.
`scoreDecisionMemory`	object \| null	From lead-scoring-engine: outcome inference when `lastAction` was passed. `{ outcome, daysSinceAction, confidence }`. Only ICP-score movement is observable — direct conversion / deal outcomes are not.
`scoreEntityId`	string \| null	Cross-suite join key from lead-scoring-engine — same value as the suite-level `entityId` for joining back to the standalone scorer dataset.

Decision-output (v1.0)

Field	Type	Description
`sendDecision`	object	`{ action, riskLevel, reasons[], decisionRulePath[] }`. action ∈ `SEND_NOW` / `VERIFY_FIRST` / `ENRICH_MORE` / `SKIP`.
`bounceRiskBucket`	string enum	`low` / `medium` / `high`.
`isOutreachReady`	boolean	Single-column safelist — verified email, ≥80 confidence, ≥60 score.
`isDecisionMaker`	boolean \| null	True for c-level / VP / director titles.
`seniorityLevel`	string enum	`c-level` / `vp` / `director` / `manager` / `individual-contributor` / `unknown`.
`decisionSignals`	string[]	Stable enum token vocabulary (verified-email / unverified-email / senior-title / etc.); the full token list is in the Stable enums quick reference.
`leadGrade`	string enum	A-F data-completeness grade (distinct from `grade`, which is fit).
`recoveryPlan`	object \| null	`{ reason, nextBestActorSlug, why }` when enrichment didn't fully succeed.
`actionPlaybook`	string[]	Ordered next steps in plain English — usable verbatim by Dify / agents.
`complianceFlags`	object	`{ isEuBased, ccpaProtected, region, requiresOptIn }`.
`techStackMatch`	object \| null	Present when `ourTechStack` is provided — `{ matched, totalRequired, matchedTech[] }`.
`isOnSuppression`	boolean	True when matched against the suppression list.
`phoneRecoveryPlan`	object \| null	Phone-specific next-best-actor pointer when phone is missing.
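
The `isOutreachReady` rule above (verified email, ≥80 confidence, ≥60 score) can be re-derived client-side, for example to re-check records after merging several datasets. The sketch below is a minimal illustration, assuming lead records expose `emailVerified`, `emailConfidence`, and `score` as in the debug-mode example; `is_outreach_ready` is a hypothetical helper name, not part of the actor.

```python
from typing import Any, Mapping


def is_outreach_ready(lead: Mapping[str, Any]) -> bool:
    """Re-derive the documented safelist: verified email, >=80 confidence, >=60 score."""
    confidence = lead.get("emailConfidence") or 0
    score = lead.get("score") or 0  # score may be null when scoring was skipped
    return bool(lead.get("emailVerified")) and confidence >= 80 and score >= 60


# Sample records (invented for illustration, not actor output)
sample_leads = [
    {"fullName": "Sarah Chen", "emailVerified": True, "emailConfidence": 92, "score": 84},
    {"fullName": "James Park", "emailVerified": True, "emailConfidence": 71, "score": 90},
    {"fullName": "Maria Rodriguez", "emailVerified": False, "emailConfidence": 95, "score": None},
]
ready_names = [lead["fullName"] for lead in sample_leads if is_outreach_ready(lead)]
print(ready_names)
```

In normal use the emitted `isOutreachReady` column is the value to filter on; a local re-check like this is only useful as a sanity test on transformed data.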

v1.1 + v1.2 additive fields (R1 / R2 / R3 polish)

These all default to safe values when their inputs aren't set — adding them did not break the v1 contract.

Field	Type	Description
`recordConfidence`	integer	0-100 single-number confidence: the harmonic mean of the four axes (contact, company, identity, fit).
`confidenceLevel`	string enum	`high` (≥75) / `medium` (≥50) / `low` (<50).
`confidenceBreakdown`	object	Four-axis split: `{ contact, company, identity, fit }` each 0-100.
`confidenceExplanation`	string	Plain-English summary citing strongest + weakest axis.
`customScore`	integer \| null	Persona-weighted re-aggregation of the breakdown — only when `personaWeights` is set.
`personaWeightsApplied`	object \| null	The weight pack the run used (echoed for transparency).
`entityId`	string	Stable hash-based identifier from the canonical lead key.
`mergedFromCount`	integer \| null	Number of duplicate input rows that collapsed into this lead via dedup.
`stepDiagnostics`	array	`outputMode='debug'` only — per-step `{ step, actor, durationMs, outcome, reason? }`.
`changeSinceLastRun`	object \| null	Cross-run diff (only when `monitorStateKey` is set) — see Monitoring section.
`changeFlags`	string[] \| null	Stable enum tokens describing what changed since the last run.
`isIcpRoleMatch`	boolean \| null	True when title matches one of `icp.roles`.
`matchedIcpRole`	string \| null	First `icp.roles` entry that matched.
`isIcpSeniorityMatch`	boolean \| null	True when `seniorityLevel` is in `icp.seniority`.
`isIcpIndustryMatch`	boolean \| null	True when industry contains one of `icp.industries`.
`matchedIcpIndustry`	string \| null	First `icp.industries` entry that matched.
`icpMatchScore`	integer \| null	0-100 average across declared ICP axes (axes the user didn't declare are excluded).
`isFullIcpMatch`	boolean \| null	True only when every declared ICP axis matches.
`freshness`	object	V3 — `{ lastVerifiedAt, daysSinceVerification, decayScore, freshnessLevel, explanation }`.
`stalenessDowngraded`	boolean	V3 — true when freshness rules forced a positive decision down to VERIFY_FIRST.
`priorityScore`	integer	V3 — 0-100 single number combining fit + confidence + ICP match + freshness + historical performance.
`priorityBucket`	string enum	V3 — `hot` / `warm` / `cold` / `skip`.
`priorityExplanation` / `priorityFactors`	string / object	V3 — human-readable + machine-readable composition of the priority score.
`companyInsights`	object \| null	V3 — per-domain aggregate: `{ totalContactsSeen, avgScore, decisionMakerCoverage, bestContactEntityId, accountTier, sendNowCount, explanation }`.
`intentSignals`	string[]	V3 — stable enum tokens (`tech-stack-match` / `growing-company` / `decision-maker-cluster` / etc.).
`historicalPerformance`	object \| null	V3 — `{ cohortSize, similarLeadsReplyRate, similarLeadsBounceRate, similarLeadsConvertRate, matchedCohort, explanation }` when feedback loop is active.
`autoRetryPlan`	object	V3 — `{ willRetry, strategy, expectedGain, explanation }`. Advisory only — does not actually retry.
`executionPlan`	object	V4 — decision → execution bridge: `{ channel, sequenceType, sequenceLength, timingRecommendation, personalisationLevel, tone, bestSendWindow, reason, reasonCodes[] }`. Pure deterministic, LLM-free.
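
As a rough illustration of how `icpMatchScore` and `isFullIcpMatch` relate to the per-axis booleans, the sketch below averages only the axes that were declared (a `None` match means the axis was not declared). The function names and the rounding behaviour are assumptions for illustration; the actor's exact arithmetic is not documented here.

```python
from typing import List, Optional


def _declared(axis_matches) -> List[bool]:
    """Keep only the axes the user declared (None = not declared)."""
    return [match for match in axis_matches if match is not None]


def icp_match_score(*axis_matches: Optional[bool]) -> Optional[int]:
    """0-100 average across declared ICP axes; None when no axis was declared."""
    declared = _declared(axis_matches)
    if not declared:
        return None
    return round(100 * sum(declared) / len(declared))


def is_full_icp_match(*axis_matches: Optional[bool]) -> Optional[bool]:
    """True only when every declared axis matched."""
    declared = _declared(axis_matches)
    return all(declared) if declared else None


# role matched, seniority not declared, industry did not match
print(icp_match_score(True, None, False), is_full_icp_match(True, None, False))
```

Excluding undeclared axes keeps a lead from being penalised for criteria the user never set, which is the behaviour the `icpMatchScore` description calls out.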

Pipeline metadata

Field	Type	Description
`enrichmentSteps`	string[]	Pipeline steps that processed this lead.
`crmPushed`	boolean	Whether the lead was pushed to a CRM.
`processedAt`	string	ISO timestamp.

CRM push (Step 6 — populated when `crmPush` ≠ `none`)

Field	Type	Description
`crmRecordType`	string \| null	From CRM pusher: `result` / `skipped` / `error`.
`crmStatus`	string \| null	From CRM pusher: `success` / `partial` / `error` / `dry_run` / `skipped`.
`crmSummary`	string \| null	From CRM pusher: paste-ready one-line summary of the CRM-side write outcome.
`crmDataQuality`	object \| null	From CRM pusher: per-record quality flags (missing required fields, schema mismatches, role addresses).
`crmExecutionPlan`	object \| null	From CRM pusher: structured execution plan (companies/contacts/deals to push, in dry-run mode shown without writing).
`crmChangeAnalysis`	object \| null	From CRM pusher: when `monitorStateKey` is set — `{ newAccount, returningAccount, accountImproved, accountDegraded, qualityDelta }`.
`crmFailureAnalysis`	object \| null	From CRM pusher: root-cause attribution when a CRM write fails (auth / rate-limit / schema / quality-gate / etc.).
`crmFeedbackEvents`	object \| null	From CRM pusher: inline calibration events for closed-loop accuracy tracking.
`crmExpectedVsActual`	object \| null	From CRM pusher: rate calibration (`expected push rate` vs `actual push rate`) — surfaces upstream filtering bugs.
`crmDecisionSnapshot`	object \| null	From CRM pusher: replayable audit bundle (inputsHash + rulesApplied + profileSnapshot). Reproduce a CRM-write decision deterministically.
`crmSignalIndependence`	object \| null	From CRM pusher: `{ score, distinctSourceCount, totalComponentCount, interpretation, warning? }`. Aligned with the rest of the suite.
`crmCounterfactual`	object \| null	From CRM pusher: drops the highest-weight quality signal and recomputes — tells you whether the CRM-write decision is load-bearing on a single quality dimension.
`crmDecisionMemory`	object \| null	From CRM pusher: outcome inference when `lastAction` was passed. `{ outcome, daysSinceAction, confidence }`. Only push-result delta is observable — CRM-side downstream activity (deal stage, lead conversion) is not.
`crmEntityId`	string \| null	Cross-suite join key from CRM pusher — same value as the suite-level `entityId` for joining back to the standalone CRM-pusher dataset.

How much does it cost to enrich leads?

Lead Enrichment Pipeline uses pay-per-event pricing — you pay $0.12 per lead enriched. Platform compute costs are included. All 6 pipeline steps (email discovery, phone finding, verification, company research, scoring, CRM push) are covered in that single price.

Scenario	Leads	Cost per lead	Total cost
Quick test	1	$0.12	$0.12
Small batch	10	$0.12	$1.20
Medium batch	50	$0.12	$6.00
Large batch	200	$0.12	$24.00
Enterprise	1,000	$0.12	$120.00

You can set a maximum spending limit per run to control costs. The actor stops enriching when your budget is reached and outputs all leads processed up to that point.
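
The pricing arithmetic is simple enough to check before a run. The sketch below computes the total cost of a batch and how many leads fit under a spending limit. It assumes the actor stops once the next lead would exceed the limit; `run_cost_usd` and `leads_within_budget` are hypothetical helpers, and amounts are handled in integer cents to avoid floating-point drift.

```python
PRICE_PER_LEAD_CENTS = 12  # $0.12 per lead enriched, from the pricing table


def run_cost_usd(lead_count: int) -> float:
    """Total pay-per-event cost for a batch, in US dollars."""
    return lead_count * PRICE_PER_LEAD_CENTS / 100


def leads_within_budget(lead_count: int, max_spend_usd: float) -> int:
    """How many leads can be enriched before a spending limit stops the run."""
    affordable = int(round(max_spend_usd * 100)) // PRICE_PER_LEAD_CENTS
    return min(lead_count, affordable)


for batch in (1, 10, 50, 200, 1000):
    print(f"{batch:>5} leads -> ${run_cost_usd(batch):.2f}")

# A 200-lead batch with a $10.00 limit is cut off part-way through
print(leads_within_budget(200, 10.00))
```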

Subscription-priced enrichment platforms (Clay, Apollo, ZoomInfo) typically cost significantly more and bill per seat or per credit; check each vendor's current published plans for specifics. With Lead Enrichment Pipeline there is no subscription commitment: you pay only $0.12 per lead actually enriched, and most teams spend $12-60/month.

Enrich leads using the API

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/lead-enrichment-pipeline").call(run_input={
    "leads": [
        {"firstName": "Sarah", "lastName": "Chen", "companyName": "Acme Corp", "title": "CTO"},
        {"firstName": "James", "lastName": "Park", "companyName": "Beta Industries", "title": "VP Sales"},
        {"fullName": "Maria Rodriguez", "domain": "pinnacle.io"},
    ],
    "enrichEmail": True,
    "verifyEmails": True,
    "scoreLeads": True,
})

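for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # Records are discriminated by recordType (lead / summary / preflight / alert / error)
    record_type = item.get("recordType")
    if record_type == "summary":
        print(f"Pipeline complete: {item.get('totalEnrichedLeads')} leads, avg score {item.get('averageScore')}")
    elif record_type == "lead":
        # .get() avoids a KeyError when a field was not populated for this lead
        print(f"{item.get('fullName')} | {item.get('email')} ({item.get('emailStatus')}) | Score: {item.get('score')}/{item.get('grade')}")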

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/lead-enrichment-pipeline").call({
    leads: [
        { firstName: "Sarah", lastName: "Chen", companyName: "Acme Corp", title: "CTO" },
        { firstName: "James", lastName: "Park", companyName: "Beta Industries", title: "VP Sales" },
        { fullName: "Maria Rodriguez", domain: "pinnacle.io" },
    ],
    enrichEmail: true,
    verifyEmails: true,
    scoreLeads: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    // Records are discriminated by recordType (lead / summary / preflight / alert / error)
    if (item.recordType === "summary") {
        console.log(`Pipeline complete: ${item.totalEnrichedLeads} leads, avg score ${item.averageScore}`);
    } else if (item.recordType === "lead") {
        console.log(`${item.fullName} | ${item.email} (${item.emailStatus}) | Score: ${item.score}/${item.grade}`);
    }
}

cURL

# Start the enrichment run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~lead-enrichment-pipeline/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "leads": [
      {"firstName": "Sarah", "lastName": "Chen", "companyName": "Acme Corp", "title": "CTO"},
      {"fullName": "Maria Rodriguez", "domain": "pinnacle.io"}
    ],
    "enrichEmail": true,
    "verifyEmails": true,
    "scoreLeads": true
  }'

# Fetch results once the run has finished (DATASET_ID is the defaultDatasetId in the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

How Lead Enrichment Pipeline works

Step 1: Input normalization

The pipeline accepts leads as a JSON array or CSV file. CSV headers are auto-mapped from 40+ common variations — "First Name", "first_name", "fname", and "givenname" all map to firstName. For each lead, the normalizer extracts domains from website URLs (stripping www. and paths), derives domains from company names by removing legal suffixes (LLC, Inc, Corp, etc.) and appending .com, and parses full names into first/last components. Domain extraction handles edge cases like https://www.acmecorp.com/about correctly resolving to acmecorp.com.
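
The normalization rules described above can be sketched in a few lines. This is a minimal illustration, not the actor's implementation: the alias and suffix tables are small illustrative subsets, and `map_header`, `extract_domain`, `guess_domain`, and `split_full_name` are hypothetical names.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Illustrative subsets only; the actor's real tables cover 40+ header variants.
HEADER_ALIASES = {"firstname": "firstName", "fname": "firstName", "givenname": "firstName"}
LEGAL_SUFFIXES = {"llc", "inc", "corp", "ltd", "gmbh"}


def map_header(header: str) -> str:
    """Map a messy CSV header to a canonical field name when it is a known alias."""
    key = re.sub(r"[^a-z0-9]", "", header.lower())
    return HEADER_ALIASES.get(key, header)


def extract_domain(url: str) -> str:
    """Reduce a website URL to its bare domain (no scheme, www., port, or path)."""
    # urlparse only recognises the host when a scheme is present.
    parsed = urlparse(url if "://" in url else f"https://{url}")
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def guess_domain(company_name: str) -> str:
    """Derive a candidate domain: drop trailing legal suffixes, then append .com."""
    words = re.findall(r"[a-z0-9]+", company_name.lower())
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return "".join(words) + ".com" if words else ""


def split_full_name(full_name: str) -> Tuple[str, str]:
    """Split a full name into (first, last); middle names are ignored in this sketch."""
    parts = full_name.split()
    if not parts:
        return "", ""
    return parts[0], parts[-1] if len(parts) > 1 else ""


print(map_header("First Name"), map_header("first_name"), map_header("fname"))
print(extract_domain("https://www.acmecorp.com/about"))
print(guess_domain("Pinnacle Labs LLC"))
print(split_full_name("Maria Rodriguez"))
```

A name-derived domain is only a guess (the company may not own the .com), which is why supplying `domain` or `website` in the input gives more reliable results.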

Step 2: Contact discovery via waterfall enrichment

Leads missing email addresses are sent in a single batch call to the waterfall-contact-enrichment sub-actor. The waterfall cascades through up to 10 enrichment steps:

MX validation — DNS lookup confirms the domain can receive mail; domains with no MX records short-circuit to not_found
15-pattern candidate generation — produces 15 B2B email candidates from the person's name using naming conventions ranked by industry prevalence (first.last, firstlast, flast, f.last, first_last, first-last, firstl, last.first, plus 7 more)
International name transliteration — 40+ accented characters (umlauts, diacritics, cedillas, eszett) are normalised to ASCII before pattern generation, so François Müller produces clean candidates
Website contact scraping — company contact, about, and team pages are scraped for direct emails, phone numbers, social links, and team-member names
Email pattern detection — public emails at the domain are analysed to identify the company's actual naming convention with a confidence percentage
People Data Labs lookup — when configured, the person record is matched against the PDL database
Cross-referencing — website-found emails are matched against the target person across all 15 patterns; direct matches receive a 90-98% confidence boost
SMTP probing — top candidates undergo an EHLO → MAIL FROM → RCPT TO handshake without ever sending email; disconnects before DATA
Catch-all detection — a random nonexistent address is tested as a control; domains accepting it are flagged catch-all and confidence is capped so SDR teams can quarantine bounce-risk leads
Multi-signal scoring — every candidate is scored from 0-98 based on the cascade's signals; the highest-confidence email is selected as primary

The waterfall caches website scraping and pattern detection per domain, so processing 50 contacts at the same company costs the same sub-actor compute as processing 1. Leads still missing phone numbers after the waterfall get a second batch call to the phone-number-finder sub-actor. Every discovered field is tagged with its source (emailSource: "pattern-detection", phoneSource: "website-scraping") for transparency.
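
To make the transliteration and candidate-generation steps concrete, here is a minimal sketch covering the eight patterns named above. It is an illustration under stated assumptions, not the sub-actor's code: `to_ascii` and `email_candidates` are hypothetical names, the remaining seven patterns are omitted because they are not listed here, and accent folding uses Python's `unicodedata` rather than the sub-actor's own character table.

```python
import unicodedata
from typing import List


def to_ascii(name: str) -> str:
    """Fold accented characters to lowercase ASCII letters and digits."""
    name = name.replace("ß", "ss")  # eszett has no Unicode decomposition
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks and anything else that is not ASCII alphanumeric.
    return "".join(ch for ch in decomposed if ch.isascii() and ch.isalnum()).lower()


def email_candidates(first: str, last: str, domain: str) -> List[str]:
    """Generate candidates for the eight naming conventions listed in the text."""
    f, l = to_ascii(first), to_ascii(last)
    if not f or not l:
        return []
    local_parts = [
        f"{f}.{l}",     # first.last
        f"{f}{l}",      # firstlast
        f"{f[0]}{l}",   # flast
        f"{f[0]}.{l}",  # f.last
        f"{f}_{l}",     # first_last
        f"{f}-{l}",     # first-last
        f"{f}{l[0]}",   # firstl
        f"{l}.{f}",     # last.first
    ]
    return [f"{local}@{domain}" for local in local_parts]


for candidate in email_candidates("François", "Müller", "example.com"):
    print(candidate)
```

Candidates produced this way are only guesses; in the waterfall they still pass through pattern detection, SMTP probing, and catch-all detection before one is selected as primary.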

Step 3: Email verification

All leads with email addresses — both discovered and provided in the input — are batch-verified via the bulk-email-verifier sub-actor (the Outbound Control System). Verification runs MX record lookups to confirm the domain accepts mail, then SMTP conversation checks to validate the specific mailbox. Each email gets a valid/invalid/risky/unknown/disposable status and a confidence score from 0-100. The sub-actor also emits a decision enum (send / send-monitor / hold / verify-later / replace / suppress) per address, automation triggers, deliverability simulation, and a per-record failureAnalysis block — this pipeline currently consumes status + confidence; the richer fields are available in the verifier sub-run dataset for downstream consumers. This step runs with a 900-second timeout to handle large batches.

Step 4: Company research and lead scoring

Leads with domains are batch-enriched through the company-deep-research sub-actor, which pulls from 7+ sources (company website, Wikipedia, GitHub, SEC filings, academic databases, DNS records, social profiles) to populate industry, employee count, tech stack, company description, and social profile URLs. Then the lead-scoring-engine sub-actor scores all leads with sufficient data on a 0-100 scale across 5 categories: digital presence, engagement signals, company fit, contact data completeness, and authority level. Scores are converted to A-F letter grades.

Step 5: Output and optional CRM push

Enriched leads are pushed to the dataset one at a time, with PPE charging ($0.12) applied after each push. If a spending limit is reached mid-batch, the pipeline stops and outputs only the leads processed so far. When CRM push is enabled, all enriched leads are sent in a single batch to the HubSpot or Salesforce lead pusher sub-actor before output. A summary record is appended at the end with aggregate statistics including emails found, verification counts, average score, and a download URL for the CSV file stored in the Key-Value Store.

Tips for best results

Provide the most data you have. Leads with name + company + domain enrich faster and more accurately than leads with only a name. The more input fields you provide, the fewer enrichment steps the pipeline needs to run.
Use CSV for batches over 20 leads. Upload your spreadsheet to Google Sheets, choose File > Share > Publish to web, select CSV format, and paste the published URL. The auto-mapper handles messy headers without manual column mapping.
Start with a 5-lead test run. Set maxLeads: 5 on your first run to verify the output matches your expectations before processing hundreds of leads at $0.12 each.
Disable steps you do not need. If you already have verified emails and only need company data, disable enrichEmail and verifyEmails to reduce processing time. You still pay $0.12/lead, but runs complete faster.
Enable company enrichment for B2B sales. The enrichCompany flag adds industry, employee count, and tech stack data that feeds into more accurate lead scores. Worth the extra processing time for account-based selling.
Combine with Google Maps for local leads. Run Google Maps Email Extractor first to build a local business list, then pipe those leads through this pipeline for verification, company research, and scoring.
Schedule weekly enrichment runs. Use Apify's scheduling to re-enrich your lead database weekly. New runs will re-verify emails (catching addresses that have gone stale) and update company data.
Download the CSV for CRM import. Even without the direct HubSpot/Salesforce push, the auto-generated CSV is formatted for direct import into any CRM that accepts CSV uploads.

Combine with other Apify actors

Actor	How to combine
Google Maps Email Extractor	Extract local business leads from Google Maps, then enrich with verification, company data, and scoring
Website Contact Scraper	Scrape contact pages from a list of websites, then run discovered contacts through the enrichment pipeline
Email Pattern Finder	Detect company email patterns first, then use this pipeline to verify and score the generated addresses
Bulk Email Verifier	Outbound Control System — verification + decision engine emitting `send` / `send-monitor` / `hold` / `replace` / `suppress` routing per email. Already built into step 3 of this pipeline; use standalone for email-only verification with the full decision layer (SLA tier, automation triggers, deliverability simulation, watchlist + delta tracking)
Company Deep Research	Already built into step 4 of this pipeline; use standalone for company research without the full lead workflow
HubSpot Lead Pusher	Built into step 6; use standalone to push pre-enriched leads from other sources into HubSpot
B2B Lead Gen Suite	Use Lead Gen Suite for URL-based lead extraction, then pipe results through this pipeline for deeper enrichment
AI Outreach Personalizer	After enrichment, generate personalized cold emails for each lead using your own OpenAI/Anthropic key
Intent Signal Tracker	Score buying intent before enriching — prioritize leads at companies showing hiring, funding, and tech signals
Lead Data Quality Auditor	Audit enriched output quality before outreach — catch bad emails, stale domains, and incomplete records

Templates

Pick a template and the pipeline pre-configures goal + icpRoles + ourTechStack + outputFilter for a common workflow. Explicit user fields still win where set.

Template	Ideal user	What it enables	Output behaviour
`b2b-saas-prospecting`	SDR sending cold outreach to engineering decision-makers	`goal: high-deliverability`, `icpRoles` set to CTO / VP Engineering / Director Engineering / Founder / Co-Founder / CEO	`outputFilter: send-now-only` — only verified-deliverable, scored leads on senior titles.
`enterprise-sales`	Account-based sales targeting Fortune 5000 executives	`goal: max-coverage`, `icpRoles: ['Chief', 'VP', 'Director', 'Head of']`	`outputFilter: a-b-grade-only` — full enrichment but only data-complete leads.
`recruiting-tech`	Technical recruiter sourcing engineers and engineering leaders	`goal: high-deliverability`, `icpRoles: ['Software Engineer', 'Senior Engineer', 'Staff Engineer', 'Principal Engineer', 'Engineering Manager', 'Tech Lead']`	`outputFilter: verified-emails-only` — only contactable engineers.
`recruiting-non-tech`	General recruiter — managers, directors, ops leaders	`goal: high-deliverability`, `icpRoles: ['Manager', 'Senior Manager', 'Lead', 'Director']`	`outputFilter: verified-emails-only`.
`event-leads`	Conference / trade-show follow-up on a name+company badge-scan list	`goal: quick-outreach` (email + verify only — no scoring, no company research)	`outputFilter: none` — push everything. Speed-optimised.
`agency-outbound`	Agency cold-pitching marketing decision-makers	`goal: high-deliverability`, `icpRoles: ['Founder', 'CEO', 'CMO', 'VP Marketing', 'Head of Marketing', 'Director Marketing']`	`outputFilter: send-now-only`.
`crm-cleanup`	CRM data refresh for an existing list	`goal: max-coverage` (every step)	`outputFilter: none` — push everything for direct CRM import.
`custom`	Power user wanting full control	Uses your explicit `goal` + `icpRoles` + `ourTechStack` + `outputFilter`	Whatever you set.

Templates only set defaults — you can still pass icp / outputMode / personaWeights / monitorStateKey etc. on top.

The summary record's recommendedNextRunTemplate field suggests which template to pick for a follow-up run based on the current run's outcome distribution (low deliverability → recruiting-tech filter; high ENRICH_MORE rate → crm-cleanup; etc.). Pure deterministic mapping, no LLM.

Monitoring & change detection

Set monitorStateKey to make the actor remember every lead it has seen. Subsequent scheduled runs diff each lead against the previous snapshot and emit changeSinceLastRun + a stable changeFlags[] enum on every record.

{
    "leads": [...],
    "monitorStateKey": "crm-weekly-refresh",
    "template": "crm-cleanup",
    "outputFilter": "none"
}

The state key is the name of a named Apify Key-Value Store. Pick a stable name per workflow (crm-weekly-refresh, enterprise-quarterly, partner-monthly) so subsequent runs land on the same snapshot bucket. State is capped at 50,000 lead snapshots with FIFO eviction, so it cannot grow without bound.

What you get on lead records (run #2 and later)

{
    "fullName": "Sarah Chen",
    "email": "sarah.chen@acmecorp.com",
    "title": "VP Engineering",
    "score": 84,
    "grade": "B",
    "sendDecision": { "action": "SEND_NOW", "...": "..." },
    "changeSinceLastRun": {
        "isFirstRunForLead": false,
        "previousEmail": "schen@acmecorp.com",
        "previousScore": 72,
        "previousGrade": "C",
        "previousSendDecisionAction": "VERIFY_FIRST",
        "previousTitle": "Director Engineering",
        "previousCompany": "Acme Corp",
        "daysSinceLastSeen": 7,
        "changeFlags": ["EMAIL_CHANGED", "SCORE_INCREASED", "GRADE_UPGRADED", "TITLE_CHANGED", "SEND_DECISION_UPGRADED"]
    },
    "changeFlags": ["EMAIL_CHANGED", "SCORE_INCREASED", "GRADE_UPGRADED", "TITLE_CHANGED", "SEND_DECISION_UPGRADED"]
}

Stable changeFlags[] enum

Flag	When it fires
`NEW_LEAD`	This canonical key wasn't in the previous snapshot — a new lead.
`EMAIL_CHANGED` / `EMAIL_GAINED` / `EMAIL_LOST`	Email field movement.
`EMAIL_VERIFICATION_GAINED` / `EMAIL_VERIFICATION_LOST`	Verification status flip.
`SCORE_INCREASED` / `SCORE_DECREASED`	Score moved by ≥5 points.
`GRADE_UPGRADED` / `GRADE_DOWNGRADED`	Letter grade band changed.
`TITLE_CHANGED`	Job title differs from the snapshot — a promotion or job change.
`COMPANY_CHANGED`	The contact's company name changed.
`EMPLOYEE_COUNT_CHANGED`	Company size band changed.
`SEND_DECISION_UPGRADED` / `SEND_DECISION_DOWNGRADED`	Decision moved up or down the SKIP→ENRICH_MORE→VERIFY_FIRST→SEND_NOW ladder.
`UNCHANGED`	Lead matched the prior snapshot exactly — no signal worth alerting on.
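
The flag rules in the table can be read as a plain snapshot diff. The sketch below reproduces a subset of them (email, score, grade, title, company, and send-decision movement) to show how the thresholds and the ladder combine. The flat snapshot shape and the names `diff_flags` and `_direction` are assumptions for illustration; the verification and employee-count flags are left out.

```python
from typing import List, Optional

# Ordered worst to best, following the ladder and grade bands in the table above.
DECISION_LADDER = ["SKIP", "ENRICH_MORE", "VERIFY_FIRST", "SEND_NOW"]
GRADE_ORDER = ["F", "D", "C", "B", "A"]


def _direction(order: List[str], before, after, up: str, down: str) -> List[str]:
    """Return the up/down flag when both values are known and their rank differs."""
    if before not in order or after not in order:
        return []
    delta = order.index(after) - order.index(before)
    return [up] if delta > 0 else [down] if delta < 0 else []


def diff_flags(prev: Optional[dict], curr: dict) -> List[str]:
    """Compare a lead against its previous snapshot and list what changed."""
    if prev is None:
        return ["NEW_LEAD"]
    flags: List[str] = []

    before, after = prev.get("email") or None, curr.get("email") or None
    if before != after:
        flags.append("EMAIL_GAINED" if before is None
                     else "EMAIL_LOST" if after is None else "EMAIL_CHANGED")

    if prev.get("score") is not None and curr.get("score") is not None:
        delta = curr["score"] - prev["score"]
        if abs(delta) >= 5:  # documented threshold: score moved by >= 5 points
            flags.append("SCORE_INCREASED" if delta > 0 else "SCORE_DECREASED")

    flags += _direction(GRADE_ORDER, prev.get("grade"), curr.get("grade"),
                        "GRADE_UPGRADED", "GRADE_DOWNGRADED")
    if prev.get("title") != curr.get("title"):
        flags.append("TITLE_CHANGED")
    if prev.get("companyName") != curr.get("companyName"):
        flags.append("COMPANY_CHANGED")
    flags += _direction(DECISION_LADDER, prev.get("action"), curr.get("action"),
                        "SEND_DECISION_UPGRADED", "SEND_DECISION_DOWNGRADED")
    return flags or ["UNCHANGED"]


previous = {"email": "schen@acmecorp.com", "score": 72, "grade": "C",
            "title": "Director Engineering", "companyName": "Acme Corp", "action": "VERIFY_FIRST"}
current = {"email": "sarah.chen@acmecorp.com", "score": 84, "grade": "B",
           "title": "VP Engineering", "companyName": "Acme Corp", "action": "SEND_NOW"}
print(diff_flags(previous, current))
```

Run against the Sarah Chen values from the run #2 example, this yields the same five flags shown in that record.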

How to use change detection

Weekly CRM refresh — schedule the actor with monitorStateKey and filter downstream on changeFlags to push only what changed.
Job-change monitor — run a list of "champions" weekly and alert on TITLE_CHANGED or COMPANY_CHANGED — they may have moved to a target account.
Deliverability watchdog — alert on EMAIL_VERIFICATION_LOST to catch contacts whose mailbox went stale before your next campaign.
Send-decision movement — alert on SEND_DECISION_DOWNGRADED so SDRs stop sending to leads that newly look risky.

changeFlags: ["UNCHANGED"] is the noise floor — Dify / Zapier should filter for any other flag.
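
On the consuming side, a downstream filter only needs set operations on `changeFlags`. The sketch below is a minimal example of the filtering described above; the helper names and the choice of which flags count as alerts are assumptions, and `changeFlags` is treated as possibly null because it is only populated when `monitorStateKey` is set.

```python
from typing import Any, List, Mapping

# Flags this hypothetical workflow treats as alert-worthy, taken from the use cases above.
ALERT_FLAGS = {"TITLE_CHANGED", "COMPANY_CHANGED", "EMAIL_VERIFICATION_LOST", "SEND_DECISION_DOWNGRADED"}


def has_real_change(record: Mapping[str, Any]) -> bool:
    """True when the record carries any flag other than UNCHANGED."""
    flags = record.get("changeFlags") or []
    return any(flag != "UNCHANGED" for flag in flags)


def alerts_for(record: Mapping[str, Any]) -> List[str]:
    """Alert-worthy flags on this record, sorted for stable output."""
    return sorted(ALERT_FLAGS & set(record.get("changeFlags") or []))


# Sample records (invented for illustration)
records = [
    {"fullName": "Sarah Chen", "changeFlags": ["EMAIL_CHANGED", "TITLE_CHANGED"]},
    {"fullName": "James Park", "changeFlags": ["UNCHANGED"]},
    {"fullName": "Maria Rodriguez", "changeFlags": None},
]
to_push = [r["fullName"] for r in records if has_real_change(r)]
print(to_push, alerts_for(records[0]))
```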

Stable enums (quick reference)

The enums below are stable within a major version. New values may be added; existing values will not be renamed or repurposed. Branch on these in automation; never parse prose fields.

Field	Values
`recordType`	`lead` / `summary` / `preflight` / `alert` / `error`
`sendDecision.action`	`SEND_NOW` / `VERIFY_FIRST` / `ENRICH_MORE` / `SKIP`
`sendDecision.riskLevel`	`low` / `medium` / `high`
`bounceRiskBucket`	`low` / `medium` / `high`
`leadGrade`	`A` / `B` / `C` / `D` / `F`
`confidenceLevel`	`high` / `medium` / `low`
`seniorityLevel`	`c-level` / `vp` / `director` / `manager` / `individual-contributor` / `unknown`
`complianceFlags.region`	`eu` / `us-ca` / `us-other` / `uk` / `apac` / `other` / `unknown`
`decisionSignals[]`	18 tokens: verified-email / unverified-email / invalid-email / no-email / high-confidence / medium-confidence / low-confidence / company-data-complete / company-data-thin / senior-title / junior-title / title-unknown / high-score / medium-score / low-score / on-suppression / eu-jurisdiction / no-domain
`changeFlags[]`	NEW_LEAD / EMAIL_CHANGED / EMAIL_GAINED / EMAIL_LOST / EMAIL_VERIFICATION_GAINED / EMAIL_VERIFICATION_LOST / SCORE_INCREASED / SCORE_DECREASED / GRADE_UPGRADED / GRADE_DOWNGRADED / TITLE_CHANGED / COMPANY_CHANGED / EMPLOYEE_COUNT_CHANGED / SEND_DECISION_UPGRADED / SEND_DECISION_DOWNGRADED / UNCHANGED
`template` (input)	`custom` / `b2b-saas-prospecting` / `enterprise-sales` / `recruiting-tech` / `recruiting-non-tech` / `event-leads` / `agency-outbound` / `crm-cleanup`
`outputMode` (input)	`crm` / `analytics` / `debug`
`outputFilter` (input)	`none` / `send-now-only` / `verified-emails-only` / `a-b-grade-only`
`priorityBucket` (V3)	`hot` / `warm` / `cold` / `skip`
`freshnessLevel` (V3)	`fresh` / `aging` / `stale` / `unknown`
`intentSignals[]` (V3)	tech-stack-match / tech-stack-strong-match / growing-company / shrinking-company / hiring-engineering / hiring-leadership / decision-maker-cluster / sole-decision-maker / fresh-data / stale-data / verified-deliverability / gdpr-protected / high-cohort-reply-rate / high-cohort-bounce-rate
`autoRetryPlan.strategy` (V3)	`verify-only` / `pattern-and-verify` / `company-research-only` / `full-enrichment` / `none`
`companyInsights.accountTier` (V3)	`high-value` / `standard` / `low-fit`
`historicalPerformance.matchedCohort` (V3)	`domain` / `industry-seniority` / `industry` / `seniority` / `none`
`listAnalytics.listHealth.issues[]` (V3)	low-deliverability / high-catch-all-rate / low-email-coverage / low-phone-coverage / low-company-coverage / few-decision-makers / high-staleness-rate / low-record-confidence / low-send-now-rate / no-leads-processed
`executionPlan.channel` (V4)	`email` / `email-then-phone` / `phone-first` / `social-only` / `do-not-contact`
`executionPlan.sequenceType` (V4)	`minimal` / `short` / `long` / `high-touch`
`executionPlan.timingRecommendation` (V4)	`immediate` / `wait-for-business-hours` / `batch-with-others` / `verify-then-send` / `quarantine`
`executionPlan.personalisationLevel` (V4)	`low` / `medium` / `high`
`executionPlan.tone` (V4)	`casual` / `professional` / `formal` / `technical`
`executionPlan.bestSendWindow` (V4)	`us-business-hours` / `eu-business-hours` / `uk-business-hours` / `apac-business-hours` / `any-business-day` / `avoid-monday-friday`
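
Branching on the enums usually comes down to a lookup table keyed on the enum value. The sketch below routes lead records by `sendDecision.action`. The queue names and the `route` helper are hypothetical, and unknown values fall through to a manual-review bucket because new enum values may be added within a major version.

```python
from typing import Any, Mapping, Optional

# Hypothetical destination names; substitute your own queues or CRM stages.
ROUTES = {
    "SEND_NOW": "outreach-queue",
    "VERIFY_FIRST": "reverification-queue",
    "ENRICH_MORE": "enrichment-queue",
    "SKIP": "discard",
}


def route(record: Mapping[str, Any]) -> Optional[str]:
    """Pick a destination for a lead record; non-lead records are ignored."""
    if record.get("recordType") != "lead":
        return None
    action = (record.get("sendDecision") or {}).get("action")
    # Unknown or missing values go to a human instead of being guessed at.
    return ROUTES.get(action, "manual-review")


print(route({"recordType": "lead", "sendDecision": {"action": "SEND_NOW"}}))
print(route({"recordType": "lead", "sendDecision": {"action": "SOMETHING_NEW"}}))
print(route({"recordType": "summary"}))
```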

Debug-mode output example

When outputMode: "debug" is set, every lead carries a stepDiagnostics[] array with per-step timing and outcome. Useful for "why is this enrichment slow / why did this step fail?" investigations.

{
    "recordType": "lead",
    "fullName": "Sarah Chen",
    "email": "sarah.chen@acmecorp.com",
    "emailVerified": true,
    "emailConfidence": 92,
    "title": "CTO",
    "companyName": "Acme Corp",
    "domain": "acmecorp.com",
    "industry": "SaaS",
    "score": 84,
    "grade": "B",
    "sendDecision": {
        "action": "SEND_NOW",
        "riskLevel": "low",
        "reasons": ["Email verified", "Confidence ≥ 80", "Score 84 ≥ 60"],
        "decisionRulePath": ["verified-high-confidence-good-score"]
    },
    "isOutreachReady": true,
    "bounceRiskBucket": "low",
    "leadGrade": "A",
    "recordConfidence": 87,
    "confidenceLevel": "high",
    "confidenceBreakdown": { "contact": 92, "company": 85, "identity": 95, "fit": 84 },
    "confidenceExplanation": "high confidence (87/100) — strong identity signal (95/100)",
    "decisionSignals": ["verified-email", "high-confidence", "company-data-complete", "senior-title", "high-score"],
    "isDecisionMaker": true,
    "seniorityLevel": "c-level",
    "isIcpRoleMatch": true,
    "matchedIcpRole": "CTO",
    "isIcpSeniorityMatch": true,
    "isIcpIndustryMatch": true,
    "matchedIcpIndustry": "SaaS",
    "icpMatchScore": 100,
    "isFullIcpMatch": true,
    "complianceFlags": { "isEuBased": false, "ccpaProtected": null, "region": "us-other", "requiresOptIn": false },
    "actionPlaybook": [
        "Add to outreach sequence — verified email, ready today",
        "Prioritise — high score signals strong fit",
        "Use exec-tier messaging — decision-maker title detected"
    ],
    "recoveryPlan": null,
    "entityId": "lead_8f3a2c7d",
    "stepDiagnostics": [
        { "step": "normalize",           "actor": "orchestrator",                 "durationMs": 0,    "outcome": "success" },
        { "step": "contact-discovery",   "actor": "kIEqeHJbKtCuBbkVE",            "durationMs": 4201, "outcome": "success" },
        { "step": "email-verification",  "actor": "Atdqy4shZ8zx8gkEi",            "durationMs": 1872, "outcome": "success" },
        { "step": "company-research",    "actor": "2cAY2V9yz1JE2H1S2",            "durationMs": 8410, "outcome": "success" },
        { "step": "lead-scoring",        "actor": "mZ8NsHKEBQSIcvW3W",            "durationMs": 612,  "outcome": "success" }
    ],
    "processedAt": "2026-05-03T14:22:18.401Z"
}

The summary record (one per run) carries the new R2 quality-dashboard aggregates:

{
    "recordType": "summary",
    "totalEnrichedLeads": 47,
    "leadsFiltered": 3,
    "averageScore": 71,
    "listAnalytics": {
        "deliverabilityRate": 87.2,
        "validEmailRate": 87.2,
        "catchAllRate": 4.3,
        "averageRecordConfidence": 79,
        "decisionMakerRate": 38.3,
        "sendNowCount": 31,
        "verifyFirstCount": 9,
        "skipCount": 4,
        "enrichmentCoverage": {
            "email": 95.7, "phone": 23.4, "companyDescription": 72.3,
            "industry": 68.1, "employeeCount": 65.9, "techStack": 51.0,
            "linkedinUrl": 42.5, "title": 89.4
        },
        "topFailureReasons": [
            { "reason": "TIMEOUT", "count": 1 }
        ],
        "recommendedNextRunTemplate": "b2b-saas-prospecting",
        "listQualityGrade": "A"
    },
    "ppeChargesUsd": 5.64,
    "circuitBreakerTripped": false
}

Confidence system

Every enriched lead carries a single recordConfidence (0-100) so downstream automation can filter on one number. It collapses four independent axes via harmonic mean — every axis must be reasonably healthy for the score to be high (one strong axis can't mask a weak one):

Axis	What it measures	Inputs
contact	Email deliverability + verification	`emailConfidence` + verification status
company	Completeness of company-research fields	industry, employees, description, tech stack
identity	Strength of identity signals	name, domain, title, LinkedIn
fit	Sub-actor's lead-scoring 0-100	echoed from `score`

A confidenceLevel band (high ≥75 / medium ≥50 / low <50) and a plain-English confidenceExplanation ship alongside, so a Slack/CRM/agent flow can read "high confidence (87/100) — strong identity signal" verbatim.
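As a rough illustration of the collapse described above, here is a minimal Python sketch. It assumes an unweighted harmonic mean with simple rounding, neither of which this README confirms, and the function names are hypothetical. On the sample axes (92, 85, 95, 84) it gives about 89 rather than the 87 shown in the sample record, so the actor's own formula evidently differs in some detail such as weighting or rounding.

```python
from statistics import harmonic_mean

AXES = ("contact", "company", "identity", "fit")

def record_confidence(breakdown: dict) -> int:
    """Collapse the four axes into one 0-100 number.

    Assumption: plain, unweighted harmonic mean. The actor may weight axes differently.
    """
    values = [breakdown[axis] for axis in AXES]
    if min(values) <= 0:
        return 0  # a zero axis drives a harmonic mean to zero
    return round(harmonic_mean(values))

def confidence_level(score: float) -> str:
    """Bands as documented: high >= 75, medium >= 50, low < 50."""
    if score >= 75:
        return "high"
    if score >= 50:
        return "medium"
    return "low"

# One weak axis drags the result down even when the other three are strong.
lopsided = {"contact": 20, "company": 95, "identity": 95, "fit": 95}
print(record_confidence(lopsided), confidence_level(record_confidence(lopsided)))
```

The lopsided example is the point of using a harmonic mean: an arithmetic mean of the same axes would be about 76 and land in the high band.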

When personaWeights is set (e.g. { contact: 0.4, company: 0.2, identity: 0.2, fit: 0.2 }), a customScore is emitted alongside — the four axes re-weighted to your buyer's preferences. The sub-actor's score field stays untouched.

Priority engine (who to email first)

The built-in priority engine ranks contacts on fit, confidence, and freshness to decide who to contact first, a job that usually requires a CRM scoring model. It answers "who should I contact first?" without Salesforce, HubSpot scoring, or manual triage. Every lead carries a priorityScore (0-100) plus a priorityBucket (hot / warm / cold / skip). Sort on priorityScore for outreach order; branch automation on priorityBucket.

The score combines five signals via a transparent linear formula:

Signal	Default weight	Source
Fit	35% (50% when no ICP defined)	`score` field from `lead-scoring-engine`
Confidence	25% (40% when no ICP defined)	`recordConfidence` from this actor
ICP match	25%	`icpMatchScore` from structured `icp` input
Freshness penalty	-15 to 0	Stale data drops priority
Historical lift	-10 to +10	`historicalPerformance` from feedback loop

priorityFactors exposes each component's contribution so the score is fully auditable. priorityExplanation is a plain-English one-liner ("HOT (87/100) — full ICP match, outreach-ready, strong fit signal").

SKIP decisions force priorityScore: 0 and priorityBucket: 'skip' regardless of other signals — the priority engine never recommends contacting a SKIP'd lead.
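The linear formula can be sketched as follows. This is an illustration under stated assumptions, not the actor's implementation: it treats the freshness penalty and historical lift as points added to the weighted sum, clamps them to the documented ranges, and clamps the result to 0-100. The function name and parameters are hypothetical, and the hot / warm / cold thresholds are omitted because they are not documented here.

```python
from typing import Optional

def priority_score(
    fit: float,
    confidence: float,
    icp_match: Optional[float] = None,   # None means no ICP was defined
    freshness_penalty: float = 0.0,      # documented range: -15 to 0
    historical_lift: float = 0.0,        # documented range: -10 to +10
    decision: str = "SEND_NOW",
) -> int:
    # SKIP always forces the score to zero, regardless of other signals.
    if decision == "SKIP":
        return 0
    if icp_match is None:
        base = 0.50 * fit + 0.40 * confidence
    else:
        base = 0.35 * fit + 0.25 * confidence + 0.25 * icp_match
    penalty = max(-15.0, min(0.0, freshness_penalty))
    lift = max(-10.0, min(10.0, historical_lift))
    # Assumption: penalty and lift are added as raw points, then clamped.
    return max(0, min(100, round(base + penalty + lift)))

# Sample lead from the output example: score 84, recordConfidence 87, icpMatchScore 100.
print(priority_score(fit=84, confidence=87, icp_match=100))
```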

Freshness & data decay

Stale contact data is one of the most common causes of outreach failure: emailing outdated or invalid addresses damages sender reputation. Where most tools rely on manual list cleaning, this actor detects stale leads automatically and downgrades them for re-verification before outreach. Every lead carries a freshness block with:

lastVerifiedAt — ISO timestamp of the most recent verified state
daysSinceVerification — integer day count
decayScore (0-100) — 0 on the day of verification, 50 at 30 days, 100 at 90+ days
freshnessLevel — fresh (≤33 decay) / aging (≤66) / stale (>66) / unknown (no prior verification)

When freshnessLevel is stale (or aging + email isn't currently verified), the decision engine forces SEND_NOW down to VERIFY_FIRST. The downgrade is flagged on the lead via stalenessDowngraded: true and counted in the summary's listAnalytics.stalenessDowngrades.

Freshness only kicks in when monitorStateKey is set — the actor needs a prior snapshot to know when verification last happened. Without monitoring, freshness reports unknown.
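The freshness rules above can be sketched in a few lines. The three anchor points (0 at day 0, 50 at 30 days, 100 at 90+ days) and the level thresholds come from the list above; the straight-line interpolation between anchors is an assumption, and the function names are hypothetical.

```python
from typing import Optional, Tuple

def decay_score(days_since_verification: int) -> int:
    """0 at day 0, 50 at day 30, 100 at day 90+ (assumed linear in between)."""
    days = max(0, days_since_verification)
    if days <= 30:
        return round(days * 50 / 30)
    if days < 90:
        return round(50 + (days - 30) * 50 / 60)
    return 100

def freshness_level(days_since_verification: Optional[int]) -> str:
    if days_since_verification is None:
        return "unknown"  # no prior snapshot, e.g. monitorStateKey not set
    decay = decay_score(days_since_verification)
    if decay <= 33:
        return "fresh"
    if decay <= 66:
        return "aging"
    return "stale"

def apply_freshness(action: str, level: str, email_verified: bool) -> Tuple[str, bool]:
    """Return (action, stalenessDowngraded) after the documented downgrade rule."""
    downgrade = level == "stale" or (level == "aging" and not email_verified)
    if action == "SEND_NOW" and downgrade:
        return "VERIFY_FIRST", True
    return action, False

print(apply_freshness("SEND_NOW", freshness_level(75), email_verified=True))
```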

Closed-loop feedback (outcomes → historicalPerformance)

Ship outcomes from your CRM / email tool back into the actor and it remembers them across runs. Subsequent leads in the same cohort get a historicalPerformance block.

{
    "leads": [...],
    "feedbackStateKey": "outbound-q2",
    "feedback": {
        "type": "outcome",
        "data": [
            {"entityId": "lead_8f3a2c", "outcome": "replied",   "domain": "acmecorp.com",       "industry": "SaaS", "seniorityLevel": "c-level"},
            {"entityId": "lead_7a1b4d", "outcome": "bounced",   "domain": "betaindustries.com"},
            {"entityId": "lead_2c5e9f", "outcome": "converted", "domain": "pinnacle.io",        "industry": "SaaS", "seniorityLevel": "vp"}
        ]
    }
}

historicalPerformance finds the tightest cohort match (with at least 3 outcomes):

Same domain — strongest signal: does this company reply?
Same industry + seniority — for cohort-level patterns
Same industry — broader fallback
Same seniority — broadest

"historicalPerformance": {
    "cohortSize": 12,
    "similarLeadsReplyRate":   0.33,
    "similarLeadsBounceRate":  0.08,
    "similarLeadsConvertRate": 0.17,
    "matchedCohort": "industry-seniority",
    "explanation": "12 prior outcomes from same industry + seniority cohort — 33% reply, 8% bounce, 17% convert"
}

When no cohort meets the minimum sample size, the field is null: better silence than fabricated confidence. Outcomes feed priorityScore (high reply rate boosts; high bounce rate drops) and intentSignals (high-cohort-reply-rate / high-cohort-bounce-rate). The actor surfaces patterns; it does NOT auto-mutate scoring weights, because silent weight changes would be a trust-killer. Tune personaWeights manually based on what historicalPerformance shows.
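The tightest-cohort lookup can be pictured with the sketch below. The cohort order and the minimum of 3 outcomes come from the text above; the matchedCohort labels other than "industry-seniority", the way rates are computed (each outcome counted once, divided by cohort size), and the function name are assumptions.

```python
from collections import Counter
from typing import Optional

MIN_OUTCOMES = 3

# Tightest cohort first. Only the "industry-seniority" label appears in the README.
COHORTS = [
    ("domain", ("domain",)),
    ("industry-seniority", ("industry", "seniorityLevel")),
    ("industry", ("industry",)),
    ("seniority", ("seniorityLevel",)),
]

def historical_performance(lead: dict, outcomes: list) -> Optional[dict]:
    for label, keys in COHORTS:
        if any(lead.get(key) is None for key in keys):
            continue  # the lead lacks a field this cohort needs
        cohort = [o for o in outcomes if all(o.get(k) == lead[k] for k in keys)]
        if len(cohort) < MIN_OUTCOMES:
            continue  # too few outcomes, fall through to a broader cohort
        counts = Counter(o["outcome"] for o in cohort)
        size = len(cohort)
        return {
            "cohortSize": size,
            "similarLeadsReplyRate": round(counts["replied"] / size, 2),
            "similarLeadsBounceRate": round(counts["bounced"] / size, 2),
            "similarLeadsConvertRate": round(counts["converted"] / size, 2),
            "matchedCohort": label,
        }
    return None  # no cohort is large enough: stay silent
```

Returning None rather than a small-sample estimate mirrors the "better silence" rule: a lead with two prior outcomes on its domain falls through to the industry + seniority cohort instead of reporting a rate built on two data points.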

Company-level intelligence

When ≥2 leads share a domain, every lead in the group carries a companyInsights block with the per-company aggregate:

"companyInsights": {
    "domain": "acmecorp.com",
    "totalContactsSeen": 4,
    "avgScore": 76,
    "decisionMakerCoverage": 2,
    "bestContactEntityId": "lead_8f3a2c",
    "accountTier": "high-value",
    "averageRecordConfidence": 84,
    "sendNowCount": 3,
    "explanation": "4 contacts on this domain — 2 decision-makers — avg fit score 76 — 3 ready to send — account tier: high-value"
}

Account-tier rules are deterministic:

high-value — ≥2 decision-makers AND avgScore ≥ 70
low-fit — 0 decision-makers AND avgScore < 40
standard — everything else
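Because the tier rules are deterministic, they are easy to mirror downstream. The sketch below encodes the three rules as written and adds a small, hypothetical aggregator that groups leads by domain; the rounding of avgScore and the handling of leads without a score are assumptions.

```python
from collections import defaultdict

def account_tier(decision_makers: int, avg_score: float) -> str:
    if decision_makers >= 2 and avg_score >= 70:
        return "high-value"
    if decision_makers == 0 and avg_score < 40:
        return "low-fit"
    return "standard"

def company_insights(leads: list) -> dict:
    """Aggregate per domain; only domains with 2 or more leads get a block."""
    by_domain = defaultdict(list)
    for lead in leads:
        if lead.get("domain"):
            by_domain[lead["domain"]].append(lead)

    insights = {}
    for domain, group in by_domain.items():
        if len(group) < 2:
            continue
        # Assumption: leads with a null score are left out of the average.
        scores = [l["score"] for l in group if l.get("score") is not None]
        avg = round(sum(scores) / len(scores)) if scores else 0
        makers = sum(1 for l in group if l.get("isDecisionMaker"))
        insights[domain] = {
            "domain": domain,
            "totalContactsSeen": len(group),
            "avgScore": avg,
            "decisionMakerCoverage": makers,
            "accountTier": account_tier(makers, avg),
        }
    return insights
```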

This moves the actor from per-lead enrichment into account-based-selling territory — your downstream flow can branch on companyInsights.accountTier to pick a sequence cadence per account, or look up bestContactEntityId to email the highest-priority contact at each company.

Intent signals (LLM-free)

Stable enum array on every lead derived from existing fields + monitor state. No LLM, no external API — pure regex/pattern matching with deterministic firing conditions you can audit.

Signal	Fires when
`tech-stack-match` / `tech-stack-strong-match`	`ourTechStack` overlap detected (≥1 / ≥3 matches)
`growing-company` / `shrinking-company`	Employee count changed since last snapshot
`hiring-engineering`	Engineering title at an 11-500-person company
`hiring-leadership`	New senior contact appears since last run
`decision-maker-cluster`	≥2 senior contacts on the same domain
`sole-decision-maker`	1 contact, but they're senior
`fresh-data`	Verified within last 7 days
`stale-data`	Verified ≥60 days ago
`verified-deliverability`	`isOutreachReady` AND `bounceRiskBucket: 'low'`
`gdpr-protected`	EU/UK jurisdiction
`high-cohort-reply-rate`	Feedback loop shows ≥30% reply rate in cohort
`high-cohort-bounce-rate`	Feedback loop shows ≥20% bounce rate in cohort
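Two of the firing conditions in the table are simple enough to sketch directly. The thresholds come from the table; the case-insensitive exact matching of technology names, the choice to emit only the strongest tech-stack signal, and the function names are assumptions.

```python
from typing import List, Optional

def tech_stack_signal(our_stack: List[str], lead_stack: List[str]) -> Optional[str]:
    """>= 3 overlapping technologies is a strong match, >= 1 is a match."""
    ours = {tech.strip().lower() for tech in our_stack}
    theirs = {tech.strip().lower() for tech in lead_stack}
    overlap = len(ours & theirs)
    if overlap >= 3:
        return "tech-stack-strong-match"
    if overlap >= 1:
        return "tech-stack-match"
    return None

def cohort_signals(historical_performance: Optional[dict]) -> List[str]:
    """>= 30% reply rate and >= 20% bounce rate fire their respective signals."""
    if not historical_performance:
        return []
    signals = []
    if historical_performance.get("similarLeadsReplyRate", 0) >= 0.30:
        signals.append("high-cohort-reply-rate")
    if historical_performance.get("similarLeadsBounceRate", 0) >= 0.20:
        signals.append("high-cohort-bounce-rate")
    return signals

print(tech_stack_signal(["React", "AWS", "Postgres"], ["aws", "react", "Redis"]))
```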

Scenario simulation (what-if without re-running)

The summary record carries a scenarioSimulation block — counterfactuals over the leads we just enriched, showing what each setting WOULD have produced. Pure post-processing, zero new sub-actor calls, no PPE charge.

"scenarioSimulation": {
    "actual":                       { "sendNowCount": 31, "verifyFirstCount": 9, "skipCount": 4, "outreachReadyCount": 31, "description": "actual run" },
    "ifMinEmailConfidence80":       { "sendNowCount": 22, "verifyFirstCount": 18, "skipCount": 4, "outreachReadyCount": 22, "description": "minEmailConfidence: 80" },
    "ifStrictModeTrue":             { "sendNowCount": 31, "verifyFirstCount": 0,  "skipCount": 13, "outreachReadyCount": 31, "description": "strictMode: true" },
    "ifOutputFilterSendNowOnly":    { "sendNowCount": 31, "verifyFirstCount": 0,  "skipCount": 0,  "outreachReadyCount": 31, "description": "outputFilter: send-now-only" },
    "ifOutputFilterAOrBGradeOnly":  { "sendNowCount": 24, "verifyFirstCount": 6,  "skipCount": 2,  "outreachReadyCount": 24, "description": "outputFilter: a-b-grade-only" },
    "ifAllStrict":                  { "sendNowCount": 22, "verifyFirstCount": 0,  "skipCount": 0,  "outreachReadyCount": 22, "description": "all strict toggles + send-now filter" }
}

Use this to tune the next run's input without paying for trial-and-error.
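To see why a counterfactual needs no new sub-actor calls, consider the strictMode case: it only re-labels decisions that already exist. The sketch below is a hypothetical re-implementation over a list of per-lead actions, not the actor's code; it reproduces the ifStrictModeTrue counts from the sample above.

```python
from collections import Counter
from typing import List

def simulate_strict_mode(actions: List[str]) -> dict:
    """Recount decisions as if strictMode were true (VERIFY_FIRST becomes SKIP)."""
    relabelled = ["SKIP" if a == "VERIFY_FIRST" else a for a in actions]
    counts = Counter(relabelled)
    return {
        "sendNowCount": counts["SEND_NOW"],
        "verifyFirstCount": counts["VERIFY_FIRST"],
        "skipCount": counts["SKIP"],
        "description": "strictMode: true",
    }

# Decisions from the "actual" row: 31 SEND_NOW, 9 VERIFY_FIRST, 4 SKIP.
actual = ["SEND_NOW"] * 31 + ["VERIFY_FIRST"] * 9 + ["SKIP"] * 4
print(simulate_strict_mode(actual))
```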

List health

The summary record's listAnalytics.listHealth block collapses overall list quality into one number + grade + machine-readable issue codes:

"listHealth": {
    "score": 78,
    "grade": "B",
    "issues": ["high-catch-all-rate", "low-phone-coverage"],
    "explanation": "List health B (78/100) — 2 issues detected: high-catch-all-rate, low-phone-coverage"
}

Stable issue-code enum: low-deliverability / high-catch-all-rate / low-email-coverage / low-phone-coverage / low-company-coverage / few-decision-makers / high-staleness-rate / low-record-confidence / low-send-now-rate.

Branch dashboards on listHealth.grade for traffic-light reporting; branch automation on individual issue codes.

Execution layer (decision → execution bridge)

Most tools tell you who to contact. This tells you how to contact them, eliminating guesswork in sequencing, timing, and messaging strategy.

executionPlan defines how to contact a lead (channel, timing, and sequence strategy), the question most enrichment tools leave to the user. It is a pure deterministic mapping from existing fields, with no LLM and no ML: every output is traceable to a rule you can read.

"executionPlan": {
    "channel": "email-then-phone",
    "sequenceType": "high-touch",
    "sequenceLength": 7,
    "timingRecommendation": "wait-for-business-hours",
    "personalisationLevel": "high",
    "tone": "formal",
    "bestSendWindow": "avoid-monday-friday",
    "reason": "high-touch sequence, 7 touches — email-then-phone via avoid-monday-friday — high personalisation — formal tone — timing: wait-for-business-hours",
    "reasonCodes": ["multichannel-decision-maker", "strict-region-business-hours", "exec-tier-high-touch", "high-personalisation-high-value", "formal-tone-c-level", "avoid-mon-fri-senior-target"]
}

Channel selection rules

Channel	Fires when
`email`	Default.
`email-then-phone`	Phone available + decision-maker + outreach-ready — multichannel sequence.
`phone-first`	Verified phone + c-level/vp + low bounce risk — direct dial cuts cycle time.
`social-only`	No deliverable email but LinkedIn URL exists — social fallback.
`do-not-contact`	SKIP decision.

Sequence type rules

Type	Length	Fires when
`high-touch`	7 touches	Senior target (c-level/vp) + (full ICP match OR high-value account) + outreach-ready.
`long`	5 touches	Outreach-ready + has company description + (decision-maker OR full ICP match).
`short`	3 touches	Outreach-ready, standard cohort.
`minimal`	1 touch	Cold lead or low confidence — single exploratory touch.
`minimal`	0 touches	SKIP decision.

Timing rules

Recommendation	Fires when
`immediate`	Outreach-ready + fresh data — fire today.
`wait-for-business-hours`	EU/UK lead — strict region respects send windows.
`verify-then-send`	VERIFY_FIRST decision OR stale data needs re-verification.
`batch-with-others`	Default for warm leads not yet ready for instant send.
`quarantine`	SKIP decision.

Personalisation rules

Level	Fires when
`high`	High-value account OR (full ICP match + company description + tech stack).
`medium`	Has company description AND `recordConfidence` ≥ 70.
`low`	Default — batch-safe template tier.

Tone rules (industry + seniority pattern match)

Tone	Trigger
`formal`	Legal / finance / banking / fintech / insurance / healthcare / pharma / government / regulated. Or c-level seniority.
`technical`	SaaS / software / cybersecurity / cloud / devops / AI / data — engineering audience.
`casual`	Agency / marketing / creative / design / media / retail / hospitality.
`professional`	Default — standard B2B.
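A pattern match like the tone table can be approximated as below. The keyword lists are taken from the table; everything else is an assumption: matching on whole lowercase word tokens (so that "retail" does not trigger the "AI" keyword), checking c-level first, and evaluating the rows top to bottom. The function name is hypothetical.

```python
import re
from typing import Optional

FORMAL = {"legal", "finance", "banking", "fintech", "insurance",
          "healthcare", "pharma", "government", "regulated"}
TECHNICAL = {"saas", "software", "cybersecurity", "cloud", "devops", "ai", "data"}
CASUAL = {"agency", "marketing", "creative", "design", "media", "retail", "hospitality"}

def pick_tone(industry: Optional[str], seniority: Optional[str]) -> str:
    # Whole-word tokens avoid substring false positives such as "ai" in "retail".
    tokens = set(re.findall(r"[a-z0-9]+", (industry or "").lower()))
    if seniority == "c-level" or tokens & FORMAL:
        return "formal"
    if tokens & TECHNICAL:
        return "technical"
    if tokens & CASUAL:
        return "casual"
    return "professional"

print(pick_tone("SaaS", "vp"))
```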

Send window rules

Window	Fires when
`avoid-monday-friday`	Senior target (c-level/vp) — Tue–Thu sweet spot beats Mon/Fri.
`eu-business-hours`	EU region — 9am–11am CET.
`uk-business-hours`	UK region — 9am–11am BST/GMT.
`apac-business-hours`	APAC — local 10am–2pm.
`us-business-hours`	US (CA or other) — 9am–12pm local.
`any-business-day`	Region unknown.
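The send-window table reduces to a seniority check followed by a region lookup. In this sketch the region values come from the complianceFlags.region enum; the precedence (senior target wins over region) and the fallback of region "other" to any-business-day are assumptions, and the function name is hypothetical.

```python
from typing import Optional

REGION_WINDOWS = {
    "eu": "eu-business-hours",
    "uk": "uk-business-hours",
    "apac": "apac-business-hours",
    "us-ca": "us-business-hours",
    "us-other": "us-business-hours",
}

def best_send_window(seniority: Optional[str], region: Optional[str]) -> str:
    # Assumption: the senior-target rule is evaluated before any region rule.
    if seniority in ("c-level", "vp"):
        return "avoid-monday-friday"
    return REGION_WINDOWS.get(region, "any-business-day")

print(best_send_window("director", "uk"))
```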

Why this matters

Most outreach platforms charge per-seat for sequencing logic that's identical across leads. Your downstream tool (Outreach.io / Salesloft / Lemlist / Smartlead / Apollo Engage / your own automation) can:

Branch on executionPlan.sequenceType to pick the right cadence template
Branch on executionPlan.bestSendWindow to schedule around region rules
Branch on executionPlan.tone to pick a copy variant per industry+seniority
Branch on executionPlan.channel to route to email vs phone vs social
Read executionPlan.reasonCodes for audit logging

The actor doesn't write the message — it tells your tool how to send it. That's the decision-to-execution bridge.

Cohesion across the suite

This actor and its 7 sub-actors emit a shared decision-engine vocabulary so an upstream orchestrator (Dify / n8n / Make / a custom agent) sees one consistent contract whether it queries the suite directly or queries a single sub-actor. The shared vocabulary covers four axes:

Shared input field — `monitorStateKey`

Every actor in the suite accepts monitorStateKey as a suite-aligned alias for its native cross-run-state input (watchlistName on most sub-actors, portfolioId on company-deep-research). Pass one consistent value across the suite and every sub-actor lights up cross-run state on the same logical workflow:

{
  "monitorStateKey": "abm-q4-watchlist",
  "leads": [...]
}

Actor	Native input	`monitorStateKey` aliases
lead-enrichment-pipeline (this)	`monitorStateKey`	—
waterfall-contact-enrichment	`watchlistName`	yes
phone-number-finder	`watchlistName`	yes
bulk-email-verifier	`watchlistName`	yes
company-deep-research	`portfolioId`	yes
lead-scoring-engine	`watchlistName`	yes
hubspot-lead-pusher	`watchlistName`	yes
salesforce-lead-pusher	`watchlistName`	yes

Shared output field — `entityId`

Every record across the suite carries an entityId (the cross-suite canonical join key, sha256-derived). Each sub-actor emits both entityId (suite-aligned) and eventId (legacy alias — same value). Suite-level records carry an entityId plus *EntityId fields per sub-actor (contactEntityId, phoneEntityId, emailEntityId, companyEntityId, scoreEntityId, crmEntityId) so you can join back to any standalone sub-actor's dataset for fuller diagnostics.

Shared decision-engine fields

Every sub-actor surfaces a per-record decision layer with consistent semantics — even though the underlying domain differs (contact discovery vs phone vs email vs company vs score vs CRM-write). The suite passes these through under namespaced fields:

Concept	Suite field prefix	Per-sub-actor name
Routing scalar	`phoneDecision`, `emailDecision`, `scoreDecision`, `crmStatus`	`decision` / `status`
Recommended action	`emailRecommendedAction`, `scoreRecommendedAction`	`recommendedAction`
Failure attribution	`emailFailureAnalysis`, `crmFailureAnalysis`	`failureAnalysis`
Cross-run delta	`emailDelta`, `crmChangeAnalysis`, `scoreTemporalSignals`	`delta` / `changeAnalysis` / `temporalSignals`
Audit replay	`emailDecisionSnapshot`, `crmDecisionSnapshot`	`decisionSnapshot`

Shared metacognition fields (where applicable)

waterfall-contact-enrichment and company-deep-research emit a shared metacognition layer that the suite passes through:

Concept	Suite field	Sub-actor field
Signal-stacking warning	`contactSignalIndependence`, `companySignalIndependence`	`signalIndependence`
FP / FN cost asymmetry + actEvenIfUnsure	`contactDecisionRisk`, `companyDecisionRisk`	`decisionRisk`
Load-bearing-signal check	`contactCounterfactual`, `companyCounterfactual`	`counterfactual`
Outcome inference (closed-loop)	`contactDecisionMemory`, `companyDecisionMemory`	`decisionMemory`

Closed-loop feedback shape: pass lastAction: { type, takenAt, note? } to the standalone sub-actor and on subsequent runs it emits decisionMemory with an inferred outcome (engaged / no-response / no-change / resolved / too-soon-to-tell). Outcome is inferred from observable signal change only — direct replies / off-platform engagement are not visible.

The suite also accepts its own feedback + feedbackStateKey inputs at the lead level for cohort-based outcome aggregation — see Closed-loop feedback.

Failure scenarios & recovery

The pipeline is non-blocking — when a sub-actor fails or returns nothing, the step is recorded in failedSteps[] and the run continues with partial data. Each lead carries a recoveryPlan pointing at the right next-best Apify actor:

Failure	What we surface	recoveryPlan.nextBestActorSlug
No email found, no domain	`decisionSignals: ["no-email", "no-domain"]`	`ryanclinton/website-contact-scraper`
No email found, domain known	`recoveryPlan.reason: "Email discovery returned nothing"`	`ryanclinton/email-pattern-finder`
Email returned invalid	`emailVerified: false`, `bounceRiskBucket: "high"`	`ryanclinton/waterfall-contact-enrichment`
Email unverified	`decisionSignals: ["unverified-email"]`	`ryanclinton/bulk-email-verifier`
No phone, high-seniority lead	`phoneRecoveryPlan` with reason	`ryanclinton/phone-number-finder`
Score < 40, no company context	`recoveryPlan.reason: "Score is low, company context missing"`	`ryanclinton/company-deep-research`

Each lead's actionPlaybook[] array is the plain-English version of the same logic — paste straight into a Slack/email/task body. No LLM rewriting required.

When 3 consecutive sub-actor calls fail, the orchestrator's circuit breaker trips and the run exits cleanly with an alert record (recordType: "alert", alertType: "pipeline-degraded"). Leads enriched before the trip are pushed; downstream sub-actor calls are aborted to avoid runaway billing.
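The circuit-breaker behaviour can be illustrated with a generic consecutive-failure counter. This is a sketch of the general technique, not the orchestrator's code: the class, the helper, and the reset-on-success rule are assumptions; only the threshold of 3 and the alert record fields come from the text above.

```python
from typing import Callable, List

ALERT = {"recordType": "alert", "alertType": "pipeline-degraded"}

class CircuitBreaker:
    """Trips after `threshold` consecutive failures; a success resets the count."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0

    @property
    def tripped(self) -> bool:
        return self.consecutive_failures >= self.threshold

    def record(self, succeeded: bool) -> None:
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1

def run_calls(calls: List[Callable[[], dict]]) -> List[dict]:
    """Run sub-actor calls in order; stop and emit an alert once the breaker trips."""
    breaker = CircuitBreaker()
    records: List[dict] = []
    for call in calls:
        try:
            records.append(call())      # results before a trip are kept
            breaker.record(True)
        except Exception:
            breaker.record(False)       # non-blocking: note the failure, keep going
            if breaker.tripped:
                records.append(dict(ALERT))
                break                   # remaining calls are never started
    return records
```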

If outputMode: "debug" is set, every lead also carries a stepDiagnostics[] array with per-step timing and outcome — useful when "why is enrichment slow on this batch?" comes up.

When NOT to use this actor

Honest scope-fence — these are jobs the pipeline is not the right tool for, with the better one named:

Need	Use this instead
Discover new leads from search/firmographics (no input list)	B2B Lead Gen Suite — generates leads from queries; this pipeline enriches an existing list
Single-domain company research (no contacts)	Company Deep Research — same engine, standalone
Pattern-only email synthesis (no verification, no scoring)	Email Pattern Finder — pattern detection + send-decision in one tool
Bulk verify a list of emails you already have	Bulk Email Verifier — skip the discovery + scoring stages
Local business leads from Google Maps	Google Maps Lead Enricher — Maps-first orchestrator
Real-time hiring/funding intent signals	Intent Signal Tracker — buyer-stage intelligence
Single-person enrichment via PDL	Person Enrichment Lookup — direct PDL wrapper
Generate cold-email copy for enriched leads	AI Outreach Personalizer — runs after this pipeline
Build a zero-shot agent that picks the right enrichment tool	Use this actor — it returns decisions an agent can branch on directly

This pipeline is the right tool when you have an input list and want enriched, verified, scored, decision-ready records out the other end. It's not a database, not a search engine, and not a creative copywriter.

Use in Dify

Drop this actor into Dify workflows via the Apify plugin's Run Actor node. Each enriched lead returns scored, classified, and decided as structured JSON — SEND_NOW / VERIFY_FIRST / ENRICH_MORE / SKIP plus the bounce-risk band, lead grade, and ready-to-send boolean your downstream node branches on. Clay pointed at the same input returns raw enriched data; this returns send-or-skip decisions.

Actor ID: ryanclinton/lead-enrichment-pipeline
Sample input (one-click outreach prep — find missing emails, verify, score, decide):

{
  "leads": [
    {"firstName": "Sarah", "lastName": "Chen", "companyName": "Acme Corp", "website": "acmecorp.com", "title": "CTO"},
    {"email": "james@betaindustries.com", "companyName": "Beta Industries"},
    {"fullName": "Maria Rodriguez", "domain": "pinnacle.io"}
  ],
  "goal": "high-deliverability",
  "outputFilter": "send-now-only",
  "strictMode": false
}

Branching example (Dify if/else node)

IF lead.recordType == "lead" AND lead.sendDecision.action == "SEND_NOW"
  → push to outreach sequence
IF lead.sendDecision.action == "VERIFY_FIRST"
  → re-run lead.recoveryPlan.nextBestActorSlug (e.g. bulk-email-verifier)
IF lead.sendDecision.action == "ENRICH_MORE"
  → re-run lead.recoveryPlan.nextBestActorSlug (e.g. website-contact-scraper)
IF lead.sendDecision.action == "SKIP"
  → drop, log reason from lead.sendDecision.reasons
IF lead.recordType == "alert"
  → notify Slack/PagerDuty (list quality dropped or pipeline degraded)

The actionPlaybook[] array on every lead is usable verbatim — no LLM rewriting needed. It already says "Run bulk-email-verifier on this email before sending" or "Add to outreach sequence — verified email, ready today" in plain English, ready to paste into a Slack/email/task body.
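Outside Dify, the same branching can be written as a small router over dataset records. The field names follow the stable enums in this README; the route labels ("push-to-sequence", "rerun", and so on) and the function name are hypothetical placeholders for whatever your workflow does at each branch.

```python
from typing import Optional, Tuple

def route(record: dict) -> Tuple[str, Optional[str]]:
    """Map one dataset record to a (route, detail) pair."""
    record_type = record.get("recordType")
    if record_type == "alert":
        return ("notify", record.get("alertType"))
    if record_type != "lead":
        return ("ignore", None)  # summary / preflight / error records

    decision = record.get("sendDecision") or {}
    action = decision.get("action")
    if action == "SEND_NOW":
        return ("push-to-sequence", None)
    if action in ("VERIFY_FIRST", "ENRICH_MORE"):
        plan = record.get("recoveryPlan") or {}  # recoveryPlan may be null
        return ("rerun", plan.get("nextBestActorSlug"))
    if action == "SKIP":
        return ("drop", "; ".join(decision.get("reasons", [])))
    return ("ignore", None)

lead = {
    "recordType": "lead",
    "sendDecision": {"action": "VERIFY_FIRST", "reasons": ["Email unverified"]},
    "recoveryPlan": {"nextBestActorSlug": "ryanclinton/bulk-email-verifier"},
}
print(route(lead))
```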

Opt-in modes Dify workflows can leverage

goal: "quick-outreach" — fastest path; email + verify only. Use when speed matters more than scoring.
goal: "high-deliverability" (default) — email + verify + score; the standard cold-outreach prep.
goal: "max-coverage" — every step; use when populating a CRM with everything you can find.
outputFilter: "send-now-only" — Dify only receives ready-to-send leads, no parsing required.
outputFilter: "a-b-grade-only" — only data-complete leads (8+ enriched fields).
strictMode: true — VERIFY_FIRST upgrades to SKIP. Use when sender reputation matters more than coverage.
emitPreflight: true (default) — a recordType: "preflight" cost-estimate record arrives FIRST in the dataset, so Dify can short-circuit if the run will exceed budget.

Stable enums Dify nodes can branch on

Field	Values
`recordType`	`lead` / `summary` / `preflight` / `alert` / `error`
`sendDecision.action`	`SEND_NOW` / `VERIFY_FIRST` / `ENRICH_MORE` / `SKIP`
`sendDecision.riskLevel`	`low` / `medium` / `high`
`bounceRiskBucket`	`low` / `medium` / `high`
`leadGrade`	`A` / `B` / `C` / `D` / `F`
`seniorityLevel`	`c-level` / `vp` / `director` / `manager` / `individual-contributor` / `unknown`
`complianceFlags.region`	`eu` / `us-ca` / `us-other` / `uk` / `apac` / `other` / `unknown`
`decisionSignals[]`	16-token enum vocabulary (verified-email / unverified-email / senior-title / etc.)
`isOutreachReady` / `isDecisionMaker` / `isOnSuppression`	`true` / `false`

Decision-ready booleans (isOutreachReady, isDecisionMaker, isOnSuppression) are the single-column safelists Dify automation actually wants — no nested-object parsing required.

Limitations

No LinkedIn scraping — the pipeline does not scrape LinkedIn profiles directly. LinkedIn URLs provided in input are used for enrichment matching but not crawled. This keeps the actor compliant with LinkedIn's terms of service.
Email discovery depends on public data — waterfall enrichment works best for leads at companies with public websites. Stealth-mode startups with no web presence may return null emails.
Company research requires a valid domain — if no domain can be extracted from the website, email, or company name, the company enrichment step is skipped for that lead.
Phone discovery is US-focused — phone number finding works best for US-based businesses and professionals. International phone discovery has lower success rates.
CSV parser handles standard CSV only — the built-in CSV parser supports quoted fields and common delimiters but does not handle Excel files (.xlsx). Export to CSV first.
Processing time scales with enabled steps — a full enrichment run (all 6 steps) on 200 leads may take 10-15 minutes. Disable unneeded steps to reduce time.
Sub-actor failures are non-blocking — if a sub-actor times out or fails, that step is skipped and the pipeline continues. This means some leads may have partial enrichment.
CRM push requires API credentials — HubSpot push needs a private app access token; Salesforce push needs instance URL and access token. The actor does not store credentials between runs.

Integrations

Zapier — trigger enrichment runs when new leads arrive in Google Sheets, Airtable, or CRM
Make — build multi-step workflows that feed webform submissions into the enrichment pipeline
Google Sheets — export enriched leads directly to a Google Sheet for team collaboration
Apify API — trigger enrichment from any backend system via REST API with Python, JavaScript, or cURL
Webhooks — get notified when enrichment completes and automatically fetch results
LangChain / LlamaIndex — feed enriched lead data into AI agents for automated outreach drafting or lead research

Troubleshooting

Empty email results for most leads — email discovery works best when leads include a company domain or website. Leads with only a name and no company information have limited enrichment options. Add company names or domains to improve discovery rates.
Run taking longer than 10 minutes — full enrichment with all 6 steps enabled processes leads sequentially through each sub-actor. Disable enrichCompany for faster runs, or reduce the batch size with maxLeads. Each sub-actor has its own timeout (up to 900 seconds for waterfall enrichment).
CSV file not loading — the csvUrl must be a publicly accessible URL that returns raw CSV text. Google Sheets share links do not work — use the "Publish to web" CSV export URL instead. The URL must respond within 30 seconds.
Some leads missing scores — the scoring engine requires at least an email or domain to generate a score. Leads where contact discovery failed and no domain was derivable will have score: null and grade: null.
CRM push showing 0 leads pushed — verify your API credentials. HubSpot requires a private app access token (not a legacy API key). Salesforce requires both instanceUrl and accessToken in the credentials object.

Responsible use

This actor only accesses publicly available contact and company information.
Respect website terms of service and robots.txt directives.
Comply with GDPR, CAN-SPAM, and other applicable data protection laws when using enriched lead data for outreach.
Do not use extracted data for spam, harassment, or unauthorized purposes.
For guidance on web scraping legality, see Apify's guide.

FAQ

How many leads can I enrich in one run? There is no hard limit on leads per run. The actor processes leads in batch and charges $0.12 per lead. For runs over 1,000 leads, increase the memory allocation to 512 MB. Use maxLeads to cap processing if you want to control costs.
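For budgeting, the flat price makes the arithmetic straightforward. The sketch below assumes every processed lead is billed at $0.12 and that maxLeads simply caps the number processed; the function name is hypothetical. Its result for 47 leads matches the ppeChargesUsd of 5.64 in the sample summary record.

```python
from decimal import Decimal
from typing import Optional

PRICE_PER_LEAD = Decimal("0.12")

def estimate_cost(lead_count: int, max_leads: Optional[int] = None) -> Decimal:
    """Estimated charge in USD, assuming a flat per-lead price."""
    billable = lead_count if max_leads is None else min(lead_count, max_leads)
    return PRICE_PER_LEAD * billable

print(estimate_cost(47))         # 5.64
print(estimate_cost(5000, 500))  # 60.00
```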

How is Lead Enrichment Pipeline different from Clay? Clay is typically subscription-priced with per-credit charges on top; verify current published plans for specifics. This pipeline charges a flat $0.12 per lead with no monthly subscription, covering all enrichment steps in one price. For 500 leads/month, this pipeline costs $60, usually a fraction of the cost of comparable subscription stacks. The code is also open for inspection on Apify.

Does lead enrichment work without an email address? Yes. The pipeline is designed for partial leads. Provide a name + company, name + domain, or even just a domain, and the waterfall enrichment will discover email addresses. Leads with more input data produce better results.

What types of emails are filtered out during verification? The email verifier checks MX records and SMTP mailbox existence. Emails at domains with no MX records are marked invalid. Catch-all domains (which accept mail to any address) are marked catch-all with lower confidence scores. Role-based addresses like info@ and support@ are flagged.

Is it legal to enrich lead data from public sources? This pipeline only a


Tips for best results

Combine with other Apify actors

Templates

Monitoring & change detection

What you get on lead records (run #2 and later)

Stable changeFlags[] enum

How to use change detection

Stable enums (quick reference)

Debug-mode output example

Confidence system

Priority engine (who to email first)

Freshness & data decay

Closed-loop feedback (outcomes → historicalPerformance)

Company-level intelligence

Intent signals (LLM-free)

Scenario simulation (what-if without re-running)

List health

Execution layer (decision → execution bridge)

Channel selection rules

Sequence type rules

Timing rules

Personalisation rules

Tone rules (industry + seniority pattern match)

Send window rules

Why this matters

Cohesion across the suite

Shared input field — monitorStateKey

Shared output field — entityId

Shared decision-engine fields

Shared metacognition fields (where applicable)

Failure scenarios & recovery

When NOT to use this actor

Use in Dify
