Lead Enrichment Pipeline — Email, Firmographics & ICP

Pricing

Pay per usage

Takes raw lead lists from any source and returns deduplicated, enriched, scored records. The value-add layer that sits on top of your scrapers.

Rating: 0.0 (0 ratings)

Developer: Prooflio AI (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 6 days ago

Lead Enrichment Pipeline — Dedup, Email, Firmographics & ICP Scoring

Stop re-scraping saturated sources. Be the value-add layer on top of them.

This Actor takes raw lead lists from any source — the output of another scraper, a messy CSV, a domain list — and returns deduplicated, enriched, and scored records. Its input is a dataset, not a scrape target, so it chains on top of the commodity scrapers (Google Maps, Apollo, LinkedIn exports, etc.) instead of competing with them.

What it does

  1. Ingests mixed sources. Field names are auto-mapped, so company / organization / org_name all land on the same field. Combine inline records with the output dataset of a previous Actor run in one pass.
  2. Deduplicates / resolves entities. Merges the same person across sources — by exact email globally, then by company + fuzzy-matched name (so "Jon" and "John" at the same company collapse into one). Field values are merged with per-field provenance so you know where each value came from.
  3. Finds & verifies emails. Validates syntax, checks the domain's MX records with live DNS lookups, and flags disposable and role-based addresses. When an email is missing, it generates ranked candidates from common corporate patterns (first.last@, flast@, …).
  4. Enriches firmographics. Infers industry, tech stack, and mail capability (whether the domain has MX records) from the company domain. Results are cached per domain, so overlapping lists and re-runs don't pay twice for the same work.
  5. Scores against your ICP. Assigns each lead a 0–100 score and an A/B/C/D tier based on industry, seniority, geography, required tech, and email quality — with the matched/missed criteria attached.
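
The entity-resolution step (step 2) can be sketched in two passes. The first groups records by exact, case-insensitive email. The second folds groups together when their company matches exactly and their full names reach the Jaro-Winkler threshold (the fuzzyThreshold input, 0.92 by default). This is a minimal illustration, not the Actor's code. The following are all assumptions:

- the function names
- the "source" key on each input record
- comparing only the first record of each group
- the rule that the first non-empty value wins during a merge

```python
from typing import Dict, List

Record = Dict[str, str]


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro-Winkler similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # A character matches only if it appears within `window` positions.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if not matches:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    m1 = [c for c, u in zip(s1, used1) if u]
    m2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(m1, m2)) / 2
    jaro = (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3
    # Winkler boost for a common prefix of up to four characters.
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


def _norm(value) -> str:
    """Lowercase and collapse whitespace; None becomes an empty string."""
    return " ".join(str(value or "").lower().split())


def merge_group(group: List[Record]) -> dict:
    """Merge duplicates; the first non-empty value per field wins (assumed rule)."""
    merged: dict = {}
    provenance: Dict[str, str] = {}
    for rec in group:
        for key, value in rec.items():
            if key != "source" and value and key not in merged:
                merged[key] = value
                provenance[key] = rec.get("source", "unknown")
    merged["mergedRecordCount"] = len(group)
    merged["sources"] = sorted({rec.get("source", "unknown") for rec in group})
    merged["provenance"] = provenance
    return merged


def dedupe(records: List[Record], threshold: float = 0.92) -> List[dict]:
    # Pass 1: exact, case-insensitive email match across all sources.
    groups: List[List[Record]] = []
    by_email: Dict[str, List[Record]] = {}
    for rec in records:
        email = _norm(rec.get("email"))
        if email in by_email:
            by_email[email].append(rec)
            continue
        group = [rec]
        groups.append(group)
        if email:
            by_email[email] = group
    # Pass 2: same company plus fuzzy-matched full name (first record of each group).
    resolved: List[List[Record]] = []
    for group in groups:
        company = _norm(group[0].get("companyName"))
        name = _norm(group[0].get("fullName"))
        for other in resolved:
            if (company and name
                    and company == _norm(other[0].get("companyName"))
                    and jaro_winkler(name, _norm(other[0].get("fullName"))) >= threshold):
                other.extend(group)
                break
        else:
            resolved.append(group)
    return [merge_group(g) for g in resolved]
```

With the default threshold, "jon smith" and "john smith" score about 0.97 and merge, while unrelated names at the same company stay separate. Comparing each group against every resolved group is quadratic. Indexing groups by company first would keep each comparison set small.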

Why this design

The margin in enrichment is in three places, and this Actor is built around all three:

  • Canonical key before enrichment. The dedup key (domain for B2B, normalized email for people) is derived first, so you never pay to enrich the same entity five times under five spellings.
  • Caching. Firmographics are keyed on domain in a key-value store. Across overlapping lists, this is often the entire difference in unit economics.
  • Free signal first. MX/DNS checks, domain→company resolution, tech detection, and title-based seniority are all free. You can ship real enrichment before paying for a single API call.
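
The first two points can be illustrated together: canonicalize the key before any lookup, then cache on it. This is a hypothetical in-memory sketch. A plain dict stands in for the key-value store, and the FirmographicsCache name, its normalization rules, and the fetch callback are illustrative assumptions.

```python
from typing import Callable, Dict


class FirmographicsCache:
    """In-memory stand-in for a domain-keyed key-value store (hypothetical)."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch  # the expensive step: homepage fetch, paid API, ...
        self._store: Dict[str, dict] = {}
        self.misses = 0

    @staticmethod
    def key(domain: str) -> str:
        # Canonicalize first so "ACME.com", "acme.com." and " acme.com" share one entry.
        return domain.strip().lower().rstrip(".")

    def get(self, domain: str) -> dict:
        k = self.key(domain)
        if k not in self._store:
            self.misses += 1
            self._store[k] = self._fetch(k)  # pay only on a cache miss
        return self._store[k]
```

Four spellings of two domains trigger only two fetches. On the platform, a persistent key-value store would take the place of the dict so that the cache survives across runs.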

Input

Field | Type | Description
--- | --- | ---
records | array | Inline lead objects. Field names are auto-mapped.
inputDatasetId | string | Dataset ID (e.g. a scraper's output) to load and merge.
fieldMap | object | Overrides or extends field aliasing ({ "src_field": "canonicalField" }).
dedupEnabled | boolean | Merge duplicates / resolve entities. Default true.
fuzzyThreshold | number | Jaro-Winkler threshold for same-company name matches. Default 0.92.
findMissingEmails | boolean | Guess emails from name + domain. Default true.
verifyEmails | boolean | Syntax + MX + disposable/role checks. Default true.
emailVerifierApiKey | secret | Optional. Enables a mailbox-level verifier hook.
enrichFirmographics | boolean | Industry, tech stack, MX. Default true.
fetchHomepage | boolean | Allow homepage fetch for inference. Default true.
firmographicsApiKey | secret | Optional. Enables a paid firmographics provider hook.
icp | object | Scoring config (industries, seniority, countries, required tech, weights).
suppressionList | array | Emails/domains to drop (opt-outs, competitors).
maxRecords | integer | Cap input after loading (0 = no limit).
maxConcurrency | integer | Parallel enrichment workers. Default 10.
proxyConfiguration | object | Proxy for homepage fetches (recommended on platform).
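
Here is a minimal sketch of field auto-mapping with a fieldMap override. It assumes key matching is case-insensitive and that unmapped keys are kept under extra, the passthrough field listed under Output. Only company / organization / org_name are documented aliases. The rest of the alias table and the map_fields name are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical default aliases; only company/organization/org_name are documented.
DEFAULT_ALIASES: Dict[str, str] = {
    "company": "companyName", "organization": "companyName", "org_name": "companyName",
    "name": "fullName", "full_name": "fullName",
    "title": "jobTitle", "job_title": "jobTitle",
    "email": "email", "email_address": "email",
    "website": "website", "url": "website",
}


def map_fields(raw: dict, field_map: Optional[Dict[str, str]] = None) -> dict:
    """Rename source fields to canonical names; keep unknown fields under `extra`."""
    # User-supplied fieldMap entries override or extend the defaults.
    aliases = {**DEFAULT_ALIASES, **{k.lower(): v for k, v in (field_map or {}).items()}}
    mapped: dict = {}
    extra: dict = {}
    for key, value in raw.items():
        canonical = aliases.get(key.strip().lower())
        if canonical is None:
            extra[key] = value
        elif canonical not in mapped:  # the first alias seen wins
            mapped[canonical] = value
    mapped["extra"] = extra
    return mapped
```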

Example input

{
  "inputDatasetId": "YOUR_SCRAPER_OUTPUT_DATASET_ID",
  "records": [
    { "name": "Jane Doe", "title": "VP Engineering", "company": "Acme, Inc.", "website": "https://acme.com" }
  ],
  "icp": {
    "targetIndustries": ["software", "saas"],
    "seniorityKeywords": ["vp", "head", "chief", "director", "founder"],
    "targetCountries": ["United States"],
    "requiredTech": ["HubSpot"]
  },
  "suppressionList": ["competitor.com", "optout@example.com"]
}
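
An icp object like the one above could be turned into a 0 to 100 score and an A/B/C/D tier roughly as follows. This is an illustration, not the Actor's method, and every detail is an assumption:

- the per-criterion weights
- normalizing by the weights of the configured criteria only
- whole-word title matching
- treating only "deliverable" as good email quality
- the tier cutoffs: A at 75 or above, B at 50, C at 25, otherwise D

```python
import re
from typing import Dict, Optional

# Hypothetical weights; the Actor's actual defaults are not documented here.
DEFAULT_WEIGHTS: Dict[str, int] = {"industry": 30, "seniority": 25, "country": 15, "tech": 20, "email": 10}


def score_lead(lead: dict, icp: dict, weights: Optional[Dict[str, int]] = None) -> dict:
    weights = weights or DEFAULT_WEIGHTS
    title_words = set(re.findall(r"[a-z]+", (lead.get("jobTitle") or "").lower()))
    industry = (lead.get("industry") or "").lower()
    tech = {t.lower() for t in lead.get("techStack") or []}
    checks: Dict[str, bool] = {}
    # Only criteria present in the ICP config are scored.
    if icp.get("targetIndustries"):
        checks["industry"] = any(t.lower() in industry for t in icp["targetIndustries"])
    if icp.get("seniorityKeywords"):
        checks["seniority"] = any(k.lower() in title_words for k in icp["seniorityKeywords"])
    if icp.get("targetCountries"):
        checks["country"] = lead.get("country") in icp["targetCountries"]
    if icp.get("requiredTech"):
        checks["tech"] = all(t.lower() in tech for t in icp["requiredTech"])
    checks["email"] = lead.get("emailStatus") == "deliverable"
    total = sum(weights[k] for k in checks)
    earned = sum(weights[k] for k, ok in checks.items() if ok)
    score = round(100 * earned / total) if total else 0
    tier = "A" if score >= 75 else "B" if score >= 50 else "C" if score >= 25 else "D"
    return {
        "icpScore": score,
        "icpTier": tier,
        "icpMatched": [k for k, ok in checks.items() if ok],
        "icpMissed": [k for k, ok in checks.items() if not ok],
    }
```

Normalizing by the configured weights means that leaving a criterion out of the ICP does not cap the maximum score below 100.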

Output

Each dataset record is a flattened, enriched lead. Key fields:

  • Identity: fullName, firstName, lastName, jobTitle, companyName, domain, website, country, phone, linkedinUrl
  • Email: email, emailStatus (deliverable / risky / undeliverable / unknown), emailIsGuessed, emailIsRoleBased, emailIsDisposable, emailConfidence, emailCandidates
  • Firmographics: industry, techStack, firmographicsSource
  • Scoring: icpScore, icpTier, icpMatched, icpMissed
  • Provenance: mergedRecordCount, sources, provenance (field → source), extra (unmapped passthrough)
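
Several of the email fields above can be approximated with string checks alone. This sketch leaves out the MX lookup, which needs a DNS resolver, and uses tiny hypothetical role and disposable lists. The function names, the syntaxOk key, and every pattern after first.last and flast are assumptions.

```python
import re
from typing import Dict, List

SYNTAX = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose syntax check
ROLE_LOCALPARTS = {"info", "sales", "support", "admin", "contact", "hello"}  # hypothetical
DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com"}                # hypothetical


def email_flags(email: str) -> Dict[str, bool]:
    email = email.strip().lower()
    if not SYNTAX.match(email):
        return {"syntaxOk": False, "emailIsRoleBased": False, "emailIsDisposable": False}
    local, domain = email.rsplit("@", 1)
    return {
        "syntaxOk": True,
        "emailIsRoleBased": local in ROLE_LOCALPARTS,
        "emailIsDisposable": domain in DISPOSABLE_DOMAINS,
    }


def email_candidates(first: str, last: str, domain: str) -> List[str]:
    """Ranked guesses from common corporate patterns (assumed order)."""
    fn = re.sub(r"[^a-z]", "", first.lower())  # drop apostrophes, hyphens, spaces
    ln = re.sub(r"[^a-z]", "", last.lower())
    domain = domain.strip().lower()
    if not (fn and ln and domain):
        return []
    patterns = [f"{fn}.{ln}", f"{fn[0]}{ln}", f"{fn}{ln}", fn, f"{fn}_{ln}", f"{fn[0]}.{ln}"]
    # dict.fromkeys removes duplicates while keeping rank order.
    return list(dict.fromkeys(f"{p}@{domain}" for p in patterns))


def guess_email(first: str, last: str, domain: str) -> dict:
    candidates = email_candidates(first, last, domain)
    return {
        "email": candidates[0] if candidates else None,
        "emailIsGuessed": bool(candidates),  # guessed addresses are always flagged
        "emailCandidates": candidates,
    }
```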

Records are sorted best-first by icpScore. A run-level SUMMARY (input/unique counts, tier and email-status breakdowns) is written to the default key-value store.
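
Ordering the records and building the summary comes down to a sort plus two counters. The summary key names here are hypothetical. On the platform, results would typically be written with the Apify SDK: Actor.push_data for the records and Actor.set_value under a SUMMARY key for the summary.

```python
from collections import Counter
from typing import Dict, List, Tuple


def finalize(leads: List[dict], input_count: int) -> Tuple[List[dict], Dict]:
    # sorted() is stable even with reverse=True, so equal scores keep input order.
    ordered = sorted(leads, key=lambda r: r.get("icpScore", 0), reverse=True)
    summary = {
        "inputCount": input_count,
        "uniqueCount": len(ordered),
        "tiers": dict(Counter(r.get("icpTier", "D") for r in ordered)),
        "emailStatus": dict(Counter(r.get("emailStatus", "unknown") for r in ordered)),
    }
    return ordered, summary
```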

Limitations (by design)

  • Email verification is not mailbox-level without a provider hook. Here, deliverable means valid syntax, a real MX record, and an address that is neither role-based nor disposable; it does not confirm that the specific mailbox exists. Guessed emails are always flagged with emailIsGuessed: true and kept at low confidence.
  • Domain parsing is hostname-level, not eTLD+1 (no public-suffix list), so a subdomain such as app.acme.com is treated as a different domain from acme.com. This is reliable for the vast majority of B2B domains.
  • Free firmographics are inferred from DNS and homepage signals. Headcount/revenue require a paid provider via the hook.
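
Hostname-level parsing can be illustrated with the standard library's urllib.parse. This sketch is an assumption about the approach, not the Actor's parser; in particular, stripping a leading www. is a hypothetical choice. Both app.acme.com and shop.acme.co.uk come back unchanged, whereas a public-suffix-aware eTLD+1 parser would reduce them to acme.com and acme.co.uk.

```python
from urllib.parse import urlsplit


def hostname_domain(value: str) -> str:
    """Hostname-level normalization with no public-suffix list (sketch)."""
    value = value.strip().lower()
    if "@" in value and "://" not in value:
        value = value.rsplit("@", 1)[1]  # email address -> its domain
    if "://" not in value:
        value = "//" + value  # make urlsplit treat the text as a network location
    host = urlsplit(value).hostname or ""
    # Hypothetical: drop a leading "www."; all other subdomains are kept as-is.
    return host[4:] if host.startswith("www.") else host
```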

Compliance

Lead data is personal data. This Actor carries a per-field source/provenance trail and supports a suppressionList for opt-outs from the first run. You are responsible for ensuring your use complies with GDPR, CCPA, and the terms of the sources you feed it. Do not process personal data without a lawful basis.
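
The suppressionList can be applied as a pair of set lookups. This sketch assumes that entries containing @ are exact addresses and that all other entries are domains, matched exactly at hostname level. The function names and the fallback from a lead's email to its domain are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def split_suppression(entries: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    emails: Set[str] = set()
    domains: Set[str] = set()
    for entry in entries:
        entry = entry.strip().lower()
        if entry:
            (emails if "@" in entry else domains).add(entry)
    return emails, domains


def apply_suppression(leads: List[dict], entries: Iterable[str]) -> List[dict]:
    emails, domains = split_suppression(entries)
    kept = []
    for lead in leads:
        email = (lead.get("email") or "").strip().lower()
        # Fall back to the email's domain when the lead has no domain field.
        fallback = email.split("@")[-1] if "@" in email else ""
        domain = (lead.get("domain") or fallback).strip().lower()
        if email in emails or domain in domains:
            continue  # drop opt-outs and suppressed domains
        kept.append(lead)
    return kept
```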