Pricing

Pay per usage

Lead Enrichment Pipeline — Email, Firmographics & ICP

Takes raw lead lists from any source and returns deduplicated, enriched, scored records. The value-add layer that sits on top of your scrapers.

Developer

Prooflio AI

Last modified

2 months ago

Lead Enrichment Pipeline — Dedup, Email, Firmographics & ICP Scoring

Stop re-scraping saturated sources. Be the value-add layer on top of them.

This Actor takes raw lead lists from any source — the output of another scraper, a messy CSV, a domain list — and returns deduplicated, enriched, and scored records. Its input is a dataset, not a scrape target, so it chains on top of the commodity scrapers (Google Maps, Apollo, LinkedIn exports, etc.) instead of competing with them.

What it does

Ingests mixed sources. Field names are auto-mapped, so company / organization / org_name all land on the same field. Combine inline records with the output dataset of a previous Actor run in one pass.
Deduplicates / resolves entities. Merges the same person across sources — by exact email globally, then by company + fuzzy-matched name (so "Jon" and "John" at the same company collapse into one). Field values are merged with per-field provenance so you know where each value came from.
Finds & verifies emails. Validates syntax, checks the domain's MX records via real DNS, and flags disposable and role-based addresses. When an email is missing, it generates ranked candidates from common corporate patterns (first.last@, flast@, …).
Enriches firmographics. Resolves industry, detected tech stack, and mail capability (whether the domain publishes MX records) from the company domain. Results are cached per domain, so overlapping lists and re-runs don't pay twice for the same work.
Scores against your ICP. Assigns each lead a 0–100 score and an A/B/C/D tier based on industry, seniority, geography, required tech, and email quality — with the matched/missed criteria attached.
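
The entity-resolution step described above can be sketched as follows. This is a minimal illustration, not the Actor's actual code: `resolve`, the cluster layout, and the "first non-empty value wins" merge rule are assumptions, and the single pass is order-dependent. It matches on exact normalized email first, then on same company plus a Jaro-Winkler name score at or above the threshold (0.92, the documented default for `fuzzyThreshold`).

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as transpositions.
    out_of_order, k = 0, 0
    for i, ch in enumerate(s1):
        if used1[i]:
            while not used2[k]:
                k += 1
            out_of_order += ch != s2[k]
            k += 1
    t = out_of_order // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def normalize(text: str) -> str:
    return " ".join(text.lower().split())


def resolve(records: list[dict], threshold: float = 0.92) -> list[dict]:
    """Hypothetical merge: exact email first, then same company + fuzzy name."""
    clusters: list[dict] = []
    by_email: dict[str, dict] = {}
    for rec in records:
        email = normalize(rec.get("email") or "")
        target = by_email.get(email) if email else None
        if target is None:
            company = normalize(rec.get("company") or "")
            name = normalize(rec.get("name") or "")
            for cluster in clusters:
                known = cluster["fields"]
                if (company and name
                        and normalize(known.get("company", "")) == company
                        and jaro_winkler(normalize(known.get("name", "")), name) >= threshold):
                    target = cluster
                    break
        if target is None:
            target = {"fields": {}, "provenance": {}, "mergedRecordCount": 0}
            clusters.append(target)
        # Assumed merge rule: the first non-empty value wins and records its source.
        for field, value in rec.items():
            if field != "source" and value and field not in target["fields"]:
                target["fields"][field] = value
                target["provenance"][field] = rec.get("source", "unknown")
        target["mergedRecordCount"] += 1
        if email:
            by_email[email] = target
    return clusters


records = [
    {"source": "maps", "name": "Jon Doe", "company": "Acme", "title": "VP Engineering"},
    {"source": "csv", "name": "John Doe", "company": "acme", "email": "john.doe@acme.com"},
    {"source": "apollo", "name": "J. Doe", "company": "Other Co", "email": "John.Doe@acme.com"},
    {"source": "csv", "name": "Jane Doe", "company": "Acme"},
]
clusters = resolve(records)
```

Here the first three records collapse into one entity (fuzzy name match, then exact email match), while "Jane Doe" scores below the threshold against "Jon Doe" and stays separate. Company names are compared only after lowercasing, so a real pipeline would also need to reconcile variants such as "Acme, Inc." and "Acme".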

Why this design

The margin in enrichment comes from three places, and this Actor is built around all three:

Canonical key before enrichment. The dedup key (domain for B2B, normalized email for people) is derived first, so you never pay to enrich the same entity five times under five spellings.
Caching. Firmographics are stored in a key-value store keyed on domain, so each domain is enriched once no matter how many leads share it. Across overlapping lists, this often accounts for most of the difference in unit economics.
Free signal first. MX/DNS checks, domain→company resolution, tech detection, and title-based seniority are all free. You can ship real enrichment before paying for a single API call.
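
The first two points can be illustrated with a short sketch. It assumes an in-memory dict standing in for the key-value store; `domain_of`, `DomainCache`, and `fake_lookup` are hypothetical names, and stripping a leading `www.` is an assumption rather than documented behavior.

```python
from typing import Callable
from urllib.parse import urlparse


def domain_of(value: str) -> str:
    """Derive a hostname-level key from a website URL, bare domain, or email."""
    value = value.strip().lower()
    if "@" in value and "://" not in value:
        return value.rsplit("@", 1)[1]
    # urlparse only fills in the hostname when the value has a scheme or a leading "//".
    host = urlparse(value if "://" in value else "//" + value).hostname or ""
    return host.removeprefix("www.")  # assumption: "www." is not significant


class DomainCache:
    """Dict-backed stand-in for a key-value store keyed on domain."""

    def __init__(self, lookup: Callable[[str], dict]):
        self.lookup = lookup
        self.store: dict[str, dict] = {}
        self.misses = 0

    def get(self, domain: str) -> dict:
        if domain not in self.store:
            self.misses += 1  # only a miss triggers the (possibly paid) lookup
            self.store[domain] = self.lookup(domain)
        return self.store[domain]


def fake_lookup(domain: str) -> dict:
    # Placeholder for a real enrichment call.
    return {"domain": domain, "industry": "unknown"}


cache = DomainCache(fake_lookup)
inputs = ["https://www.acme.com", "ACME.com/about", "jane@acme.com", "https://globex.io"]
enriched = [cache.get(domain_of(v)) for v in inputs]
```

Four inputs resolve to two distinct keys, so the lookup runs twice instead of four times. Because the key is derived before any enrichment happens, spelling variants of the same domain never reach the lookup separately.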

Input

Field	Type	Description
`records`	array	Inline lead objects. Field names auto-mapped.
`inputDatasetId`	string	Dataset ID (e.g. a scraper's output) to load and merge.
`fieldMap`	object	Override/extend field aliasing (`{ "src_field": "canonicalField" }`).
`dedupEnabled`	boolean	Merge duplicates / resolve entities. Default `true`.
`fuzzyThreshold`	number	Jaro-Winkler threshold for same-company name matches. Default `0.92`.
`findMissingEmails`	boolean	Guess emails from name + domain. Default `true`.
`verifyEmails`	boolean	Syntax + MX + disposable/role checks. Default `true`.
`emailVerifierApiKey`	secret	Optional. Enables a mailbox-level verifier hook.
`enrichFirmographics`	boolean	Industry, tech stack, MX. Default `true`.
`fetchHomepage`	boolean	Allow homepage fetch for inference. Default `true`.
`firmographicsApiKey`	secret	Optional. Enables a paid firmographics provider hook.
`icp`	object	Scoring config (industries, seniority, countries, required tech, weights).
`suppressionList`	array	Emails/domains to drop (opt-outs, competitors).
`maxRecords`	integer	Cap input after loading (`0` = no limit).
`maxConcurrency`	integer	Parallel enrichment workers. Default `10`.
`proxyConfiguration`	object	Proxy for homepage fetches (recommended on platform).

Example input

{
  "inputDatasetId": "YOUR_SCRAPER_OUTPUT_DATASET_ID",
  "records": [
    { "name": "Jane Doe", "title": "VP Engineering", "company": "Acme, Inc.", "website": "https://acme.com" }
  ],
  "icp": {
    "targetIndustries": ["software", "saas"],
    "seniorityKeywords": ["vp", "head", "chief", "director", "founder"],
    "targetCountries": ["United States"],
    "requiredTech": ["HubSpot"]
  },
  "suppressionList": ["competitor.com", "optout@example.com"]
}
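
To show how an `icp` object like the one above might translate into a score and tier, here is a minimal sketch. The weights and tier cutoffs are invented for illustration; the Actor's real weighting is configurable and is not specified in this README. Only the `icp` keys and the output field names come from the document.

```python
# Hypothetical weights (sum to 100) and tier cutoffs.
WEIGHTS = {"industry": 30, "seniority": 25, "geography": 15, "tech": 20, "email": 10}
TIER_CUTOFFS = [(80, "A"), (60, "B"), (40, "C")]


def score_lead(lead: dict, icp: dict, weights: dict = WEIGHTS) -> dict:
    title = (lead.get("jobTitle") or "").lower()
    required = {t.lower() for t in icp.get("requiredTech", [])}
    stack = {t.lower() for t in lead.get("techStack") or []}
    checks = {
        "industry": (lead.get("industry") or "").lower()
        in {i.lower() for i in icp.get("targetIndustries", [])},
        # Plain substring match on the job title, which is deliberately naive.
        "seniority": any(k.lower() in title for k in icp.get("seniorityKeywords", [])),
        "geography": lead.get("country") in icp.get("targetCountries", []),
        "tech": bool(required) and required <= stack,
        "email": lead.get("emailStatus") == "deliverable",
    }
    score = sum(weights[name] for name, ok in checks.items() if ok)
    tier = next((t for cutoff, t in TIER_CUTOFFS if score >= cutoff), "D")
    return {
        "icpScore": score,
        "icpTier": tier,
        "icpMatched": [name for name, ok in checks.items() if ok],
        "icpMissed": [name for name, ok in checks.items() if not ok],
    }


icp = {
    "targetIndustries": ["software", "saas"],
    "seniorityKeywords": ["vp", "head", "chief", "director", "founder"],
    "targetCountries": ["United States"],
    "requiredTech": ["HubSpot"],
}
lead = {
    "industry": "SaaS",
    "jobTitle": "VP Engineering",
    "country": "United States",
    "techStack": ["HubSpot", "React"],
    "emailStatus": "risky",
}
result = score_lead(lead, icp)
```

Under these assumed weights the sample lead matches everything except email quality. Returning the matched and missed criteria alongside the number makes each score explainable.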

Output

Each dataset record is a flattened, enriched lead. Key fields:

Identity: fullName, firstName, lastName, jobTitle, companyName, domain, website, country, phone, linkedinUrl
Email: email, emailStatus (deliverable / risky / undeliverable / unknown), emailIsGuessed, emailIsRoleBased, emailIsDisposable, emailConfidence, emailCandidates
Firmographics: industry, techStack, firmographicsSource
Scoring: icpScore, icpTier, icpMatched, icpMissed
Provenance: mergedRecordCount, sources, provenance (field → source), extra (unmapped passthrough)

Records are sorted best-first by icpScore. A run-level SUMMARY (input/unique counts, tier and email-status breakdowns) is written to the default key-value store.
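
Downstream, the output fields lend themselves to simple filtering. The sketch below uses made-up sample records and hypothetical helpers (`outreach_ready`, `breakdown`). It recomputes a tier and email-status breakdown locally, since the exact schema of the SUMMARY record is not specified here.

```python
from collections import Counter

# Made-up sample of output records, already sorted best-first by icpScore.
records = [
    {"fullName": "Jane Doe", "icpScore": 90, "icpTier": "A",
     "emailStatus": "deliverable", "emailIsGuessed": False},
    {"fullName": "Sam Lee", "icpScore": 72, "icpTier": "B",
     "emailStatus": "deliverable", "emailIsGuessed": True},
    {"fullName": "Ana Ruiz", "icpScore": 65, "icpTier": "B",
     "emailStatus": "risky", "emailIsGuessed": False},
    {"fullName": "Bo Chen", "icpScore": 20, "icpTier": "D",
     "emailStatus": "unknown", "emailIsGuessed": False},
]


def outreach_ready(items: list[dict], tiers: tuple[str, ...] = ("A", "B")) -> list[dict]:
    """Keep leads in the chosen tiers with a deliverable, non-guessed email."""
    return [
        r for r in items
        if r.get("icpTier") in tiers
        and r.get("emailStatus") == "deliverable"
        and not r.get("emailIsGuessed")
    ]


def breakdown(items: list[dict]) -> dict:
    """Count records per tier and per email status."""
    return {
        "tiers": dict(Counter(r.get("icpTier", "unknown") for r in items)),
        "emailStatus": dict(Counter(r.get("emailStatus", "unknown") for r in items)),
    }


ready = outreach_ready(records)
summary = breakdown(records)
```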

Limitations (by design)

Email verification is not mailbox-level without a provider hook. `deliverable` means valid syntax + a real MX record + not role-based or disposable; it does not confirm that the specific mailbox exists. Guessed emails are always flagged `emailIsGuessed: true` and kept at low confidence.
Domain parsing is hostname-level, not eTLD+1 (no public-suffix list), so different subdomains of one company can be treated as separate domains. This is reliable for the vast majority of B2B domains.
Free firmographics are inferred from DNS and homepage signals. Headcount/revenue require a paid provider via the hook.
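
The first limitation is easier to see in code. This sketch classifies an address using only syntax, an MX answer, and role/disposable flags, and generates guesses from the two patterns named earlier (first.last@ and flast@). Several parts are assumptions: the mapping to `risky`, `undeliverable`, and `unknown`, the tiny role and disposable lists, and the injected `has_mx` callable, which stands in for a real DNS lookup.

```python
import re
from typing import Callable, Optional

SYNTAX = re.compile(r"^[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+$")
ROLE_LOCAL_PARTS = {"info", "sales", "support", "admin", "contact"}  # illustrative
DISPOSABLE_DOMAINS = {"mailinator.com"}  # illustrative


def classify_email(email: str, has_mx: Callable[[str], Optional[bool]]) -> dict:
    """has_mx returns True/False, or None when the lookup could not be completed."""
    email = email.strip().lower()
    result = {"email": email, "emailIsRoleBased": False, "emailIsDisposable": False}
    if not SYNTAX.match(email):
        return {**result, "emailStatus": "undeliverable"}
    local, domain = email.rsplit("@", 1)
    result["emailIsRoleBased"] = local in ROLE_LOCAL_PARTS
    result["emailIsDisposable"] = domain in DISPOSABLE_DOMAINS
    mx = has_mx(domain)
    if mx is None:
        status = "unknown"
    elif not mx:
        status = "undeliverable"
    elif result["emailIsRoleBased"] or result["emailIsDisposable"]:
        status = "risky"
    else:
        # Syntax, MX, and flags all pass. The mailbox itself is still unverified.
        status = "deliverable"
    return {**result, "emailStatus": status}


def guess_candidates(first: str, last: str, domain: str) -> list[dict]:
    """Candidates from the two documented patterns, always flagged as guessed."""
    first, last = first.strip().lower(), last.strip().lower()
    local_parts = [f"{first}.{last}", f"{first[0]}{last}"]
    return [{"email": f"{lp}@{domain}", "emailIsGuessed": True} for lp in local_parts]


def fake_mx(domain: str) -> Optional[bool]:
    # Stand-in for a DNS MX query; unknown domains return None.
    return {"acme.com": True, "nomx.example": False}.get(domain)


checked = classify_email("Jane.Doe@Acme.com", fake_mx)
guesses = guess_candidates("Jane", "Doe", "acme.com")
```

A `deliverable` result here says nothing about whether the `jane.doe` mailbox exists, which is why a mailbox-level provider hook is needed for stronger guarantees.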

Compliance

Lead data is personal data. This Actor carries a per-field source/provenance trail and supports a suppressionList for opt-outs from the first run. You are responsible for ensuring your use complies with GDPR, CCPA, and the terms of the sources you feed it. Do not process personal data without a lawful basis.
