Lead Enrichment Pipeline — Email, Firmographics & ICP
Pricing
Pay per usage
Takes raw lead lists from any source and returns deduplicated, enriched, scored records. The value-add layer that sits on top of your scrapers.
Developer
Prooflio AI
Maintained by Community
Lead Enrichment Pipeline — Dedup, Email, Firmographics & ICP Scoring
Stop re-scraping saturated sources. Be the value-add layer on top of them.
This Actor takes raw lead lists from any source — the output of another scraper, a messy CSV, a domain list — and returns deduplicated, enriched, and scored records. Its input is a dataset, not a scrape target, so it chains on top of the commodity scrapers (Google Maps, Apollo, LinkedIn exports, etc.) instead of competing with them.
What it does
- Ingests mixed sources. Field names are auto-mapped, so `company`/`organization`/`org_name` all land on the same field. Combine inline records with the output dataset of a previous Actor run in one pass.
- Deduplicates / resolves entities. Merges the same person across sources: first by exact email globally, then by company + fuzzy-matched name (so "Jon" and "John" at the same company collapse into one). Field values are merged with per-field provenance so you know where each value came from.
- Finds & verifies emails. Validates syntax, checks the domain's MX records via real DNS, and flags disposable and role-based addresses. When an email is missing, it generates ranked candidates from common corporate patterns (`first.last@`, `flast@`, …).
- Enriches firmographics. Resolves industry, detected tech stack, and mail capability from the company domain. Results are cached per domain, so overlapping lists and re-runs don't re-pay for the same work.
- Scores against your ICP. Assigns each lead a 0–100 score and an A/B/C/D tier based on industry, seniority, geography, required tech, and email quality, with the matched/missed criteria attached.
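The candidate generation mentioned in the email bullet can be sketched roughly as below. This is a minimal illustration, not the Actor's code: the function name `guess_email_candidates`, the helper `_slug`, and the pattern order are assumptions. Only `first.last@` and `flast@` are named in this README; the other two patterns are added for illustration.

```python
import re

# Hypothetical ranking; only first.last@ and flast@ are named in the README.
PATTERNS = (
    "{first}.{last}",
    "{f}{last}",
    "{first}",
    "{first}{last}",
)


def _slug(part: str) -> str:
    """Lowercase a name part and keep ASCII letters only."""
    return re.sub(r"[^a-z]", "", part.lower())


def guess_email_candidates(first_name: str, last_name: str, domain: str) -> list[str]:
    """Return candidate addresses in pattern order, without duplicates."""
    first, last = _slug(first_name), _slug(last_name)
    domain = domain.strip().lower()
    if not first or not last or not domain:
        return []
    candidates: list[str] = []
    for pattern in PATTERNS:
        local = pattern.format(first=first, last=last, f=first[0])
        address = f"{local}@{domain}"
        if address not in candidates:
            candidates.append(address)
    return candidates


print(guess_email_candidates("Jane", "Doe", "acme.com"))
```

Every address produced this way is a guess, so it would still need the verification step and should carry a guessed flag.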
Why this design
The margin in enrichment is in three places, and this Actor is built around all three:
- Canonical key before enrichment. The dedup key (domain for B2B, normalized email for people) is derived first, so you never pay to enrich the same entity five times under five spellings.
- Caching. Firmographics are keyed on domain in a key-value store. Across overlapping lists, this is often the entire difference in unit economics.
- Free signal first. MX/DNS checks, domain→company resolution, tech detection, and title-based seniority are all free. You can ship real enrichment before paying for a single API call.
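The first two points can be illustrated with a small sketch: normalize the domain into one canonical key, then cache the firmographics lookup on that key. Everything here is hypothetical (`normalize_domain`, `DomainCache`, `fake_lookup`), and a plain dict stands in for the key-value store; the Actor's actual normalization rules are not documented here.

```python
from typing import Callable


def normalize_domain(domain: str) -> str:
    """Lowercase, trim whitespace and a trailing dot so spellings share one key."""
    return domain.strip().lower().rstrip(".")


class DomainCache:
    """In-memory stand-in for a key-value store keyed on domain."""

    def __init__(self, lookup: Callable[[str], dict]) -> None:
        self._lookup = lookup
        self._store: dict[str, dict] = {}
        self.lookups = 0  # how many times the (possibly paid) lookup actually ran

    def get(self, domain: str) -> dict:
        key = normalize_domain(domain)
        if key not in self._store:
            self.lookups += 1
            self._store[key] = self._lookup(key)
        return self._store[key]


def fake_lookup(domain: str) -> dict:
    # Placeholder for DNS/homepage inference or a provider call.
    return {"domain": domain, "industry": "unknown"}


cache = DomainCache(fake_lookup)
for raw in ["Acme.com", "acme.com ", "ACME.COM", "example.org"]:
    cache.get(raw)
print(cache.lookups)  # 2: one lookup per distinct domain
```

Four input spellings trigger only two lookups, which is the unit-economics effect described above.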
Input
| Field | Type | Description |
|---|---|---|
| `records` | array | Inline lead objects. Field names auto-mapped. |
| `inputDatasetId` | string | Dataset ID (e.g. a scraper's output) to load and merge. |
| `fieldMap` | object | Override/extend field aliasing (`{ "src_field": "canonicalField" }`). |
| `dedupEnabled` | boolean | Merge duplicates / resolve entities. Default `true`. |
| `fuzzyThreshold` | number | Jaro-Winkler threshold for same-company name matches. Default `0.92`. |
| `findMissingEmails` | boolean | Guess emails from name + domain. Default `true`. |
| `verifyEmails` | boolean | Syntax + MX + disposable/role checks. Default `true`. |
| `emailVerifierApiKey` | secret | Optional. Enables a mailbox-level verifier hook. |
| `enrichFirmographics` | boolean | Industry, tech stack, MX. Default `true`. |
| `fetchHomepage` | boolean | Allow homepage fetch for inference. Default `true`. |
| `firmographicsApiKey` | secret | Optional. Enables a paid firmographics provider hook. |
| `icp` | object | Scoring config (industries, seniority, countries, required tech, weights). |
| `suppressionList` | array | Emails/domains to drop (opt-outs, competitors). |
| `maxRecords` | integer | Cap input after loading (`0` = no limit). |
| `maxConcurrency` | integer | Parallel enrichment workers. Default `10`. |
| `proxyConfiguration` | object | Proxy for homepage fetches (recommended on platform). |
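To get a feel for what a `fuzzyThreshold` of 0.92 means, here is a sketch of the standard Jaro-Winkler similarity (prefix scale 0.1, prefix capped at 4 characters). It is a textbook implementation written for illustration; whether the Actor uses these exact parameters, or how it normalizes names before comparing them, is an assumption.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


print(round(jaro_winkler("jon", "john"), 4))
print(round(jaro_winkler("jon", "jane"), 4))
```

With this implementation, "jon" vs "john" scores about 0.933 and clears the 0.92 default, while "jon" vs "jane" scores 0.75 and does not. Raising the threshold makes merges more conservative.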
Example input
    {
      "inputDatasetId": "YOUR_SCRAPER_OUTPUT_DATASET_ID",
      "records": [
        { "name": "Jane Doe", "title": "VP Engineering", "company": "Acme, Inc.", "website": "https://acme.com" }
      ],
      "icp": {
        "targetIndustries": ["software", "saas"],
        "seniorityKeywords": ["vp", "head", "chief", "director", "founder"],
        "targetCountries": ["United States"],
        "requiredTech": ["HubSpot"]
      },
      "suppressionList": ["competitor.com", "optout@example.com"]
    }
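An `icp` block like the one in the example input could translate into a score and tier roughly as sketched below. The weights, the tier cut-offs, the substring matching on job titles, and the shape of `icpMatched`/`icpMissed` are all illustrative assumptions; the README does not document the Actor's actual scoring formula.

```python
# All weights and tier cut-offs are illustrative assumptions, not the Actor's values.
WEIGHTS = {"industry": 30, "seniority": 25, "country": 15, "tech": 20, "email": 10}
TIERS = ((80, "A"), (60, "B"), (40, "C"))  # anything lower falls into tier D


def score_lead(lead: dict, icp: dict) -> dict:
    """Score one enriched lead against an ICP config on a 0-100 scale."""
    title = (lead.get("jobTitle") or "").lower()
    industries = {i.lower() for i in icp.get("targetIndustries", [])}
    checks = {
        "industry": (lead.get("industry") or "").lower() in industries,
        "seniority": any(k.lower() in title for k in icp.get("seniorityKeywords", [])),
        "country": lead.get("country") in icp.get("targetCountries", []),
        "tech": set(icp.get("requiredTech", [])) <= set(lead.get("techStack") or []),
        "email": lead.get("emailStatus") == "deliverable",
    }
    score = sum(WEIGHTS[name] for name, ok in checks.items() if ok)
    tier = next((t for cutoff, t in TIERS if score >= cutoff), "D")
    return {
        "icpScore": score,
        "icpTier": tier,
        "icpMatched": [name for name, ok in checks.items() if ok],
        "icpMissed": [name for name, ok in checks.items() if not ok],
    }


icp = {
    "targetIndustries": ["software", "saas"],
    "seniorityKeywords": ["vp", "head", "chief", "director", "founder"],
    "targetCountries": ["United States"],
    "requiredTech": ["HubSpot"],
}
lead = {
    "industry": "software",
    "jobTitle": "VP Engineering",
    "country": "United States",
    "techStack": ["HubSpot"],
    "emailStatus": "risky",
}
print(score_lead(lead, icp))
```

Under these assumed weights the sample lead matches everything except email quality, so it scores 90 and lands in tier A.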
Output
Each dataset record is a flattened, enriched lead. Key fields:
- Identity: `fullName`, `firstName`, `lastName`, `jobTitle`, `companyName`, `domain`, `website`, `country`, `phone`, `linkedinUrl`
- Email: `email`, `emailStatus` (deliverable/risky/undeliverable/unknown), `emailIsGuessed`, `emailIsRoleBased`, `emailIsDisposable`, `emailConfidence`, `emailCandidates`
- Firmographics: `industry`, `techStack`, `firmographicsSource`
- Scoring: `icpScore`, `icpTier`, `icpMatched`, `icpMissed`
- Provenance: `mergedRecordCount`, `sources`, `provenance` (field → source), `extra` (unmapped passthrough)
Records are sorted best-first by icpScore. A run-level SUMMARY (input/unique counts, tier and email-status breakdowns) is written to the default key-value store.
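Downstream, the output records can be filtered on the fields listed above. The sketch below works on records already loaded into memory as dicts; the helper names and the sample values are made up for illustration, and only the field names come from this README.

```python
from collections import Counter


def tier_breakdown(records: list[dict]) -> dict:
    """Count records per ICP tier."""
    return dict(Counter(r.get("icpTier") for r in records))


def outreach_ready(records: list[dict], tiers: tuple = ("A", "B")) -> list[dict]:
    """Keep leads in the wanted tiers whose email is deliverable and not guessed."""
    return [
        r
        for r in records
        if r.get("icpTier") in tiers
        and r.get("emailStatus") == "deliverable"
        and not r.get("emailIsGuessed", False)
    ]


# Made-up sample records using the documented field names.
sample = [
    {"fullName": "Jane Doe", "icpTier": "A", "emailStatus": "deliverable", "emailIsGuessed": False},
    {"fullName": "Sam Roe", "icpTier": "A", "emailStatus": "deliverable", "emailIsGuessed": True},
    {"fullName": "Ann Poe", "icpTier": "C", "emailStatus": "risky", "emailIsGuessed": False},
]
print(tier_breakdown(sample))
print([r["fullName"] for r in outreach_ready(sample)])
```

Excluding guessed addresses here is a conservative choice; whether to send to them depends on your own risk tolerance.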
Limitations (by design)
- Email verification is not mailbox-level without a provider hook. `deliverable` means valid syntax + a real MX record + not role/disposable; it does not confirm that the specific mailbox exists. Guessed emails are always flagged `emailIsGuessed: true` and kept at low confidence.
- Domain parsing is hostname-level, not eTLD+1 (no public-suffix list). This is reliable for the vast majority of B2B domains.
- Free firmographics are inferred from DNS and homepage signals. Headcount/revenue require a paid provider via the hook.
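To make the hostname-level limitation concrete, here is a sketch of hostname-based domain extraction using only the standard library. The function name `hostname_domain` and the `www.` stripping are assumptions about how such parsing might look, not the Actor's code.

```python
from urllib.parse import urlsplit


def hostname_domain(website: str) -> str:
    """Return the lowercased hostname of a URL or bare domain, minus a leading 'www.'."""
    website = website.strip()
    # urlsplit only recognizes the host when a scheme or '//' prefix is present.
    url = website if "://" in website else "//" + website
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


print(hostname_domain("https://www.Acme.com/about"))
print(hostname_domain("shop.example.co.uk/pricing"))
```

The second call keeps `shop.example.co.uk` as is. Collapsing it to `example.co.uk` would need a public-suffix list, which is why subdomain-heavy inputs may produce separate keys for the same company.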
Compliance
Lead data is personal data. This Actor carries a per-field source/provenance trail and supports a suppressionList for opt-outs from the first run. You are responsible for ensuring your use complies with GDPR, CCPA, and the terms of the sources you feed it. Do not process personal data without a lawful basis.


