Lead List Deduplicator & Normalizer
Pricing
from $0.05 / 1,000 results
[💵 $0.05 / 1K] Clean messy B2B lead lists into CRM-ready company/contact records with duplicate clusters, confidence scores, match reasons, normalized domains, emails, and phones.
Developer
Open Web Team
Maintained by Community
Lead List Deduplicator & Normalizer - CRM-Ready Leads, Not Messy Dumps
Turn messy scraped B2B lead lists into canonical, CRM-ready company and contact records, not duplicate-filled dumps.
This Actor takes inline JSON records or an Apify dataset ID, normalizes common lead fields, groups duplicates, and outputs one canonical row per lead/company cluster with confidence scores, match reasons, source row IDs, and warnings.
Use it after Google Maps scrapers, directory scrapers, website contact scrapers, exhibitor-list scrapers, Apollo-style lead exports, or any workflow where several sources produce overlapping leads.
Launch Pricing
Launch pricing is currently $0.05 per 1,000 cleaned rows.
The launch version supports up to 5,000 input records per run. Until high-volume matching is optimized, split larger datasets into batches of 5,000 records or fewer.
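One option for exports above the cap is to split them on the client side before calling the Actor. The Python sketch below assumes the records are already loaded as a list of dicts. `batch_records` is a hypothetical helper and is not part of the Actor.

Batching has one caveat. Records that land in different batches are never compared in the same run, so a duplicate that spans two batches stays unmerged. A final run over the combined canonical rows can catch some of these.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Per-run input cap for the launch version, as stated in this README.
MAX_RECORDS_PER_RUN = 5_000


def batch_records(records: Sequence[T], batch_size: int = MAX_RECORDS_PER_RUN) -> Iterator[list[T]]:
    """Yield consecutive batches of at most `batch_size` records, preserving order.

    Hypothetical helper for spreading an oversized export across several runs.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])
```

For example, 12,000 rows come out as batches of 5,000, 5,000, and 2,000 records.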
Quick Preview
| Messy input | Clean output |
|---|---|
| Acme Inc, ACME LLC, https://www.acme.com, sales@acme.com | one canonical acme.com cluster |
| duplicate domains, emails, phones, or similar company names | clusterId, clusterSize, mergeConfidence, matchReasons |
| records from multiple Apify datasets or CSV/JSON imports | CRM-ready rows with normalized company, domain, email, and phone |
Why Use This Actor
- Merge overlapping exports from multiple scrapers.
- Remove duplicate companies, domains, emails, and phone numbers before CRM import.
- Normalize company names, domains, emails, and phones.
- Keep source row IDs so every merge is auditable.
- Get confidence scores and match reasons instead of a black-box cleanup.
- Deterministic rules are applied first, so costs stay predictable.
- Runs without a browser, proxies, or external enrichment APIs.
Common Use Cases
- Merge lead lists from several Apify scrapers.
- Clean a CSV before importing it into HubSpot, Pipedrive, Salesforce, Clay, Instantly, Smartlead, or Airtable.
- Remove duplicate outreach targets before spending credits on email verification or enrichment.
- Create a canonical company list from multiple scraped directories.
- Audit which rows were merged and why.
Input Example
{
  "dedupMode": "balanced",
  "records": [
    {
      "id": "1",
      "company": "Acme Inc",
      "website": "https://www.acme.com",
      "email": "sales@acme.com"
    },
    {
      "id": "2",
      "companyName": "ACME LLC",
      "domain": "acme.com",
      "phone": "(415) 555-2671"
    }
  ]
}
You can also provide an Apify datasetId instead of inline records.
If no input is provided, the Actor runs with sample records so you can test the output immediately.
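When the Actor is started from code instead of the Console, its input is a plain JSON object. The Python sketch below builds one from either inline records or a dataset ID, using the `dedupMode`, `records`, and `datasetId` names from this README.

Two parts of it are assumptions. The helper `build_run_input` is hypothetical. Treating `records` and `datasetId` as mutually exclusive is inferred from the wording "instead of inline records".

```python
from typing import Any, Optional

# Mode names as listed in the Deduplication Modes table of this README.
DEDUP_MODES = ("conservative", "balanced", "aggressive")


def build_run_input(
    records: Optional[list[dict[str, Any]]] = None,
    dataset_id: Optional[str] = None,
    dedup_mode: str = "balanced",
) -> dict[str, Any]:
    """Build an Actor input object (hypothetical helper).

    Pass inline `records` or a `dataset_id`, not both. Passing neither yields an
    input without data, which per this README makes the Actor use sample records.
    """
    if dedup_mode not in DEDUP_MODES:
        raise ValueError(f"unknown dedupMode: {dedup_mode!r}")
    if records is not None and dataset_id is not None:
        raise ValueError("provide either records or dataset_id, not both")

    run_input: dict[str, Any] = {"dedupMode": dedup_mode}
    if records is not None:
        run_input["records"] = records
    elif dataset_id is not None:
        run_input["datasetId"] = dataset_id
    return run_input
```

With the `apify-client` package installed, you can pass the resulting dict to `ApifyClient(token).actor(actor_id).call(run_input=run_input)`. Replace `actor_id` with the ID shown on the Actor's store listing.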
Output Example
{
  "recordType": "canonicalLead",
  "clusterId": "cluster_0001",
  "clusterSize": 2,
  "mergeDecision": "merged",
  "mergeConfidence": 0.9,
  "matchReasons": ["same_domain", "similar_company"],
  "sourceRowIds": ["1", "2"],
  "canonicalCompanyName": "Acme Inc",
  "normalizedCompanyName": "acme",
  "normalizedDomain": "acme.com",
  "normalizedEmail": "sales@acme.com",
  "normalizedPhone": "4155552671",
  "warnings": []
}
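The example output implies a few concrete transformations:
- `Acme Inc` and `ACME LLC` both become `acme`.
- `https://www.acme.com` becomes `acme.com`.
- `(415) 555-2671` becomes `4155552671`.

The Python sketch below reproduces those particular values with simple standard-library rules. It illustrates the idea under assumptions and is not the Actor's code. The legal-suffix list is hypothetical. The phone rule only strips non-digits, so it does not reconcile country codes: `+1 415 555 2671` would become `14155552671`. A dedicated library such as `phonenumbers` handles that case.

```python
import re
from urllib.parse import urlsplit

# Hypothetical trailing legal-entity suffixes to drop; the Actor's real list is not documented.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def normalize_company(name: str) -> str:
    """Lowercase, keep alphanumeric tokens, and drop trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_domain(value: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without a leading 'www.'."""
    value = value.strip()
    if "://" not in value:
        value = "//" + value  # make urlsplit treat a bare domain as the network location
    host = urlsplit(value).hostname or ""  # .hostname is already lowercased
    return host.removeprefix("www.")


def normalize_email(value: str) -> str:
    """Trim and lowercase. Lowercasing the local part is a common dedup convention,
    although mail standards allow the local part to be case-sensitive."""
    return value.strip().lower()


def normalize_phone(value: str) -> str:
    """Keep digits only; no country-code or extension handling."""
    return re.sub(r"\D", "", value)
```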
Deduplication Modes
| Mode | Best for | Behavior |
|---|---|---|
| conservative | Avoiding false merges | Requires an exact email, phone, or domain match |
| balanced | Most lead lists | Uses exact email/phone/domain matches plus strong company-name similarity |
| aggressive | Very messy lists | Uses looser company-name matching; review warnings before importing |
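Union-find is one common way to turn pairwise matches like these into clusters. Each exact or fuzzy match links two rows, and every connected group becomes one cluster.

The Python sketch below makes two assumptions:
- Rows already carry normalized `email`, `phone`, `domain`, and `company` values.
- Company-name similarity is the standard library's `difflib.SequenceMatcher` ratio.

The Actor's actual similarity measure and thresholds are not published, so the `NAME_THRESHOLDS` values are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical company-name similarity thresholds per mode (None = names are ignored).
NAME_THRESHOLDS = {"conservative": None, "balanced": 0.92, "aggressive": 0.80}
EXACT_KEYS = ("email", "phone", "domain")


def names_similar(a: str, b: str, threshold: float) -> bool:
    """SequenceMatcher ratio test, using its cheap upper bounds to skip hopeless pairs."""
    matcher = SequenceMatcher(None, a, b)
    return (
        matcher.real_quick_ratio() >= threshold
        and matcher.quick_ratio() >= threshold
        and matcher.ratio() >= threshold
    )


def cluster_records(records: list[dict], mode: str = "balanced") -> list[list[int]]:
    """Return clusters of record indexes; rows linked by any match share a cluster."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    # Exact matches: link each row to the first row with the same non-empty value.
    for key in EXACT_KEYS:
        first_seen: dict[str, int] = {}
        for i, record in enumerate(records):
            value = record.get(key)
            if not value:
                continue
            if value in first_seen:
                union(i, first_seen[value])
            else:
                first_seen[value] = i

    # Fuzzy matches on normalized company names (quadratic in the number of rows).
    threshold = NAME_THRESHOLDS[mode]
    if threshold is not None:
        for i, j in combinations(range(len(records)), 2):
            a = records[i].get("company", "")
            b = records[j].get("company", "")
            if a and b and names_similar(a, b, threshold):
                union(i, j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Links are transitive. If A is similar to B and B is similar to C, all three end up in one cluster even when A and C are not similar to each other. This chaining grows as the threshold loosens, which is one reason to review aggressive-mode output closely.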
Dataset Views
| View | Best for |
|---|---|
| Canonical | CRM-ready rows after deduplication |
| Duplicate clusters | Auditing source rows, match reasons, and confidence |
Output Fields
| Field | Meaning |
|---|---|
| clusterId | Stable cluster identifier for the canonical row |
| clusterSize | Number of source rows merged into the canonical row |
| mergeDecision | unique, merged, or ambiguous |
| mergeConfidence | Confidence score from 0 to 1 |
| matchReasons | Why records matched, such as same_email, same_domain, or similar_company |
| sourceRowIds | Original row IDs or indexes used in the merge |
| normalizedDomain | Clean domain value such as acme.com |
| warnings | Flags such as low_confidence_merge or missing_domain_or_email |
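A small scoring function can show how `clusterSize`, `mergeConfidence`, `mergeDecision`, and `warnings` relate to each other. The Python sketch below is hypothetical throughout. It assumes:
- illustrative per-reason weights;
- a noisy-OR combination, meaning one minus the product of each reason's miss probability;
- a 0.8 cut-off between merged and ambiguous.

It does not try to reproduce the scores the Actor reports, such as the 0.9 in the output example.

```python
from typing import Optional

# Hypothetical confidence contributed by each match reason.
REASON_WEIGHTS = {
    "same_email": 0.95,
    "same_phone": 0.85,
    "same_domain": 0.85,
    "similar_company": 0.60,
}
# Hypothetical cut-off: multi-row clusters scoring below it are left for review.
MERGE_THRESHOLD = 0.8


def score_cluster(cluster_size: int, match_reasons: list[str], has_domain_or_email: bool) -> dict:
    """Derive mergeDecision, mergeConfidence, and warnings for one cluster."""
    warnings: list[str] = []
    if not has_domain_or_email:
        warnings.append("missing_domain_or_email")

    confidence: Optional[float]
    if cluster_size == 1:
        # Nothing was merged, so no merge confidence is assigned (an assumption).
        confidence = None
        decision = "unique"
    else:
        # Noisy-OR: each independent reason shrinks the chance that the merge is wrong.
        miss = 1.0
        for reason in match_reasons:
            miss *= 1.0 - REASON_WEIGHTS.get(reason, 0.0)
        confidence = round(1.0 - miss, 2)
        if confidence >= MERGE_THRESHOLD:
            decision = "merged"
        else:
            decision = "ambiguous"
            warnings.append("low_confidence_merge")

    return {"mergeDecision": decision, "mergeConfidence": confidence, "warnings": warnings}
```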
Limits and Caveats
- This MVP uses deterministic rules and fuzzy string similarity, not paid LLM adjudication.
- Review ambiguous rows before importing them into a CRM.
- Email/phone/domain normalization is conservative and may not cover every country-specific format.
- The Actor does not scrape or enrich missing contact data; it cleans the records you provide.
- It does not verify email deliverability or MX records in the first version.
- Current runs are capped at 5,000 input records while the deduplication engine is optimized for larger files.
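To act on the advice to review ambiguous rows, a pre-import step can route them into a separate list. The Python sketch below assumes the canonical rows have been downloaded from the dataset as dicts carrying the `mergeDecision` and `warnings` fields described under Output Fields. `split_for_import` is a hypothetical helper. Which warnings should block an import is a policy choice; here only `low_confidence_merge` does by default.

```python
from typing import Iterable


def split_for_import(
    rows: Iterable[dict],
    blocking_warnings: tuple[str, ...] = ("low_confidence_merge",),
) -> tuple[list[dict], list[dict]]:
    """Split canonical rows into (ready_to_import, needs_review)."""
    ready: list[dict] = []
    needs_review: list[dict] = []
    for row in rows:
        flagged = row.get("mergeDecision") == "ambiguous" or any(
            warning in blocking_warnings for warning in row.get("warnings", [])
        )
        (needs_review if flagged else ready).append(row)
    return ready, needs_review
```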
Pricing
This Actor is designed for pay-per-row pricing. You pay for cleaned output rows plus Apify platform usage.
Because it does not launch a browser or call external enrichment APIs, runs should stay inexpensive for bulk cleanup.
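The per-row charge is simple arithmetic on the launch price of $0.05 per 1,000 cleaned rows. The Python sketch below assumes linear proration with no minimum charge or rounding step; this README specifies none of these. It also leaves out Apify platform usage, which is billed separately. `estimate_actor_charge` is a hypothetical helper.

```python
from decimal import Decimal

# Launch price quoted in this README: $0.05 per 1,000 cleaned output rows.
PRICE_PER_1000_ROWS = Decimal("0.05")


def estimate_actor_charge(output_rows: int) -> Decimal:
    """Estimated per-row Actor charge in USD, excluding platform usage (assumes linear proration)."""
    if output_rows < 0:
        raise ValueError("output_rows must be non-negative")
    return PRICE_PER_1000_ROWS * output_rows / 1000
```

Billing is on output rows, so heavier deduplication lowers the charge. Suppose a full 5,000-record run collapses to 3,200 canonical rows, and only canonical rows count as output. Under these assumptions it would cost about $0.16, versus $0.25 if every row were unique.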