Lead List Deduplicator & Normalizer

Pricing

from $0.05 / 1,000 results

[💵 $0.05 / 1K] Clean messy B2B lead lists into CRM-ready company/contact records with duplicate clusters, confidence scores, match reasons, normalized domains, emails, and phones.

Developer: Open Web Team (maintained by Community)

Lead List Deduplicator & Normalizer - CRM-Ready Leads, Not Messy Dumps

Turn messy scraped B2B lead lists into canonical, CRM-ready company and contact records, not duplicate-filled dumps.

This Actor takes inline JSON records or an Apify dataset ID, normalizes common lead fields, groups duplicates, and outputs one canonical row per lead/company cluster with confidence scores, match reasons, source row IDs, and warnings.

Use it after Google Maps scrapers, directory scrapers, website contact scrapers, exhibitor-list scrapers, Apollo-style lead exports, or any workflow where several sources produce overlapping leads.

Launch Pricing

Launch pricing is currently $0.05 per 1,000 cleaned rows.
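
As a rough illustration of the arithmetic, the row charge can be estimated as below. This is a sketch that assumes the charge scales proportionally with cleaned rows (no rounding up to the next 1,000) and excludes Apify platform usage; `estimate_row_cost` is a hypothetical helper name, not part of the Actor.

```python
from decimal import Decimal

# Launch price quoted above: $0.05 per 1,000 cleaned rows.
PRICE_PER_1000_ROWS = Decimal("0.05")


def estimate_row_cost(cleaned_rows: int) -> Decimal:
    """Estimated row charge in USD, excluding Apify platform usage.

    Assumes proportional billing, which is an assumption of this sketch.
    """
    return PRICE_PER_1000_ROWS * cleaned_rows / 1000
```

Under that assumption, a run that produces 5,000 cleaned rows would come to about $0.25 in row charges.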

The launch version supports up to 5,000 input records per run. Until high-volume matching is optimized, split larger datasets into batches.

Quick Preview

| Messy input | Clean output |
| --- | --- |
| Acme Inc, ACME LLC, https://www.acme.com, sales@acme.com | one canonical acme.com cluster |
| duplicate domains, emails, phones, or similar company names | clusterId, clusterSize, mergeConfidence, matchReasons |
| records from multiple Apify datasets or CSV/JSON imports | CRM-ready rows with normalized company, domain, email, and phone |

Why Use This Actor

  • Merge overlapping exports from multiple scrapers.
  • Remove duplicate companies, domains, emails, and phone numbers before CRM import.
  • Normalize company names, domains, emails, and phones.
  • Keep source row IDs so every merge is auditable.
  • Get confidence scores and match reasons instead of a black-box cleanup.
  • Use deterministic rules first, so costs stay predictable.
  • No browser, proxies, or external enrichment APIs.
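
To make the normalization step concrete, here is a minimal sketch of what such rules might look like. It is not the Actor's actual implementation: the function names, the legal-suffix list, and the digits-only phone rule are all assumptions chosen so that the results line up with the examples on this page.

```python
import re
from urllib.parse import urlparse

# Legal suffixes stripped from company names. Illustrative assumption only.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def normalize_domain(value: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without 'www.'."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # lets urlparse read a bare domain as the host
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def normalize_email(value: str) -> str:
    """Trim whitespace and lowercase the address."""
    return value.strip().lower()


def normalize_phone(value: str) -> str:
    """Keep digits only. No country-code handling in this sketch."""
    return re.sub(r"\D", "", value)


def normalize_company(value: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", " ", value.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)
```

With these rules, "Acme Inc" and "ACME LLC" both reduce to the same company key, and "https://www.acme.com" and "acme.com" reduce to the same domain, which is what lets the two sample records fall into one cluster.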

Common Use Cases

  • Merge lead lists from several Apify scrapers.
  • Clean a CSV before importing it into HubSpot, Pipedrive, Salesforce, Clay, Instantly, Smartlead, or Airtable.
  • Remove duplicate outreach targets before spending credits on email verification or enrichment.
  • Create a canonical company list from multiple scraped directories.
  • Audit which rows were merged and why.

Input Example

{
  "dedupMode": "balanced",
  "records": [
    {
      "id": "1",
      "company": "Acme Inc",
      "website": "https://www.acme.com",
      "email": "sales@acme.com"
    },
    {
      "id": "2",
      "companyName": "ACME LLC",
      "domain": "acme.com",
      "phone": "(415) 555-2671"
    }
  ]
}

You can also provide an Apify datasetId instead of inline records.

If no input is provided, the Actor runs with sample records so you can test the output immediately.

Output Example

{
  "recordType": "canonicalLead",
  "clusterId": "cluster_0001",
  "clusterSize": 2,
  "mergeDecision": "merged",
  "mergeConfidence": 0.9,
  "matchReasons": ["same_domain", "similar_company"],
  "sourceRowIds": ["1", "2"],
  "canonicalCompanyName": "Acme Inc",
  "normalizedCompanyName": "acme",
  "normalizedDomain": "acme.com",
  "normalizedEmail": "sales@acme.com",
  "normalizedPhone": "4155552671",
  "warnings": []
}
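
Downstream, rows shaped like the one above can be triaged before a CRM import. The following is a sketch only: `split_for_import` is a hypothetical helper, and the 0.8 confidence cutoff is an arbitrary assumption to tune for your own data.

```python
from typing import Iterable


def split_for_import(rows: Iterable[dict], min_confidence: float = 0.8):
    """Separate rows that look safe to import from rows needing manual review.

    A row goes to review if it is ambiguous, carries any warning, or is a
    merge whose confidence falls below min_confidence (assumed cutoff).
    """
    ready, review = [], []
    for row in rows:
        needs_review = (
            row.get("mergeDecision") == "ambiguous"
            or bool(row.get("warnings"))
            or (
                row.get("mergeDecision") == "merged"
                and row.get("mergeConfidence", 0.0) < min_confidence
            )
        )
        (review if needs_review else ready).append(row)
    return ready, review
```

The sample row above (merged, confidence 0.9, no warnings) would land in the ready list under these assumed rules.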

Deduplication Modes

| Mode | Best for | Behavior |
| --- | --- | --- |
| conservative | Avoiding false merges | Requires exact email, phone, or domain match |
| balanced | Most lead lists | Uses exact email/phone/domain plus strong company-name similarity |
| aggressive | Very messy lists | Uses looser company-name matching; review warnings before importing |
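
The difference between the three modes can be sketched as a pairwise match function. This is an illustration, not the Actor's matching engine: the thresholds, the use of `difflib.SequenceMatcher`, the input field names, and the `same_phone` reason label are assumptions (only `same_email`, `same_domain`, and `similar_company` appear in this page's examples).

```python
from difflib import SequenceMatcher

# Name-similarity threshold per mode; None disables name-based matching.
# These numbers are illustrative assumptions, not the Actor's real settings.
NAME_THRESHOLDS = {"conservative": None, "balanced": 0.9, "aggressive": 0.75}


def match_reasons(a: dict, b: dict, mode: str = "balanced") -> list:
    """Return the reasons two already-normalized records look like duplicates."""
    reasons = []
    # Exact matches on strong identifiers apply in every mode.
    for field in ("email", "phone", "domain"):
        if a.get(field) and a.get(field) == b.get(field):
            reasons.append(f"same_{field}")
    # Fuzzy company-name matching applies only in balanced and aggressive modes.
    threshold = NAME_THRESHOLDS[mode]
    name_a, name_b = a.get("company"), b.get("company")
    if threshold is not None and name_a and name_b:
        if SequenceMatcher(None, name_a, name_b).ratio() >= threshold:
            reasons.append("similar_company")
    return reasons
```

An empty result means the pair would stay in separate clusters under that mode. Lowering a threshold catches more near-duplicates at the price of more false merges, which is why the aggressive mode's output deserves a review pass.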

Dataset Views

| View | Best for |
| --- | --- |
| Canonical | CRM-ready rows after deduplication |
| Duplicate clusters | Auditing source rows, match reasons, and confidence |

Output Fields

| Field | Meaning |
| --- | --- |
| clusterId | Stable cluster identifier for the canonical row |
| clusterSize | Number of source rows merged into the canonical row |
| mergeDecision | unique, merged, or ambiguous |
| mergeConfidence | Confidence score from 0 to 1 |
| matchReasons | Why records matched, such as same_email, same_domain, or similar_company |
| sourceRowIds | Original row IDs or indexes used in the merge |
| normalizedDomain | Clean domain value such as acme.com |
| warnings | Flags such as low_confidence_merge or missing_domain_or_email |

Limits and Caveats

  • This MVP uses deterministic rules and fuzzy string similarity, not paid LLM adjudication.
  • Review ambiguous rows before importing them into a CRM.
  • Email/phone/domain normalization is conservative and may not cover every country-specific format.
  • The Actor does not scrape or enrich missing contact data; it cleans the records you provide.
  • It does not verify email deliverability or MX records in the first version.
  • Current runs are capped at 5,000 input records while the deduplication engine is optimized for larger files.
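
One way to work within the 5,000-record cap is to split a large list into one input payload per run. This is a minimal sketch: `batch_records` is a hypothetical helper, and the payload shape simply mirrors the input example on this page.

```python
from typing import Iterator, List

MAX_RECORDS_PER_RUN = 5000  # launch cap stated above


def batch_records(
    records: List[dict],
    size: int = MAX_RECORDS_PER_RUN,
    dedup_mode: str = "balanced",
) -> Iterator[dict]:
    """Yield one Actor input payload per batch of at most `size` records."""
    for start in range(0, len(records), size):
        yield {"dedupMode": dedup_mode, "records": records[start:start + size]}
```

Keep in mind that separate runs cannot see each other's records, so duplicates that land in different batches would presumably stay unmerged. Sorting by domain before batching, or running a second pass over the combined canonical rows, may reduce that risk.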

Pricing

This Actor is designed for pay-per-row pricing. You pay for cleaned output rows plus Apify platform usage.

Because it does not launch a browser or call external enrichment APIs, runs should stay inexpensive for bulk cleanup.