Upload any lead CSV and get a CRM-ready dataset: email validation, name/company cleanup, job-title bucketing, and dedupe by email or domain+name.

GTM Leads Cleaner - CSV Lead Deduplication & Email Validation

What is GTM Leads Cleaner?

GTM Leads Cleaner is an Apify Actor that cleans, normalizes, and deduplicates GTM (Go-To-Market) lead data from CSV files. Built for sales teams, RevOps professionals, and marketers who need to prepare leads for CRM import with validated emails, standardized names, and categorized job titles.

See It In Action

🎬 Video demo coming soon!

Why Use GTM Leads Cleaner?

  • ✅ Save hours of manual work - Process 10,000 leads in 2-3 minutes
  • ✅ Improve CRM data quality - Validated emails, standardized names, clean formatting
  • ✅ Better lead routing - GTM-focused job title categorization for accurate scoring
  • ✅ Smart deduplication - Match by email or domain+name combination
  • ✅ Pay only for what you use - Just $0.001 per lead processed

Use Cases

Clean CRM Exports Before Re-Import

Export your HubSpot, Salesforce, or Pipedrive contacts, run them through the cleaner, and re-import with normalized data and duplicates removed.

Deduplicate Leads from Multiple Sources

Combine leads from trade shows, webinars, content downloads, and scraped data into a single clean list without duplicates.

Prepare Sales Intelligence Exports

Clean exports from Apollo.io, ZoomInfo, or LinkedIn Sales Navigator before loading into your CRM or sales engagement platform.

Standardize Job Titles for Lead Scoring

Categorize job titles into consistent GTM buckets (Founder/C-level, Sales leadership, Marketing IC, etc.) for accurate lead scoring and routing.

Features

  • 📧 Email Validation & Normalization - Trims whitespace, lowercases, validates format, and extracts the first email from multi-email fields (sketched after this list)
  • 👤 Name Processing - Splits full names into first/last, normalizes whitespace
  • 🏢 Company Normalization - Cleans company names, removes extra whitespace
  • 🌐 Domain Extraction - Derives domain from email or website column
  • 🎯 Job Title Bucketing - Categorizes job titles into 10 GTM-focused buckets
  • 🔄 Lead Deduplication - Finds duplicates by email or domain+name combination
  • 🔍 Auto Column Detection - Automatically detects column mappings from various header formats
  • 💰 Pay-Per-Event Pricing - Only pay for leads you actually process
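
The email normalization step can be approximated in a few lines of Python. This is an illustrative sketch of the idea, not the Actor's actual implementation; the regex and the comma/semicolon splitting are assumptions.

import re

# Simple format check; the Actor's real validation rules may differ.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def normalize_email(raw: str) -> tuple[str, bool]:
    """Trim, lowercase, keep the first address of a multi-email field, and validate the format."""
    candidate = raw.strip().lower()
    candidate = re.split(r"[;,]", candidate)[0].strip()  # assumed separators for multi-email fields
    return candidate, bool(EMAIL_RE.match(candidate))

print(normalize_email("  JANE@ACME.COM; jane.d@acme.com "))  # ('jane@acme.com', True)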

How Much Does It Cost to Clean Leads?

The GTM Leads Cleaner uses Apify's pay-per-event pricing model:

Volume | Cost per Lead | Example
Any volume | $0.001 | 1,000 leads = $1.00

Cost Comparison

Method | Cost for 10,000 Leads | Time
GTM Leads Cleaner | ~$10 | 2-3 minutes
Manual cleaning | $200-500 (VA time) | 8-20 hours
Custom script | $0 + dev time | Hours to build

Typical run times:

  • 1,000 rows: ~30 seconds
  • 10,000 rows: ~2-3 minutes
  • 100,000 rows: ~15-20 minutes

Tutorial: How to Clean Your Lead CSV

Step 1: Prepare Your CSV

Ensure your CSV file:

  • Is UTF-8 encoded
  • Has a header row
  • Contains at minimum an email column

Step 2: Upload Your File

You have three options:

  1. File Upload - Use the file upload button in the Apify Console
  2. URL - Provide a direct URL to your CSV file
  3. Key-Value Store - Reference a file already in your Apify Key-Value Store

Step 3: Configure Options

{
  "inputFile": "leads.csv",
  "dedupeStrategy": "email",
  "outputFormat": "dataset",
  "includeDuplicates": false
}

Key options:

  • dedupeStrategy: Choose "email" for email-based matching or "domain+name" for fuzzy matching
  • outputFormat: "dataset" for API access or "csv" for downloadable file
  • includeDuplicates: Set to true if you want to see duplicate rows (marked with is_duplicate=true)
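
For example, to match duplicates by domain plus name, keep the flagged duplicate rows for review, and get a downloadable CSV, the input could look like this:

{
  "inputFile": "leads.csv",
  "dedupeStrategy": "domain+name",
  "outputFormat": "csv",
  "includeDuplicates": true
}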

Step 4: Run and Download Results

  1. Click "Start" to run the Actor
  2. Wait for completion (check the "Runs" tab for progress)
  3. Download results from the "Storage" tab:
    • Dataset: Clean leads in JSON format
    • Key-Value Store: cleaned_leads.csv (if CSV output enabled) and SUMMARY stats

Input Schema

Parameter | Type | Default | Description
inputFile | string | required | CSV file (upload, URL, or KV store key)
dedupeStrategy | enum | "email" | "email" or "domain+name"
outputFormat | enum | "dataset" | "dataset" or "csv"
includeDuplicates | boolean | false | Keep duplicate rows in output
autoDetectPreference | enum | "first" | Tie-breaking: "first", "last", or "fail"
emailColumn | string | auto | Manual email column override
nameColumn | string | auto | Manual name column override
companyColumn | string | auto | Manual company column override
jobTitleColumn | string | auto | Manual job title column override
fieldMap | object | {} | Programmatic column mapping (highest priority)

Deduplication Strategies

  • email - Matches on normalized email address. First occurrence is primary, subsequent matches are marked as duplicates.
  • domain+name - Matches on normalized full name + domain combination. Useful when the same person appears with different email addresses.
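
Conceptually, both strategies boil down to building a match key per row; rows sharing a key are treated as duplicates of the first occurrence. The Python below is a rough sketch under that assumption, not the Actor's exact code:

def dedupe_key(row: dict, strategy: str) -> str | None:
    """Build the key used to detect duplicates; rows with an identical key match."""
    if strategy == "email":
        email = (row.get("normalized_email") or "").strip()
        return email or None
    # "domain+name": combine the extracted domain with the whitespace-normalized full name.
    domain = (row.get("domain") or "").strip().lower()
    name = " ".join((row.get("full_name") or "").lower().split())
    return f"{domain}|{name}" if domain and name else None

print(dedupe_key({"domain": "acme.com", "full_name": "Jane  Doe"}, "domain+name"))  # acme.com|jane doe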

Auto-Detection Preferences

When multiple columns match a pattern (e.g., both "Email" and "Work Email"):

  • first - Uses the first matching column (leftmost in CSV)
  • last - Uses the last matching column (rightmost in CSV)
  • fail - Aborts with error listing candidates

Output Format

Dataset Output (default)

Each row is pushed to the Apify default dataset with canonical fields:

{
  "original_row_index": 1,
  "email": "JANE@ACME.COM",
  "normalized_email": "jane@acme.com",
  "email_is_valid": true,
  "full_name": "Jane Doe",
  "first_name": "Jane",
  "last_name": "Doe",
  "company": "Acme Inc",
  "domain": "acme.com",
  "role_raw": "Head of Growth",
  "role_bucket": "Marketing leadership",
  "is_duplicate": false,
  "duplicate_of_index": null,
  "dedupe_strategy_used": null,
  "source_file": "leads.csv",
  "error_message": null
}

CSV Export

When outputFormat: "csv", a cleaned_leads.csv file is written to the Key-Value Store with:

  1. Canonical GTM fields (fixed order)
  2. Original columns (preserved order)

Summary Statistics

A SUMMARY JSON is always written to the Key-Value Store:

{
  "total_rows": 1000,
  "processed_rows": 1000,
  "duplicate_rows": 50,
  "unique_leads": 950,
  "invalid_email_rows": 25,
  "input_file_name": "leads.csv",
  "dedupe_strategy": ["email"],
  "warnings": [],
  "created_at": "2024-01-15T10:30:00Z"
}

Job Title Buckets for Lead Categorization

Job titles are automatically categorized into GTM-focused buckets (9 defined buckets plus an "Other" fallback); a simplified matching sketch follows the table:

Bucket | Example Keywords
Founder / C-level | founder, ceo, cto, cfo, chief, president, owner
RevOps / SalesOps | revops, revenue operations, sales operations, crm manager
Marketing leadership | head of marketing, vp marketing, marketing director, growth lead
Sales leadership | head of sales, vp sales, sales director, sales manager
Marketing IC | marketing specialist, demand gen specialist, content marketer
Sales IC | account executive, sdr, bdr, business development
Product | product manager, product owner, product lead
Engineering / Technical | engineer, developer, architect, devops
Customer Success | customer success, csm, account manager, onboarding
Other | (default fallback for unmatched titles)
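
Here is a simplified sketch of first-match keyword bucketing, using a trimmed version of the keyword lists above; the Actor's actual matching order and keyword sets may differ:

ROLE_BUCKETS = [
    ("Founder / C-level", ["founder", "ceo", "cto", "cfo", "chief", "president", "owner"]),
    ("RevOps / SalesOps", ["revops", "revenue operations", "sales operations", "crm manager"]),
    ("Sales leadership", ["head of sales", "vp sales", "sales director", "sales manager"]),
    ("Sales IC", ["account executive", "sdr", "bdr", "business development"]),
]

def bucket_job_title(title: str) -> str:
    """Return the first bucket whose keywords appear in the lowercased title, else 'Other'."""
    lowered = title.lower()
    for bucket, keywords in ROLE_BUCKETS:
        if any(keyword in lowered for keyword in keywords):
            return bucket
    return "Other"

print(bucket_job_title("Senior Account Executive"))  # Sales IC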

CSV Column Auto-Detection

The Actor recognizes common header variations:

Field | Recognized Headers
Email | email, e-mail, work email, contact email
Full Name | name, full name, contact, person
First Name | first name, given name, first
Last Name | last name, surname, family name, last
Company | company, organization, org, employer
Job Title | title, job title, position, role
Domain | domain, website, url, company domain

Headers are matched case-insensitively.
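
The detection plus the tie-breaking preference from the earlier section can be pictured roughly like this; the actual header patterns and matching logic are internal to the Actor:

EMAIL_HEADERS = {"email", "e-mail", "work email", "contact email"}

def detect_column(headers: list[str], recognized: set[str], preference: str = "first") -> str:
    """Return the header mapped to a canonical field, applying the tie-breaking preference."""
    matches = [h for h in headers if h.strip().lower() in recognized]
    if not matches:
        raise ValueError("no matching column found")
    if len(matches) > 1 and preference == "fail":
        raise ValueError(f"ambiguous columns: {matches}")
    return matches[-1] if preference == "last" else matches[0]

print(detect_column(["Work Email", "Email", "Name"], EMAIL_HEADERS))  # 'Work Email' (leftmost match wins)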

Error Handling

Fatal Errors (Actor fails)

  • Invalid file format (not .csv)
  • UTF-8 decode failure
  • Missing required email column
  • Empty input file
  • Tie-breaking with "fail" preference when multiple candidates exist

Row-Level Errors

Rows with processing errors continue through the pipeline with:

  • error_message field set
  • email_is_valid set to false
  • Other fields populated where possible
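
For illustration only, such a row might come out looking roughly like this (the exact error_message wording is an assumption):

{
  "original_row_index": 42,
  "email": "not-an-email",
  "normalized_email": "not-an-email",
  "email_is_valid": false,
  "is_duplicate": false,
  "error_message": "invalid email format"
}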

Warnings

Non-fatal issues are logged and included in the summary:

  • High duplicate rate (>30%)
  • High invalid email rate (>20%)
  • Column detection ambiguities

Integrations & API Access

Zapier Integration

  1. Use the "Apify" app in Zapier
  2. Select "Run Actor" action
  3. Choose "gtm-leads-cleaner" Actor
  4. Map your CSV file URL to the inputFile parameter
  5. Use "Get Dataset Items" to retrieve cleaned leads

Make.com (Integromat)

  1. Add the Apify module to your scenario
  2. Use "Run an Actor" action
  3. Configure input with your CSV file
  4. Use "Get Dataset Items" to retrieve results
  5. Route cleaned leads to your CRM module

n8n Workflow

  1. Use the Apify node
  2. Set operation to "Run Actor"
  3. Configure the Actor ID and input parameters
  4. Use HTTP Request node to fetch dataset results
  5. Connect to your CRM node (HubSpot, Salesforce, etc.)

Python SDK

from apify_client import ApifyClient

client = ApifyClient("your-api-token")
actor = client.actor("your-username/gtm-leads-cleaner")
run = actor.call(run_input={
    "inputFile": "https://example.com/leads.csv",
    "dedupeStrategy": "email",
    "outputFormat": "dataset"
})

# Get results
dataset = client.dataset(run["defaultDatasetId"])
for item in dataset.iterate_items():
    print(item["normalized_email"], item["is_duplicate"])
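
If you set outputFormat to "csv", the exported file and the SUMMARY stats live in the run's default Key-Value Store and can be fetched with the same client; a minimal continuation of the example above:

# Fetch the CSV export and summary from the run's default Key-Value Store
store = client.key_value_store(run["defaultKeyValueStoreId"])
summary = store.get_record("SUMMARY")            # JSON record; None if the key is missing
csv_file = store.get_record("cleaned_leads.csv")
if summary:
    print(summary["value"]["unique_leads"], "unique leads")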

JavaScript / Node.js SDK

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'your-api-token' });
const run = await client.actor('your-username/gtm-leads-cleaner').call({
    inputFile: 'https://example.com/leads.csv',
    dedupeStrategy: 'email',
    outputFormat: 'dataset'
});

// Get results
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(item => {
    console.log(item.normalized_email, item.is_duplicate);
});

Direct API Call

curl -X POST "https://api.apify.com/v2/acts/your-username~gtm-leads-cleaner/runs?token=your-api-token" \
  -H "Content-Type: application/json" \
  -d '{
    "inputFile": "https://example.com/leads.csv",
    "dedupeStrategy": "email"
  }'

FAQ

What CSV formats are supported?

The Actor supports standard UTF-8 encoded CSV files with a header row. Files must have the .csv extension. The Actor handles various delimiters and quote characters automatically.

Can I use custom column mappings?

Yes! You have three options:

  1. Individual overrides: Use emailColumn, nameColumn, companyColumn, or jobTitleColumn to specify exact header names
  2. Field map: Use the fieldMap parameter for programmatic mapping of all fields at once
  3. Auto-detection: Let the Actor detect columns automatically (works with most common header formats)
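
For example, option 1 with individual overrides for non-standard headers might look like this (the header names here are just sample values):

{
  "inputFile": "leads.csv",
  "emailColumn": "Work Email Address",
  "nameColumn": "Contact Person",
  "companyColumn": "Account Name",
  "jobTitleColumn": "Position Title"
}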

How does deduplication work?

The Actor supports two deduplication strategies:

  • Email-based: Compares normalized email addresses (lowercased, trimmed). First occurrence is kept as the primary record.
  • Domain+Name: Compares the combination of domain (from email or website) and normalized full name. Useful when the same person has multiple email addresses.

Duplicates are either filtered out (default) or marked with is_duplicate=true and duplicate_of_index pointing to the primary record (when includeDuplicates=true).

What happens to invalid emails?

Rows with invalid emails are still processed and included in the output. They are marked with:

  • email_is_valid: false
  • normalized_email: The original email (lowercased and trimmed)
  • All other fields are processed normally

You can filter these out in your downstream system or use the email_is_valid field for conditional logic.
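
As a minimal sketch with the Python client, assuming you only want valid, non-duplicate rows:

from apify_client import ApifyClient

client = ApifyClient("your-api-token")
# Keep only rows that passed email validation and were not flagged as duplicates
valid_leads = [
    item for item in client.dataset("your-dataset-id").iterate_items()
    if item["email_is_valid"] and not item["is_duplicate"]
]
print(len(valid_leads), "clean leads ready for import")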

Does it support pay-per-event pricing?

Yes! The Actor uses Apify's pay-per-event model. You're charged $0.001 per processed lead, meaning you only pay for what you use. The pricing appears as "Charged for X events" in your Apify billing.

Can I keep duplicate rows in the output?

Yes, set includeDuplicates: true in your input. Duplicates will be included but marked with is_duplicate: true and duplicate_of_index showing which record they duplicate.

What's the maximum file size?

There's no hard limit, but for optimal performance:

  • Files under 100MB process quickly
  • Larger files may require more memory (adjust in Actor settings)
  • For very large files (1M+ rows), consider splitting into chunks
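
If you do split a very large export first, a generic Python sketch (independent of the Actor) could look like this:

import csv

def split_csv(path: str, rows_per_chunk: int = 250_000) -> None:
    """Split a large CSV into numbered chunks, repeating the header row in each file."""
    base = path.rsplit(".", 1)[0]
    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, part = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) >= rows_per_chunk:
                _write_chunk(base, part, header, chunk)
                chunk, part = [], part + 1
        if chunk:
            _write_chunk(base, part, header, chunk)

def _write_chunk(base: str, part: int, header: list[str], rows: list[list[str]]) -> None:
    with open(f"{base}_part{part}.csv", "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)

split_csv("all_leads.csv")  # produces all_leads_part0.csv, all_leads_part1.csv, ...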

Development

Local Development

# Install dependencies
uv sync
# Run tests
uv run pytest tests/ -v
# Run locally
apify run

Test Commands

# Run all tests
uv run pytest tests/ -v
# Run with coverage
uv run pytest tests/ --cov=src --cov-report=html
# Run specific test file
uv run pytest tests/test_integration.py -v

License

Apache 2.0