GTM Leads Cleaner
Upload any lead CSV and get a CRM-ready dataset: email validation, name/company cleanup, job-title bucketing, and dedupe by email or domain+name.
Pricing: Pay per event
Developer: Howard
Last modified: 10 days ago
GTM Leads Cleaner - CSV Lead Deduplication & Email Validation
What is GTM Leads Cleaner?
GTM Leads Cleaner is an Apify Actor that cleans, normalizes, and deduplicates GTM (Go-To-Market) lead data from CSV files. Built for sales teams, RevOps professionals, and marketers who need to prepare leads for CRM import with validated emails, standardized names, and categorized job titles.
See It In Action
Video demo coming soon!
Why Use GTM Leads Cleaner?
- **Save hours of manual work** - Process 10,000 leads in 2-3 minutes
- **Improve CRM data quality** - Validated emails, standardized names, clean formatting
- **Better lead routing** - GTM-focused job title categorization for accurate scoring
- **Smart deduplication** - Match by email or domain+name combination
- **Pay only for what you use** - Just $0.001 per lead processed
Use Cases
Clean CRM Exports Before Re-Import
Export your HubSpot, Salesforce, or Pipedrive contacts, run them through the cleaner, and re-import with normalized data and duplicates removed.
Deduplicate Leads from Multiple Sources
Combine leads from trade shows, webinars, content downloads, and scraped data into a single clean list without duplicates.
Prepare Sales Intelligence Exports
Clean exports from Apollo.io, ZoomInfo, or LinkedIn Sales Navigator before loading into your CRM or sales engagement platform.
Standardize Job Titles for Lead Scoring
Categorize job titles into consistent GTM buckets (Founder/C-level, Sales leadership, Marketing IC, etc.) for accurate lead scoring and routing.
Features
- **Email Validation & Normalization** - Trims whitespace, lowercases, validates format, and extracts the first email from multi-email fields
- **Name Processing** - Splits full names into first/last, normalizes whitespace
- **Company Normalization** - Cleans company names, removes extra whitespace
- **Domain Extraction** - Derives domain from email or website column
- **Job Title Bucketing** - Categorizes job titles into 10 GTM-focused buckets
- **Lead Deduplication** - Finds duplicates by email or domain+name combination
- **Auto Column Detection** - Automatically detects column mappings from various header formats
- **Pay-Per-Event Pricing** - Only pay for leads you actually process
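As a rough illustration of what the email step does (trim, lowercase, take the first address from a multi-email field, check the format), here is a minimal sketch; the regex and the `normalize_email` helper are assumptions for this example, not the Actor's actual implementation:

```python
import re

# Loose format check for illustration; the Actor's real validation
# rules may be stricter or more permissive.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def normalize_email(raw: str) -> tuple[str, bool]:
    """Trim whitespace, take the first email from a multi-email field,
    lowercase it, and check the basic format."""
    first = re.split(r"[;,\s]+", raw.strip())[0].lower()
    return first, bool(EMAIL_RE.match(first))

print(normalize_email("  JANE@ACME.COM; jane.d@acme.com "))
# ('jane@acme.com', True)
```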
How Much Does It Cost to Clean Leads?
The GTM Leads Cleaner uses Apify's pay-per-event pricing model:
| Volume | Cost per Lead | Example |
|---|---|---|
| Any volume | $0.001 | 1,000 leads = $1.00 |
Cost Comparison
| Method | Cost for 10,000 Leads | Time |
|---|---|---|
| GTM Leads Cleaner | ~$10 | 2-3 minutes |
| Manual cleaning | $200-500 (VA time) | 8-20 hours |
| Custom script | $0 + dev time | Hours to build |
Typical run times:
- 1,000 rows: ~30 seconds
- 10,000 rows: ~2-3 minutes
- 100,000 rows: ~15-20 minutes
Tutorial: How to Clean Your Lead CSV
Step 1: Prepare Your CSV
Ensure your CSV file:
- Is UTF-8 encoded
- Has a header row
- Contains at minimum an email column
Step 2: Upload Your File
You have three options:
- File Upload - Use the file upload button in the Apify Console
- URL - Provide a direct URL to your CSV file
- Key-Value Store - Reference a file already in your Apify Key-Value Store
Step 3: Configure Options
```json
{
  "inputFile": "leads.csv",
  "dedupeStrategy": "email",
  "outputFormat": "dataset",
  "includeDuplicates": false
}
```
Key options:
- `dedupeStrategy`: Choose `"email"` for email-based matching or `"domain+name"` for fuzzy matching
- `outputFormat`: `"dataset"` for API access or `"csv"` for a downloadable file
- `includeDuplicates`: Set to `true` if you want to see duplicate rows (marked with `is_duplicate=true`)
Step 4: Run and Download Results
- Click "Start" to run the Actor
- Wait for completion (check the "Runs" tab for progress)
- Download results from the "Storage" tab:
- Dataset: Clean leads in JSON format
- Key-Value Store: `cleaned_leads.csv` (if CSV output enabled) and `SUMMARY` stats
Input Schema
| Parameter | Type | Default | Description |
|---|---|---|---|
| `inputFile` | string | required | CSV file (upload, URL, or KV store key) |
| `dedupeStrategy` | enum | `"email"` | `"email"` or `"domain+name"` |
| `outputFormat` | enum | `"dataset"` | `"dataset"` or `"csv"` |
| `includeDuplicates` | boolean | `false` | Keep duplicate rows in output |
| `autoDetectPreference` | enum | `"first"` | Tie-breaking: `"first"`, `"last"`, or `"fail"` |
| `emailColumn` | string | auto | Manual email column override |
| `nameColumn` | string | auto | Manual name column override |
| `companyColumn` | string | auto | Manual company column override |
| `jobTitleColumn` | string | auto | Manual job title column override |
| `fieldMap` | object | `{}` | Programmatic column mapping (highest priority) |
Deduplication Strategies
- email - Matches on normalized email address. First occurrence is primary, subsequent matches are marked as duplicates.
- domain+name - Matches on normalized full name + domain combination. Useful when the same person appears with different email addresses.
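The two strategies can be sketched as follows. The key-building and marking logic below is an illustrative simplification of the behavior described above (first occurrence is primary, later matches point back at it), not the Actor's actual code:

```python
def dedupe(rows, strategy="email"):
    """Mark duplicate rows in place: the first occurrence of a key is
    primary; later occurrences get is_duplicate=True and a
    duplicate_of_index pointing at the primary record."""
    seen = {}  # dedupe key -> index of primary record
    for i, row in enumerate(rows):
        if strategy == "email":
            key = row["normalized_email"]
        else:  # "domain+name"
            key = (row["domain"], row["full_name"].strip().lower())
        row["is_duplicate"] = key in seen
        row["duplicate_of_index"] = seen.get(key)
        seen.setdefault(key, i)
    return rows
```

With `"domain+name"`, two rows for Jane Doe at acme.com are matched even if one uses `jane@acme.com` and the other `jane.doe@acme.com`.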
Auto-Detection Preferences
When multiple columns match a pattern (e.g., both "Email" and "Work Email"):
- first - Uses the first matching column (leftmost in CSV)
- last - Uses the last matching column (rightmost in CSV)
- fail - Aborts with error listing candidates
Output Format
Dataset Output (default)
Each row is pushed to the Apify default dataset with canonical fields:
```json
{
  "original_row_index": 1,
  "email": "JANE@ACME.COM",
  "normalized_email": "jane@acme.com",
  "email_is_valid": true,
  "full_name": "Jane Doe",
  "first_name": "Jane",
  "last_name": "Doe",
  "company": "Acme Inc",
  "domain": "acme.com",
  "role_raw": "Head of Growth",
  "role_bucket": "Marketing leadership",
  "is_duplicate": false,
  "duplicate_of_index": null,
  "dedupe_strategy_used": null,
  "source_file": "leads.csv",
  "error_message": null
}
```
CSV Export
When `outputFormat: "csv"`, a `cleaned_leads.csv` file is written to the Key-Value Store with:
- Canonical GTM fields (fixed order)
- Original columns (preserved order)
Summary Statistics
A `SUMMARY` JSON is always written to the Key-Value Store:

```json
{
  "total_rows": 1000,
  "processed_rows": 1000,
  "duplicate_rows": 50,
  "unique_leads": 950,
  "invalid_email_rows": 25,
  "input_file_name": "leads.csv",
  "dedupe_strategy": ["email"],
  "warnings": [],
  "created_at": "2024-01-15T10:30:00Z"
}
```
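A fetched `SUMMARY` record makes it easy to sanity-check a run locally. The thresholds below mirror the documented warning rules (>30% duplicates, >20% invalid emails); the `run_health` helper itself is illustrative, not part of the Actor:

```python
def run_health(summary: dict) -> list[str]:
    """Derive data-quality warnings from a SUMMARY record, using the
    thresholds documented for the Actor's own warnings."""
    total = summary["total_rows"] or 1  # avoid division by zero
    warnings = []
    if summary["duplicate_rows"] / total > 0.30:
        warnings.append("high duplicate rate")
    if summary["invalid_email_rows"] / total > 0.20:
        warnings.append("high invalid email rate")
    return warnings

print(run_health({"total_rows": 1000, "duplicate_rows": 50,
                  "invalid_email_rows": 25}))
# []
```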
Job Title Buckets for Lead Categorization
Job titles are automatically categorized into GTM-focused buckets (9 defined + "Other" fallback):
| Bucket | Example Keywords |
|---|---|
| Founder / C-level | founder, ceo, cto, cfo, chief, president, owner |
| RevOps / SalesOps | revops, revenue operations, sales operations, crm manager |
| Marketing leadership | head of marketing, vp marketing, marketing director, growth lead |
| Sales leadership | head of sales, vp sales, sales director, sales manager |
| Marketing IC | marketing specialist, demand gen specialist, content marketer |
| Sales IC | account executive, sdr, bdr, business development |
| Product | product manager, product owner, product lead |
| Engineering / Technical | engineer, developer, architect, devops |
| Customer Success | customer success, csm, account manager, onboarding |
| Other | (default fallback for unmatched titles) |
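The table above translates naturally into keyword matching. Here is a minimal sketch (subset of buckets shown); the keyword lists come from the table, but the first-match-wins ordering and the `bucket_title` helper are assumptions, not the Actor's actual code:

```python
# Ordered so leadership buckets are checked before IC buckets;
# keyword lists abbreviated from the table above.
BUCKETS = [
    ("Founder / C-level", ["founder", "ceo", "cto", "cfo", "chief", "president", "owner"]),
    ("Sales leadership", ["head of sales", "vp sales", "sales director", "sales manager"]),
    ("Marketing leadership", ["head of marketing", "vp marketing", "marketing director", "growth lead"]),
    ("Sales IC", ["account executive", "sdr", "bdr", "business development"]),
]

def bucket_title(title: str) -> str:
    """Return the first bucket whose keywords appear in the title,
    falling back to 'Other' for unmatched titles."""
    t = title.lower()
    for bucket, keywords in BUCKETS:
        if any(keyword in t for keyword in keywords):
            return bucket
    return "Other"

print(bucket_title("Founder & CEO"))
# Founder / C-level
```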
CSV Column Auto-Detection
The Actor recognizes common header variations:
| Field | Recognized Headers |
|---|---|
| Email | email, e-mail, work email, contact email |
| Full Name | name, full name, contact, person |
| First Name | first name, given name, first |
| Last Name | last name, surname, family name, last |
| Company | company, organization, org, employer |
| Job Title | title, job title, position, role |
| Domain | domain, website, url, company domain |
Headers are matched case-insensitively.
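Combined with the tie-breaking preferences above, the detection step can be sketched like this. The header variants come from the table; the `detect_column` helper is an illustrative assumption, not the Actor's implementation:

```python
# Recognized header variants (subset of the table above), lowercased.
HEADER_PATTERNS = {
    "email": ["email", "e-mail", "work email", "contact email"],
    "company": ["company", "organization", "org", "employer"],
}

def detect_column(field, headers, preference="first"):
    """Case-insensitive header matching with the documented
    first/last/fail tie-breaking preferences."""
    patterns = HEADER_PATTERNS[field]
    matches = [h for h in headers if h.strip().lower() in patterns]
    if not matches:
        return None
    if len(matches) > 1 and preference == "fail":
        raise ValueError(f"Ambiguous {field} columns: {matches}")
    return matches[-1] if preference == "last" else matches[0]

print(detect_column("email", ["Email", "Work Email", "Company"]))
# Email
```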
Error Handling
Fatal Errors (Actor fails)
- Invalid file format (not `.csv`)
- UTF-8 decode failure
- Missing required email column
- Empty input file
- Tie-breaking with "fail" preference when multiple candidates exist
Row-Level Errors
Rows with processing errors continue through the pipeline with:
- `error_message` field set
- `email_is_valid` set to `false`
- Other fields populated where possible
Warnings
Non-fatal issues are logged and included in the summary:
- High duplicate rate (>30%)
- High invalid email rate (>20%)
- Column detection ambiguities
Integrations & API Access
Zapier Integration
- Use the "Apify" app in Zapier
- Select "Run Actor" action
- Choose "gtm-leads-cleaner" Actor
- Map your CSV file URL to the `inputFile` parameter
- Use "Get Dataset Items" to retrieve cleaned leads
Make.com (Integromat)
- Add the Apify module to your scenario
- Use "Run an Actor" action
- Configure input with your CSV file
- Use "Get Dataset Items" to retrieve results
- Route cleaned leads to your CRM module
n8n Workflow
- Use the Apify node
- Set operation to "Run Actor"
- Configure the Actor ID and input parameters
- Use HTTP Request node to fetch dataset results
- Connect to your CRM node (HubSpot, Salesforce, etc.)
Python SDK
```python
from apify_client import ApifyClient

client = ApifyClient("your-api-token")
actor = client.actor("your-username/gtm-leads-cleaner")
run = actor.call(run_input={
    "inputFile": "https://example.com/leads.csv",
    "dedupeStrategy": "email",
    "outputFormat": "dataset",
})

# Get results
dataset = client.dataset(run["defaultDatasetId"])
for item in dataset.iterate_items():
    print(item["normalized_email"], item["is_duplicate"])
```
JavaScript / Node.js SDK
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'your-api-token' });

const run = await client.actor('your-username/gtm-leads-cleaner').call({
    inputFile: 'https://example.com/leads.csv',
    dedupeStrategy: 'email',
    outputFormat: 'dataset',
});

// Get results
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.log(item.normalized_email, item.is_duplicate);
});
```
Direct API Call
```bash
curl -X POST "https://api.apify.com/v2/acts/your-username~gtm-leads-cleaner/runs?token=your-api-token" \
  -H "Content-Type: application/json" \
  -d '{"inputFile": "https://example.com/leads.csv", "dedupeStrategy": "email"}'
```
FAQ
What CSV formats are supported?
The Actor supports standard UTF-8 encoded CSV files with a header row. Files must have the .csv extension. The Actor handles various delimiters and quote characters automatically.
Can I use custom column mappings?
Yes! You have three options:
- Individual overrides: Use `emailColumn`, `nameColumn`, `companyColumn`, or `jobTitleColumn` to specify exact header names
- Field map: Use the `fieldMap` parameter for programmatic mapping of all fields at once
- Auto-detection: Let the Actor detect columns automatically (works with most common header formats)
How does deduplication work?
The Actor supports two deduplication strategies:
- Email-based: Compares normalized email addresses (lowercased, trimmed). First occurrence is kept as the primary record.
- Domain+Name: Compares the combination of domain (from email or website) and normalized full name. Useful when the same person has multiple email addresses.
Duplicates are either filtered out (default) or marked with `is_duplicate=true` and `duplicate_of_index` pointing to the primary record (when `includeDuplicates=true`).
What happens to invalid emails?
Rows with invalid emails are still processed and included in the output. They are marked with:
- `email_is_valid: false`
- `normalized_email`: the original email (lowercased and trimmed)
- All other fields are processed normally
You can filter these out in your downstream system or use the `email_is_valid` field for conditional logic.
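For example, assuming items shaped like the dataset output above, filtering is a one-liner:

```python
# Sample items shaped like the Actor's dataset output; in practice
# these would come from the dataset via the Apify client.
items = [
    {"normalized_email": "jane@acme.com", "email_is_valid": True},
    {"normalized_email": "not an email", "email_is_valid": False},
]

valid_leads = [item for item in items if item["email_is_valid"]]
invalid_leads = [item for item in items if not item["email_is_valid"]]
print(len(valid_leads), len(invalid_leads))
# 1 1
```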
Does it support pay-per-event pricing?
Yes! The Actor uses Apify's pay-per-event model. You're charged $0.001 per processed lead, meaning you only pay for what you use. The pricing appears as "Charged for X events" in your Apify billing.
Can I keep duplicate rows in the output?
Yes, set `includeDuplicates: true` in your input. Duplicates will be included but marked with `is_duplicate: true` and `duplicate_of_index` showing which record they duplicate.
What's the maximum file size?
There's no hard limit, but for optimal performance:
- Files under 100MB process quickly
- Larger files may require more memory (adjust in Actor settings)
- For very large files (1M+ rows), consider splitting into chunks
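One way to split a very large file, keeping the header row in each chunk so every piece can be cleaned as its own run; `split_csv` and the chunk naming are hypothetical helpers for this example, not part of the Actor:

```python
import csv
import itertools

def split_csv(path, rows_per_chunk=250_000):
    """Split a large CSV into smaller files, repeating the header row
    in each chunk. Returns the list of chunk file paths."""
    parts = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        for n in itertools.count():
            rows = list(itertools.islice(reader, rows_per_chunk))
            if not rows:
                break
            part = f"{path}.part{n}.csv"
            with open(part, "w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out)
                writer.writerow(header)  # keep header in every chunk
                writer.writerows(rows)
            parts.append(part)
    return parts
```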
Development
Local Development
```bash
# Install dependencies
uv sync

# Run tests
uv run pytest tests/ -v

# Run locally
apify run
```
Test Commands
```bash
# Run all tests
uv run pytest tests/ -v

# Run with coverage
uv run pytest tests/ --cov=src --cov-report=html

# Run specific test file
uv run pytest tests/test_integration.py -v
```
Related Apify Actors
Looking for more data processing and lead generation tools? Check out these related Actors:
- Google Maps Scraper - Extract business data from Google Maps
- LinkedIn Profile Scraper - Scrape LinkedIn profiles for lead enrichment
- CSV to JSON Converter - Convert CSV files to JSON format
Resources
License
Apache 2.0

