📊 CSV Lead List Cleaner

Deduplicate and clean CSV export files before importing to your CRM. Remove empty rows, trim whitespace, and sort extracted contact details.
🧹 CSV Data Cleaner
Clean CSV data: trim whitespace, remove empty rows, deduplicate by columns, sort. Pure JavaScript, zero external dependencies, zero API keys.
Store Quickstart
Start with the Quickstart template (direct CSV URL). For Apify pipelines, use Pipeline Cleaner with datasetId.
Key Features
- 🧹 Trim whitespace — Remove leading/trailing spaces from all cells
- 🗑️ Remove empty rows — Drop rows where all columns are empty
- 🔁 Deduplicate by columns — Remove duplicate rows by specified key columns
- 📊 Sort by column — Output sorted by any column
- 🔗 Dataset or URL input — Apify dataset ID or direct CSV URL
- 🔑 No API key needed — Pure JS, zero dependencies
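The actor itself runs in JavaScript, but the four cleaning steps above can be sketched in a few lines of standard-library Python. This is an illustration of the logic, not the actor's internals; the function name `clean_csv` is hypothetical.

```python
import csv
import io

def clean_csv(text, dedup_columns=(), trim=True, remove_empty=True, sort_by=None):
    """Trim cells, drop all-empty rows, dedupe by key columns, then sort."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if trim:
        # Strip leading/trailing whitespace from every cell
        rows = [{k: (v or "").strip() for k, v in r.items()} for r in rows]
    if remove_empty:
        # Keep only rows where at least one column is non-empty
        rows = [r for r in rows if any(r.values())]
    if dedup_columns:
        # First occurrence of each key wins
        seen, unique = set(), []
        for r in rows:
            key = tuple(r[c] for c in dedup_columns)
            if key not in seen:
                seen.add(key)
                unique.append(r)
        rows = unique
    if sort_by:
        rows.sort(key=lambda r: r[sort_by])
    return rows

raw = "email,name\n b@x.com ,Bob\na@x.com,Alice\nb@x.com,Bob2\n,\n"
print(clean_csv(raw, dedup_columns=["email"], sort_by="email"))
```

Running this drops the all-empty row, trims ` b@x.com `, removes the duplicate `b@x.com` row, and sorts by email.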
Use Cases
| Who | Why |
|---|---|
| Data engineers | Clean scraper outputs before downstream processing |
| BI analysts | Standardize CSV imports from multiple sources |
| Marketing ops | Clean lead list CSVs before CRM upload |
| Data migration | Normalize CSV files during system migrations |
| Apify pipelines | Post-process actor output datasets |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| csvUrl | string | — | Direct CSV URL (or use datasetId) |
| datasetId | string | — | Apify dataset ID (or use csvUrl) |
| dedupColumns | string[] | [] | Columns that form the deduplication key |
| trimWhitespace | boolean | true | Trim leading/trailing whitespace from every cell |
| removeEmpty | boolean | true | Remove rows where all columns are empty |
| sortBy | string | — | Column to sort the output by |
Input Example
```json
{
  "csvUrl": "https://example.com/data.csv",
  "dedupColumns": ["email"],
  "trimWhitespace": true,
  "removeEmpty": true,
  "sortBy": "created_at"
}
```
Output
| Field | Type | Description |
|---|---|---|
| rowNumber | integer | Original row index |
| data | object | Cleaned row as key-value pairs |
| changes | string[] | List of cleanings applied to this row |
| dropped | boolean | Whether the row was removed |
| dropReason | string \| null | Why the row was dropped, or null if it was kept |
Output Example
```json
{
  "inputRows": 1250,
  "outputRows": 1180,
  "duplicatesRemoved": 45,
  "emptyRowsRemoved": 25,
  "cleanedData": [
    {"email": "user1@example.com", "name": "Alice", "created_at": "2026-01-01"},
    {"email": "user2@example.com", "name": "Bob", "created_at": "2026-01-02"}
  ]
}
```
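To get cleaned rows back into a CRM-ready file, the `cleanedData` array from a result like the one above can be written straight to CSV with the standard library. This is a minimal sketch; the `result` dict mirrors the example output.

```python
import csv
import io

result = {
    "inputRows": 1250,
    "outputRows": 1180,
    "cleanedData": [
        {"email": "user1@example.com", "name": "Alice", "created_at": "2026-01-01"},
        {"email": "user2@example.com", "name": "Bob", "created_at": "2026-01-02"},
    ],
}

# Write the cleaned rows as CSV (use open("out.csv", "w", newline="") for a real file)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(result["cleanedData"][0].keys()))
writer.writeheader()
writer.writerows(result["cleanedData"])
print(buf.getvalue())
```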
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~csv-data-cleaner/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "csvUrl": "https://example.com/data.csv",
    "dedupColumns": ["email"],
    "trimWhitespace": true,
    "removeEmpty": true,
    "sortBy": "created_at"
  }'
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/csv-data-cleaner").call(run_input={
    "csvUrl": "https://example.com/data.csv",
    "dedupColumns": ["email"],
    "trimWhitespace": True,
    "removeEmpty": True,
    "sortBy": "created_at",
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
JavaScript / Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/csv-data-cleaner').call({
    csvUrl: 'https://example.com/data.csv',
    dedupColumns: ['email'],
    trimWhitespace: true,
    removeEmpty: true,
    sortBy: 'created_at',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Tips & Limitations
- Set `removeDuplicates: true` to deduplicate based on all columns.
- Use `delimiter` to handle TSV (`\t`) or semicolon-separated files.
- Combine with Phone Validator and Email Checker for full lead-data cleansing.
- Output dataset is ready for direct import into CRMs or databases.
FAQ
What CSV dialects are supported?
Standard RFC 4180 CSV: comma-delimited, quoted fields, CRLF line endings. TSV is not supported directly.
What's the maximum file size?
Processing is in-memory, limited by actor memory (1024 MB by default). Files up to ~100 MB / ~1M rows work well (tested up to 100k rows); larger files need chunking.
Does it validate data types?
No — cleaning operations only. For type validation, combine with validation libraries.
Can I use this in Apify pipelines?
Yes — provide a datasetId from a prior actor run to clean that dataset directly.
Can I upload a local CSV?
Not directly. Provide a public URL via csvUrl; use a service like file.io or S3 presigned URLs for private files.
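For files beyond the in-memory limits mentioned above, a streaming pass that keeps only the dedup keys in memory is one workable chunking pattern. This sketch is not part of the actor; the function name `dedupe_stream` is hypothetical.

```python
import csv
import io

def dedupe_stream(reader, writer, key_columns):
    """Write each row whose key has not been seen before; return rows kept."""
    seen = set()
    kept = 0
    for row in reader:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            writer.writerow(row)
            kept += 1
    return kept

# Demo on an in-memory "file"; for real files, pass open() handles instead.
src = io.StringIO("email,name\na@x.com,Alice\na@x.com,Alicia\nb@x.com,Bob\n")
out = io.StringIO()
reader = csv.DictReader(src)
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
print(dedupe_stream(reader, writer, ["email"]))  # 2
```

Only the set of keys grows with file size, so memory stays proportional to the number of distinct keys rather than total rows.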
Related Actors
DevOps & Tech Intel cluster — explore related Apify tools:
- 🌐 DNS Propagation Checker — Check DNS propagation across 8 global resolvers (Google, Cloudflare, Quad9, OpenDNS).
- 🔍 Subdomain Finder — Discover subdomains for any domain using Certificate Transparency logs (crt.
- 📦 NPM Package Analyzer — Analyze npm packages: download stats, dependencies, licenses, deprecation status.
- 💬 Reddit Scraper — Scrape Reddit posts and comments from any subreddit via official JSON API.
- GitHub Release & Changelog Monitor API — Track GitHub releases, tags, release notes, and changelog drift over time with one summary-first repository row per repo.
- Docs & Changelog Drift Monitor API — Monitor release notes, changelog pages, migration guides, and key docs pages with one summary-first target row per monitored repo, SDK, or product.
- Tech Events Calendar API | Conferences + CFP — Aggregate tech conferences and CFPs across multiple sources into a deduplicated event calendar for DevRel and recruiting workflows.
- 🔒 OSS Vulnerability Monitor — Monitor open-source packages for known security vulnerabilities using OSV and GitHub Security Advisories.
Cost
Pay Per Event:
- actor-start: $0.01 (flat fee per run)
- dataset-item: $0.001 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.001) = $1.01
No subscription required — you only pay for what you use.
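The pricing formula above is simple enough to estimate programmatically. Prices are hard-coded from this page; verify them against the live listing before relying on the numbers.

```python
def run_cost(items, actor_start=0.01, per_item=0.001):
    """Estimated cost of one run under the Pay Per Event model."""
    return actor_start + items * per_item

print(run_cost(1000))  # 1.01, matching the example above
```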