📊 CSV Lead List Cleaner

Under maintenance

Deduplicate and clean CSV export files before importing to your CRM. Remove empty rows, trim whitespace, and sort extracted contact details.

Pricing: Pay per event
Rating: 0.0 (0 reviews)
Developer: 太郎 山田 (Maintained by Community)

Actor stats: 0 bookmarks · 2 total users · 1 monthly active user · last modified 12 hours ago

🧹 CSV Data Cleaner

Clean CSV data: trim whitespace, remove empty rows, deduplicate by columns, sort. Pure JavaScript, zero external dependencies, zero API keys.

Store Quickstart

Start with the Quickstart template (direct CSV URL). For Apify pipelines, use Pipeline Cleaner with datasetId.
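
For the pipeline route, the input is just a dataset reference plus the cleaning options; for example (the dataset ID here is a placeholder):

```json
{
  "datasetId": "YOUR_DATASET_ID",
  "dedupColumns": ["email"],
  "trimWhitespace": true,
  "removeEmpty": true
}
```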

Key Features

  • 🧹 Trim whitespace — Remove leading/trailing spaces from all cells
  • 🗑️ Remove empty rows — Drop rows where all columns are empty
  • 🔁 Deduplicate by columns — Remove duplicate rows by specified key columns
  • 📊 Sort by column — Output sorted by any column
  • 🔗 Dataset or URL input — Apify dataset ID or direct CSV URL
  • 🔑 No API key needed — Pure JS, zero dependencies
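
As a rough sketch (not the actor's actual source), the four cleaning steps applied to rows already parsed into plain objects look like this:

```javascript
// Illustrative implementation of trim / remove-empty / dedup / sort
// over an array of row objects. Option names mirror the actor's input.
function cleanRows(rows, { dedupColumns = [], trimWhitespace = true, removeEmpty = true, sortBy } = {}) {
  let out = rows.map((row) => {
    if (!trimWhitespace) return row;
    // Trim leading/trailing whitespace in every string cell.
    return Object.fromEntries(
      Object.entries(row).map(([k, v]) => [k, typeof v === 'string' ? v.trim() : v])
    );
  });
  if (removeEmpty) {
    // Drop rows where every column is empty after trimming.
    out = out.filter((row) => Object.values(row).some((v) => v !== '' && v != null));
  }
  if (dedupColumns.length > 0) {
    // Keep the first row seen for each dedup key.
    const seen = new Set();
    out = out.filter((row) => {
      const key = dedupColumns.map((c) => row[c]).join('\u0000');
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
  }
  if (sortBy) {
    // Stable lexicographic sort by the chosen column.
    out = [...out].sort((a, b) => String(a[sortBy]).localeCompare(String(b[sortBy])));
  }
  return out;
}
```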

Use Cases

| Who | Why |
| --- | --- |
| Data engineers | Clean scraper outputs before downstream processing |
| BI analysts | Standardize CSV imports from multiple sources |
| Marketing ops | Clean lead list CSVs before CRM upload |
| Data migration | Normalize CSV files during system migrations |
| Apify pipelines | Post-process actor output datasets |

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| csvUrl | string | | Direct CSV URL (or use datasetId) |
| datasetId | string | | Apify dataset ID (or use csvUrl) |
| dedupColumns | string[] | [] | Columns for the dedup key |
| trimWhitespace | boolean | true | Trim whitespace |
| removeEmpty | boolean | true | Remove empty rows |
| sortBy | string | | Column to sort by |

Input Example

```json
{
  "csvUrl": "https://example.com/data.csv",
  "dedupColumns": ["email"],
  "trimWhitespace": true,
  "removeEmpty": true,
  "sortBy": "created_at"
}
```

Output

| Field | Type | Description |
| --- | --- | --- |
| rowNumber | integer | Original row index |
| data | object | Cleaned row as key-value pairs |
| changes | string[] | List of cleanings applied to this row |
| dropped | boolean | Whether the row was removed |
| dropReason | string | Why the row was dropped (null if kept) |
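
If each dataset item follows this schema, a single record might look like the following (illustrative values only):

```json
{
  "rowNumber": 17,
  "data": {"email": "user1@example.com", "name": "Alice"},
  "changes": ["trimmed: name"],
  "dropped": false,
  "dropReason": null
}
```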

Output Example

```json
{
  "inputRows": 1250,
  "outputRows": 1180,
  "duplicatesRemoved": 45,
  "emptyRowsRemoved": 25,
  "cleanedData": [
    {"email": "user1@example.com", "name": "Alice", "created_at": "2026-01-01"},
    {"email": "user2@example.com", "name": "Bob", "created_at": "2026-01-02"}
  ]
}
```
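
As a sketch (not part of the actor), cleaned rows can be serialized back into CSV text for CRM upload. This assumes simple values with no embedded commas, quotes, or newlines; quote-escaping is omitted for brevity:

```javascript
// Convert an array of row objects back into CSV text.
// Header order comes from the first row; missing values become ''.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => row[h] ?? '').join(','));
  }
  // RFC 4180 prescribes CRLF line endings.
  return lines.join('\r\n');
}
```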

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

```shell
curl -X POST "https://api.apify.com/v2/acts/taroyamada~csv-data-cleaner/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "csvUrl": "https://example.com/data.csv", "dedupColumns": ["email"], "trimWhitespace": true, "removeEmpty": true, "sortBy": "created_at" }'
```

Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("taroyamada/csv-data-cleaner").call(run_input={
    "csvUrl": "https://example.com/data.csv",
    "dedupColumns": ["email"],
    "trimWhitespace": True,
    "removeEmpty": True,
    "sortBy": "created_at",
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```

JavaScript / Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/csv-data-cleaner').call({
  csvUrl: 'https://example.com/data.csv',
  dedupColumns: ['email'],
  trimWhitespace: true,
  removeEmpty: true,
  sortBy: 'created_at',
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

Tips & Limitations

  • To deduplicate on every column rather than a key subset, list all column names in dedupColumns.
  • Only comma-delimited CSV is parsed (see FAQ); convert TSV (\t) or semicolon-separated files to commas before input.
  • Combine with Phone Validator and Email Checker for full lead-data cleansing.
  • Output dataset is ready for direct import into CRMs or databases.

FAQ

What CSV dialects are supported?

Standard RFC 4180 CSV: comma-delimited, quoted fields, CRLF line endings. TSV not supported directly.
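
To make the supported dialect concrete, here is a minimal RFC 4180-style parser sketch (illustrative only, not the actor's code): comma delimiter, double-quoted fields with "" escapes, CRLF or LF line endings.

```javascript
// Parse CSV text into an array of rows (each row an array of strings).
function parseCsv(text) {
  const rows = [];
  let row = [], field = '', inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') { field += '"'; i++; } // "" escapes a quote
        else inQuotes = false;                           // closing quote
      } else field += ch;
    } else if (ch === '"') inQuotes = true;              // opening quote
    else if (ch === ',') { row.push(field); field = ''; }
    else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++;      // swallow CRLF pair
      row.push(field); field = ''; rows.push(row); row = [];
    } else field += ch;
  }
  if (field !== '' || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}
```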

Max CSV file size?

Processing is in-memory and limited by actor memory (1024 MB by default). Tested up to 100k rows; files up to ~100 MB / roughly 1M rows should work. Larger files need chunking.

Does it validate data types?

No — cleaning operations only. For type validation, combine with validation libraries.

Can I use this in Apify pipelines?

Yes — provide datasetId from a prior actor run to clean that dataset directly.

Can I upload a local CSV?

Provide a public URL via csvUrl. Use a service like file.io or S3 presigned URLs for private files.

Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.001 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.001) = $1.01

No subscription required — you only pay for what you use.
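
The pricing arithmetic above can be sketched as a one-line estimator (prices copied from the list; your actual bill depends on the events Apify meters):

```javascript
// Pay-per-event cost sketch: a flat actor-start fee plus a per-item fee.
const ACTOR_START_USD = 0.01;
const PER_ITEM_USD = 0.001;

function estimateCostUsd(outputItems) {
  return ACTOR_START_USD + outputItems * PER_ITEM_USD;
}
```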