📊 CSV Lead List Cleaner

Deduplicate and clean CSV export files before importing to your CRM. Remove empty rows, trim whitespace, and sort extracted contact details.
🧹 CSV Data Cleaner
Clean CSV data: trim whitespace, remove empty rows, deduplicate by columns, sort. Pure JavaScript, zero external dependencies, zero API keys.
Store Quickstart
Start with the Quickstart template (direct CSV URL). For Apify pipelines, use Pipeline Cleaner with datasetId.
Key Features
- 🧹 Trim whitespace — Remove leading/trailing spaces from all cells
- 🗑️ Remove empty rows — Drop rows where all columns are empty
- 🔁 Deduplicate by columns — Remove duplicate rows by specified key columns
- 📊 Sort by column — Output sorted by any column
- 🔗 Dataset or URL input — Apify dataset ID or direct CSV URL
- 🔑 No API key needed — Pure JS, zero dependencies
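The actor itself runs in JavaScript, but the four cleaning steps above can be sketched in a few lines of standard-library Python. This is an illustration of the logic, not the actor's internals; the function name `clean_csv` is hypothetical.

```python
import csv
import io

def clean_csv(text, dedup_columns=(), trim=True, remove_empty=True, sort_by=None):
    """Trim cells, drop all-empty rows, dedupe by key columns, then sort."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if trim:
        # Strip leading/trailing whitespace from every cell
        rows = [{k: (v or "").strip() for k, v in r.items()} for r in rows]
    if remove_empty:
        # Keep only rows where at least one column is non-empty
        rows = [r for r in rows if any(r.values())]
    if dedup_columns:
        # First occurrence of each key wins
        seen, unique = set(), []
        for r in rows:
            key = tuple(r[c] for c in dedup_columns)
            if key not in seen:
                seen.add(key)
                unique.append(r)
        rows = unique
    if sort_by:
        rows.sort(key=lambda r: r[sort_by])
    return rows

raw = "email,name\n b@x.com ,Bob\na@x.com,Alice\nb@x.com,Bob2\n,\n"
print(clean_csv(raw, dedup_columns=["email"], sort_by="email"))
```

Running this drops the all-empty row, trims ` b@x.com `, removes the duplicate `b@x.com` row, and sorts by email.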
Use Cases
| Who | Why |
|---|---|
| Data engineers | Clean scraper outputs before downstream processing |
| BI analysts | Standardize CSV imports from multiple sources |
| Marketing ops | Clean lead list CSVs before CRM upload |
| Data migration | Normalize CSV files during system migrations |
| Apify pipelines | Post-process actor output datasets |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| csvUrl | string | — | Direct CSV URL (or use datasetId) |
| datasetId | string | — | Apify dataset ID (or use csvUrl) |
| dedupColumns | string[] | [] | Columns that form the deduplication key |
| trimWhitespace | boolean | true | Trim leading/trailing whitespace from every cell |
| removeEmpty | boolean | true | Remove rows where all columns are empty |
| sortBy | string | — | Column to sort the output by |
Input Example
```json
{
  "csvUrl": "https://example.com/data.csv",
  "dedupColumns": ["email"],
  "trimWhitespace": true,
  "removeEmpty": true,
  "sortBy": "created_at"
}
```
Output
| Field | Type | Description |
|---|---|---|
| rowNumber | integer | Original row index |
| data | object | Cleaned row as key-value pairs |
| changes | string[] | List of cleanings applied to this row |
| dropped | boolean | Whether the row was removed |
| dropReason | string \| null | Why the row was dropped, or null if it was kept |
Output Example
```json
{
  "inputRows": 1250,
  "outputRows": 1180,
  "duplicatesRemoved": 45,
  "emptyRowsRemoved": 25,
  "cleanedData": [
    {"email": "user1@example.com", "name": "Alice", "created_at": "2026-01-01"},
    {"email": "user2@example.com", "name": "Bob", "created_at": "2026-01-02"}
  ]
}
```
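To get cleaned rows back into a CRM-ready file, the `cleanedData` array from a result like the one above can be written straight to CSV with the standard library. This is a minimal sketch; the `result` dict mirrors the example output.

```python
import csv
import io

result = {
    "inputRows": 1250,
    "outputRows": 1180,
    "cleanedData": [
        {"email": "user1@example.com", "name": "Alice", "created_at": "2026-01-01"},
        {"email": "user2@example.com", "name": "Bob", "created_at": "2026-01-02"},
    ],
}

# Write the cleaned rows as CSV (use open("out.csv", "w", newline="") for a real file)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(result["cleanedData"][0].keys()))
writer.writeheader()
writer.writerows(result["cleanedData"])
print(buf.getvalue())
```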
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~csv-data-cleaner/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "csvUrl": "https://example.com/data.csv",
    "dedupColumns": ["email"],
    "trimWhitespace": true,
    "removeEmpty": true,
    "sortBy": "created_at"
  }'
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/csv-data-cleaner").call(run_input={
    "csvUrl": "https://example.com/data.csv",
    "dedupColumns": ["email"],
    "trimWhitespace": True,
    "removeEmpty": True,
    "sortBy": "created_at",
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
JavaScript / Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/csv-data-cleaner').call({
    csvUrl: 'https://example.com/data.csv',
    dedupColumns: ['email'],
    trimWhitespace: true,
    removeEmpty: true,
    sortBy: 'created_at',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Tips & Limitations
- Set `removeDuplicates: true` to deduplicate based on all columns.
- Use `delimiter` to handle TSV (`\t`) or semicolon-separated files.
- Combine with Phone Validator and Email Checker for full lead-data cleansing.
- Output dataset is ready for direct import into CRMs or databases.
FAQ
What CSV dialects are supported?
Standard RFC 4180 CSV: comma-delimited, quoted fields, CRLF line endings. TSV is not supported directly.
What's the maximum file size?
Processing is in-memory, limited by actor memory (1024 MB by default). Files up to ~100 MB / ~1M rows work well (tested up to 100k rows); larger files need chunking.
Does it validate data types?
No — cleaning operations only. For type validation, combine with validation libraries.
Can I use this in Apify pipelines?
Yes — provide a datasetId from a prior actor run to clean that dataset directly.
Can I upload a local CSV?
Not directly. Provide a public URL via csvUrl; use a service like file.io or S3 presigned URLs for private files.
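For files beyond the in-memory limits mentioned above, a streaming pass that keeps only the dedup keys in memory is one workable chunking pattern. This sketch is not part of the actor; the function name `dedupe_stream` is hypothetical.

```python
import csv
import io

def dedupe_stream(reader, writer, key_columns):
    """Write each row whose key has not been seen before; return rows kept."""
    seen = set()
    kept = 0
    for row in reader:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            writer.writerow(row)
            kept += 1
    return kept

# Demo on an in-memory "file"; for real files, pass open() handles instead.
src = io.StringIO("email,name\na@x.com,Alice\na@x.com,Alicia\nb@x.com,Bob\n")
out = io.StringIO()
reader = csv.DictReader(src)
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
print(dedupe_stream(reader, writer, ["email"]))  # 2
```

Only the set of keys grows with file size, so memory stays proportional to the number of distinct keys rather than total rows.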
Related Actors
DevOps & Tech Intel cluster — explore related Apify tools:
- 🌐 DNS Propagation Checker — Check DNS propagation across 8 global resolvers (Google, Cloudflare, Quad9, OpenDNS).
- 🔍 Subdomain Finder — Discover subdomains for any domain using Certificate Transparency logs (crt.
- 📦 NPM Package Analyzer — Analyze npm packages: download stats, dependencies, licenses, deprecation status.
- 💬 Reddit Scraper — Scrape Reddit posts and comments from any subreddit via official JSON API.
- GitHub Release & Changelog Monitor API — Track GitHub releases, tags, release notes, and changelog drift over time with one summary-first repository row per repo.
- Docs & Changelog Drift Monitor API — Monitor release notes, changelog pages, migration guides, and key docs pages with one summary-first target row per monitored repo, SDK, or product.
- Tech Events Calendar API | Conferences + CFP — Aggregate tech conferences and CFPs across multiple sources into a deduplicated event calendar for DevRel and recruiting workflows.
- 🔒 OSS Vulnerability Monitor — Monitor open-source packages for known security vulnerabilities using OSV and GitHub Security Advisories.
Cost
Pay Per Event:
- actor-start: $0.01 (flat fee per run)
- dataset-item: $0.001 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.001) = $1.01
No subscription required — you only pay for what you use.
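The pricing formula above is simple enough to estimate programmatically. Prices are hard-coded from this page; verify them against the live listing before relying on the numbers.

```python
def run_cost(items, actor_start=0.01, per_item=0.001):
    """Estimated cost of one run under the Pay Per Event model."""
    return actor_start + items * per_item

print(run_cost(1000))  # 1.01, matching the example above
```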