Lunar Data Cleaner avatar

Lunar Data Cleaner

Pricing

from $0.03 / 1,000 rows cleaneds

Go to Apify Store
Lunar Data Cleaner

Lunar Data Cleaner

Clean CSV, Excel, or JSON files with audit trails, PII masking, and budget control. Remove duplicates, fix missing values, standardize dates/numbers, and get quality reports.

Pricing

from $0.03 / 1,000 rows cleaneds

Rating

0.0

(0)

Developer

Yu Liu

Yu Liu

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

4 days ago

Last modified

Share

General Data Cleaner

Automatically clean CSV, Excel, and JSON files with full audit trails, PII masking, and budget control. Remove duplicates, fix missing values, standardize dates and numbers, and get quality reports — all in one step.

Why use this Actor?

  • Save time – Stop manually fixing spreadsheets. Let the cleaner handle missing values, duplicates, outliers, and format inconsistencies.
  • Audit ready – Every change is logged. You get task-level and rule-level CSV reports, plus an HTML quality report with ISO 8000 scores.
  • Privacy safe – Enable PII masking to automatically redact SSNs, credit card numbers, and emails (irreversible).
  • Cost control – Set your budget limit; the Actor stops when reached (hard cap $10.00).
  • Preview mode – Test with first N rows before cleaning your full dataset.

Input Parameters (JSON)

Provide input as a JSON object. Example:

{
"sourceData": "https://example.com/my-data.csv",
"delimiter": "auto",
"encoding": "auto",
"auditLevel": "rule",
"maxChargeUsd": 5.0,
"previewMode": false,
"previewRows": 100,
"enablePiiMasking": false,
"piiMaskingRules": "",
"cellErrorPolicy": "skip_cell",
"outputFormat": "csv",
"locale": "US"
}

Parameter reference

ParameterTypeRequiredDefaultDescription
sourceDatastringYesURL, local path, or Apify Dataset ID (e.g., dataset-username/dataset-name) of the input file. Supports CSV, Excel (.xlsx, .xls), JSON.
delimiterstringNoautoColumn delimiter: auto (detect), ",", ";", "\t", "
encodingstringNoautoFile encoding: auto (detect), "utf-8", "iso-8859-1", etc.
auditLevelstringNoruleAudit granularity: "task" (only run summary) or "rule" (per cleaning rule).
maxChargeUsdnumberNo5.0Budget limit (USD). Actual stop = min(this, system hard cap $10.0).
previewModebooleanNofalseIf true, only process first previewRows.
previewRowsintegerNo100Number of rows to process when previewMode=true (max 10,000).
enablePiiMaskingbooleanNofalseIf true, mask SSN, credit cards, emails (irreversible).
piiMaskingRulesstringNo""Comma-separated column names or regex patterns for additional masking (e.g., "phone,custom_id").
cellErrorPolicystringNoskip_cellHow to handle cell conversion errors: "skip_cell" (keep original, continue) or "stop_rule" (fail the rule).
outputFormatstringNocsvOutput format: "csv", "excel", or "json".
localestringNoUSDate and number format: "US" (MM/DD/YYYY, 1,234.56) or "EU" (DD/MM/YYYY, 1.234,56).

Output Files

After a successful run, you will find the following files:

FileLocationDescription
Cleaned dataApify DatasetThe cleaned dataset in your chosen format (CSV/Excel/JSON).
Task auditKey-Value Store → audit_task_{{run.id}}.csvOne row per run: session_id, timestamps, exit reason, budget used, preview mode flag.
Rule auditKey-Value Store → audit_rules_{{run.id}}.csvOne row per cleaning rule: rule_id, affected rows, execution time, status (only when auditLevel=rule).
Skipped rowsKey-Value Store → audit_skipped_{{run.id}}.csvRows that could not be parsed (e.g., encoding errors, column mismatches).
Quality report (HTML)Key-Value Store → quality_report_{{run.id}}.htmlHuman‑readable report with ISO 8000 dimension scores (completeness, accuracy, consistency, format).
Quality report (JSON)Key-Value Store → quality_report_{{run.id}}.jsonSame data in JSON format.
Error reportKey-Value Store → errors_{{run.id}}.jsonDetailed error information (if any).
Debug logKey-Value Store → debug_log_{{run.id}}.txtLast 1000 log lines (saved only if an error occurs).

Tip: The first record in the Dataset is an OUTPUT_SUMMARY that lists all the above keys. You can also access the Key-Value Store directly via the Apify Console.

Usage Examples

1. Basic cleaning (run with defaults)

If you set a default sourceData (e.g., our example CSV), simply click Run with defaults. The Actor will clean the example file and output the results.

2. Clean your own file from a URL

Set sourceData to the URL of your CSV/Excel/JSON file.

3. Preview mode (test before full run)

Set previewMode to true and previewRows to e.g. 100.

4. Enable PII masking

Set enablePiiMasking to true and optionally piiMaskingRules.

5. European locale (EU)

Set locale to "EU" and delimiter to ";" if needed.

How to Use (Apify Console)

  1. Go to the Actor page.
  2. In the Input tab, switch to JSON mode (or use the form).
  3. Paste your JSON configuration (see examples above).
  4. Click Start.
  5. Download the cleaned dataset from the Dataset tab, and audit/quality reports from the Key-Value Store.

Bugs, fixes, updates, and changelog

This product is under active development. If you encounter any issues, have feature requests, or would like to provide feedback, please open an issue on our GitHub repository:

👉 here

Support

Email: liuyu.digitaltwin@outlook.com Please include your session_id (found in the Actor run log or task audit CSV) when reporting issues.