CSV Diff Tool
Pricing
Pay per event
Compare two CSV datasets and find added, removed, and modified rows. Supports key-column matching, configurable delimiters, case-insensitive comparison, and whitespace trimming. Exports a structured change report with before/after values.
Developer
Stas Persiianenko
Last modified: 2 days ago
CSV Diff Tool — Compare Two CSV Files & Find Changes
🔍 Paste two CSV datasets and instantly see what changed — which rows were added, removed, or modified. Supports key-column matching, configurable delimiters, case-insensitive comparison, and exports a detailed change report in JSON, CSV, or Excel.
What does CSV Diff Tool do?
CSV Diff Tool compares two CSV datasets row by row and outputs the exact differences: added rows, removed rows, and modified rows with before/after values for each changed column.
Unlike simple text-diff tools, this actor understands CSV structure — it matches rows by their key column values (e.g. id, email, or a composite [country, city]), not by line position. This means rows that were reordered between versions are correctly identified as unchanged, not as "removed then added."
The actor handles common real-world CSV quirks: quoted fields, commas inside values, semicolon-delimited European CSVs, TSV files, files with or without a header row, leading/trailing whitespace, and case-insensitive comparison.
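The key-based matching described above can be sketched in a few lines of Python. This is an illustration only, not the actor's implementation: `diff_by_key` is a hypothetical helper, and it assumes a header row and unique key values.

```python
import csv
import io


def diff_by_key(csv_a: str, csv_b: str, key_columns: list[str]) -> dict[str, list]:
    """Classify rows as added, removed, or modified by key, ignoring row order."""

    def index_rows(text: str) -> dict[tuple, dict]:
        # Map each row's key tuple to the row itself (header row assumed).
        reader = csv.DictReader(io.StringIO(text))
        return {tuple(row[col] for col in key_columns): row for row in reader}

    rows_a, rows_b = index_rows(csv_a), index_rows(csv_b)
    return {
        "added": [key for key in rows_b if key not in rows_a],
        "removed": [key for key in rows_a if key not in rows_b],
        "modified": [
            key for key in rows_a if key in rows_b and rows_a[key] != rows_b[key]
        ],
    }


baseline = "id,name,price\n1,Alpha,9.99\n2,Beta,14.99\n3,Gamma,4.99"
updated = "id,name,price\n3,Gamma,4.99\n2,Beta,19.99\n5,Epsilon,34.99"
result = diff_by_key(baseline, updated, ["id"])
print(result)
```

Row `3` moves from the last line to the first but is not reported, because its key and values are identical in both inputs.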
Who is CSV Diff Tool for?
📊 Data engineers and analysts who receive periodic CSV exports from databases, ERP systems, or third-party APIs and need to audit what changed between snapshots. Instead of manually scrolling through two spreadsheets, paste both and get a structured change report in seconds.
🛒 E-commerce operators comparing product catalog exports — tracking price changes, stock level drops, or product removals between weekly supplier feeds. The actor flags exactly which SKUs changed and shows old vs. new values side by side.
📋 Compliance and audit teams verifying that data migrations, ETL transforms, or database imports produced the expected changes. Use the summary stats (added/removed/modified counts) as an audit checkpoint, and drill into modified rows to verify specific field changes.
🔧 Developers and QA engineers running regression tests on data pipelines or API exports. Compare baseline and updated snapshots automatically in CI workflows via the Apify API.
📈 Operations and business intelligence teams who track KPIs exported as CSVs (sales reports, lead lists, inventory) and need a fast way to see week-over-week deltas without writing custom scripts.
Why use CSV Diff Tool?
- ✅ Intelligent row matching: matches rows by key column values, not line position. Rows that moved keep their identity.
- ✅ Before/after column diffs: modified rows show `price_before: 14.99, price_after: 19.99` so you see exactly what changed.
- ✅ No code required: paste CSV, click Run. No Python, no SQL, no scripting.
- ✅ Any delimiter: comma, semicolon, tab, pipe. Works with European CSVs out of the box.
- ✅ Handles quoted fields: commas inside quoted values are correctly parsed, not split.
- ✅ Composite key support: match on multiple columns (`["country", "city"]`) for datasets without a single unique identifier.
- ✅ Case-insensitive mode: treat `APPLE` and `apple` as the same value.
- ✅ Whitespace trimming: ignore invisible whitespace differences.
- ✅ Export anywhere: results in JSON, CSV, Excel, or NDJSON. Schedule via API for automated monitoring.
- ✅ Pure computation: no proxy, no browser. Runs in under a second for typical datasets.
What data can you extract?
The actor outputs one row per change to the Apify dataset, plus a summary in the key-value store.
Per-change row output:
| Field | Description |
|---|---|
| `changeType` | `added`, `removed`, or `modified` |
| `rowKey` | Human-readable key, e.g. `id=42` or `country=UK\|city=London` |
| `changedColumns` | Comma-separated list of columns that changed (for modified rows) |
| `columnCount` | Number of columns that changed |
| `data` | Full row data; modified rows show `field_before` and `field_after` for each changed column |
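As a rough illustration of the record shape in the table above, the following sketch assembles a `modified` record from a before row and an after row. `build_modified_record` is a hypothetical helper that assumes both rows share the same columns; the actor's own logic may differ.

```python
def build_modified_record(key_columns, before, after):
    """Assemble a 'modified' change record from two versions of one row."""
    changed = [col for col in before if before[col] != after.get(col)]
    data = {}
    for col, value in before.items():
        if col in changed:
            # Changed columns are split into a before/after pair.
            data[f"{col}_before"] = value
            data[f"{col}_after"] = after.get(col)
        else:
            data[col] = value
    return {
        "changeType": "modified",
        "rowKey": "|".join(f"{col}={before[col]}" for col in key_columns),
        "changedColumns": ", ".join(changed),
        "columnCount": len(changed),
        "data": data,
    }


before = {"id": "2", "name": "Widget Beta", "price": "14.99", "stock": "50"}
after = {"id": "2", "name": "Widget Beta", "price": "19.99", "stock": "45"}
record = build_modified_record(["id"], before, after)
print(record)
```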
Summary object (saved to key-value store as DIFF_SUMMARY):
| Field | Description |
|---|---|
| `totalRowsA` | Total rows in CSV A (baseline) |
| `totalRowsB` | Total rows in CSV B (updated) |
| `addedRows` | Rows present in B but not A |
| `removedRows` | Rows present in A but not B |
| `modifiedRows` | Rows present in both but with different values |
| `unchangedRows` | Rows identical in both datasets |
| `totalChanges` | Sum of added + removed + modified |
| `keyColumns` | Key columns used for matching |
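The relationships between the summary fields can be made explicit with a small sketch. `summarize` is a hypothetical helper, and the derivation of `unchangedRows` assumes every key is unique within each file.

```python
from collections import Counter


def summarize(change_types, total_rows_a, total_rows_b):
    """Derive summary counts from a list of changeType values."""
    counts = Counter(change_types)
    added, removed, modified = counts["added"], counts["removed"], counts["modified"]
    return {
        "totalRowsA": total_rows_a,
        "totalRowsB": total_rows_b,
        "addedRows": added,
        "removedRows": removed,
        "modifiedRows": modified,
        # Every row of A is either removed, modified, or unchanged.
        "unchangedRows": total_rows_a - removed - modified,
        "totalChanges": added + removed + modified,
    }


summary = summarize(["added", "removed", "modified"], total_rows_a=4, total_rows_b=4)
print(summary)
```

The same unchanged count must also equal `totalRowsB - addedRows - modifiedRows`, which makes a convenient consistency check on any summary.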
How much does it cost to compare CSV files?
This actor uses pay-per-event (PPE) pricing — you only pay for what you use.
| Event | Price |
|---|---|
| Actor start (covers first 1,000 rows compared) | $0.005 |
| Additional 1,000 rows compared | $0.001 |
Real-world cost examples:
| Dataset size | Cost |
|---|---|
| Two 50-row product catalogs | $0.005 (start fee only) |
| Two 500-row customer lists | $0.005 (start fee only) |
| Two 2,000-row inventory files | ~$0.008 (start + 3 extra units) |
| Two 10,000-row transaction logs | ~$0.024 |
| Two 50,000-row data exports | ~$0.104 |
💡 Free plan estimate: Apify gives new users $5 in free credits. At the list prices above, that covers up to 1,000 start-fee-only runs (up to 1,000 rows compared per run); a single comparison of two 500,000-row files costs about $1.00.
All pricing uses tiered discounts — heavy users automatically pay less per row.
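The list prices above can be turned into a quick estimate. This sketch assumes that "rows compared" is the combined row count of both files (inferred from the examples, e.g. two 10,000-row logs at ~$0.024) and it ignores tiered discounts; `estimate_cost_usd` is a hypothetical helper.

```python
import math

START_FEE_MILLS = 5    # $0.005, covers the first 1,000 rows compared
EXTRA_UNIT_MILLS = 1   # $0.001 per additional 1,000 rows
ROWS_PER_UNIT = 1000


def estimate_cost_usd(rows_a: int, rows_b: int) -> float:
    """Estimate the list-price cost of one run, before any tiered discount."""
    total_rows = rows_a + rows_b  # assumption: both files count toward the total
    extra_units = max(0, math.ceil((total_rows - ROWS_PER_UNIT) / ROWS_PER_UNIT))
    # Work in thousandths of a dollar to avoid accumulating float error.
    return (START_FEE_MILLS + extra_units * EXTRA_UNIT_MILLS) / 1000


for size in (500, 10_000, 50_000):
    print(size, estimate_cost_usd(size, size))
```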
How to compare two CSV files
- Open CSV Diff Tool on Apify Store.
- Click Try for free to open the actor in Apify Console.
- Paste your baseline CSV (the "before" snapshot) into the CSV A field.
- Paste your updated CSV (the "after" snapshot) into the CSV B field.
- Set Key columns to the column name(s) that uniquely identify each row (e.g. `["id"]` or `["email"]`). Leave empty to match by row position.
- Configure Delimiter if you're using a non-comma separator (semicolon, tab, pipe).
- Click Start. The actor typically completes in under 2 seconds.
- View results in the Dataset tab: each change is a separate row with `changeType`, `rowKey`, and full `data`.
- Export as CSV or Excel using the Export button.
Example input (JSON format):
    {
        "csvA": "id,name,price,stock\n1,Widget Alpha,9.99,100\n2,Widget Beta,14.99,50\n3,Widget Gamma,4.99,200",
        "csvB": "id,name,price,stock\n1,Widget Alpha,9.99,100\n2,Widget Beta,19.99,45\n5,Widget Epsilon,34.99,10",
        "keyColumns": ["id"],
        "delimiter": ",",
        "hasHeader": true,
        "outputFormat": "all-changes"
    }
Example for TSV files:
    {
        "csvA": "id\tname\tvalue\n1\tAlpha\t10\n2\tBeta\t20",
        "csvB": "id\tname\tvalue\n1\tAlpha\t10\n2\tBeta\t25",
        "keyColumns": ["id"],
        "delimiter": "\t",
        "hasHeader": true
    }
Example with composite key:
    {
        "csvA": "country,city,population\nUK,London,8900000\nDE,Berlin,3700000",
        "csvB": "country,city,population\nUK,London,9100000\nFR,Paris,2100000",
        "keyColumns": ["country", "city"],
        "hasHeader": true
    }
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `csvA` | String | none | Required. The baseline (original) CSV content. |
| `csvB` | String | none | Required. The updated (new) CSV content. |
| `keyColumns` | Array | `[]` | Column names that uniquely identify rows. Empty = use row position. |
| `delimiter` | String | `,` | Field separator: `,`, `;`, `\t`, or `\|` |
| `hasHeader` | Boolean | `true` | Whether the first row contains column names. |
| `caseSensitive` | Boolean | `true` | Whether value comparison is case-sensitive. |
| `trimWhitespace` | Boolean | `true` | Trim leading/trailing spaces from cell values. |
| `outputFormat` | String | `all-changes` | What to output: `all-changes`, `added-only`, `removed-only`, `modified-only`, `summary-only` |
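When building the input programmatically, it can help to apply the documented defaults and reject unsupported values before starting a run. The sketch below is one possible client-side check; `build_input` and its validation rules are hypothetical and only mirror the table above.

```python
DEFAULTS = {
    "keyColumns": [],
    "delimiter": ",",
    "hasHeader": True,
    "caseSensitive": True,
    "trimWhitespace": True,
    "outputFormat": "all-changes",
}
ALLOWED_DELIMITERS = {",", ";", "\t", "|"}
ALLOWED_OUTPUT_FORMATS = {
    "all-changes", "added-only", "removed-only", "modified-only", "summary-only",
}


def build_input(csv_a: str, csv_b: str, **overrides) -> dict:
    """Merge overrides into the documented defaults and validate the result."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if not csv_a or not csv_b:
        raise ValueError("csvA and csvB are required")
    run_input = {"csvA": csv_a, "csvB": csv_b, **DEFAULTS, **overrides}
    run_input["keyColumns"] = list(run_input["keyColumns"])  # do not share the default list
    if run_input["delimiter"] not in ALLOWED_DELIMITERS:
        raise ValueError("delimiter must be one of , ; tab |")
    if run_input["outputFormat"] not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported outputFormat: {run_input['outputFormat']}")
    return run_input


run_input = build_input("id;v\n1;a", "id;v\n1;b", keyColumns=["id"], delimiter=";")
print(run_input)
```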
Output examples
Modified row (price changed):
    {
        "changeType": "modified",
        "rowKey": "id=2",
        "changedColumns": "price, stock",
        "columnCount": 2,
        "data": {
            "id": "2",
            "name": "Widget Beta",
            "price_before": "14.99",
            "price_after": "19.99",
            "stock_before": "50",
            "stock_after": "45"
        }
    }
Added row:
    {
        "changeType": "added",
        "rowKey": "id=5",
        "changedColumns": "id, name, price, stock",
        "columnCount": 4,
        "data": {
            "id": "5",
            "name": "Widget Epsilon",
            "price": "34.99",
            "stock": "10"
        }
    }
Removed row:
    {
        "changeType": "removed",
        "rowKey": "id=4",
        "changedColumns": "id, name, price, stock",
        "columnCount": 4,
        "data": {
            "id": "4",
            "name": "Widget Delta",
            "price": "24.99",
            "stock": "25"
        }
    }
Summary (from key-value store, key DIFF_SUMMARY):
    {
        "totalRowsA": 4,
        "totalRowsB": 4,
        "addedRows": 1,
        "removedRows": 1,
        "modifiedRows": 1,
        "unchangedRows": 2,
        "totalChanges": 3,
        "keyColumns": ["id"],
        "hasHeader": true,
        "delimiter": ","
    }
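To work with the before/after values in a modified item, one option is to unpack them into tuples. The sketch below assumes the item shape shown in the examples above; `column_changes` is a hypothetical helper and would misread column names that themselves contain commas.

```python
def column_changes(item: dict) -> list[tuple[str, str, str]]:
    """Return (column, before, after) for each changed column of a modified item."""
    if item["changeType"] != "modified":
        return []
    columns = [name.strip() for name in item["changedColumns"].split(",")]
    data = item["data"]
    return [(col, data[f"{col}_before"], data[f"{col}_after"]) for col in columns]


item = {
    "changeType": "modified",
    "rowKey": "id=2",
    "changedColumns": "price, stock",
    "columnCount": 2,
    "data": {
        "id": "2", "name": "Widget Beta",
        "price_before": "14.99", "price_after": "19.99",
        "stock_before": "50", "stock_after": "45",
    },
}
changes = column_changes(item)
print(changes)
```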
Tips for best results
- 🔑 Always specify key columns for datasets that have a unique identifier. Without key columns, the actor uses row position, which means adding a row at the top makes every subsequent row appear as "modified."
- 📋 Use composite keys when no single column is unique. E.g. `["country", "city"]` for geographic data or `["year", "month", "product_id"]` for time-series exports.
- ✂️ Keep `trimWhitespace` enabled (the default). Spreadsheet exports often add invisible trailing spaces that would otherwise cause false "modified" detections.
- 🔠 Use case-insensitive mode when comparing data from different systems that may use different capitalization conventions.
- 📤 Use `outputFormat: modified-only` when you only care about value changes, not structural additions/removals. This keeps datasets small and focused.
- 📊 Check `DIFF_SUMMARY` in the key-value store for aggregate counts before diving into individual rows. If `totalChanges` is 0, both CSVs are identical.
- 🔄 Schedule weekly runs to automate monitoring of recurring CSV exports. Combine with webhooks to alert you when changes exceed a threshold.
- 📦 Large files: CSVs up to hundreds of thousands of rows work fine. The actor uses batched processing to handle large datasets efficiently.
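The effect of the whitespace and case options mentioned in the tips can be illustrated with a small normalization sketch. The function names are hypothetical, and the use of `casefold()` is an assumption; the actor may normalize case differently.

```python
def normalize(value: str, trim_whitespace: bool = True, case_sensitive: bool = True) -> str:
    """Normalize one cell value before comparison."""
    if trim_whitespace:
        value = value.strip()
    if not case_sensitive:
        value = value.casefold()
    return value


def cells_equal(a: str, b: str, **options) -> bool:
    """Compare two cell values under the same normalization options."""
    return normalize(a, **options) == normalize(b, **options)


print(cells_equal("Apple ", "Apple"))                         # trailing space ignored
print(cells_equal("APPLE", "apple"))                          # case still matters
print(cells_equal("APPLE", "apple", case_sensitive=False))    # case ignored
print(cells_equal("Apple ", "Apple", trim_whitespace=False))  # space now counts
```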
Integrations
📊 CSV Diff Tool → Google Sheets
Export diff results directly to a Google Sheets spreadsheet using Apify's Google Sheets integration. Create a live "change log" sheet that updates every time you re-run the diff. Useful for sharing change reports with non-technical stakeholders without manual copy-paste.
🔔 CSV Diff Tool → Slack/Discord alerts
Combine with an Apify webhook that fires when the run completes. If totalChanges > 0, send a Slack message with the counts: "Product catalog updated: 3 prices changed, 1 item removed." Keeps your team informed without manual monitoring.
🔁 CSV Diff Tool → Make (formerly Integromat) or Zapier
Trigger a Make scenario when the Apify run completes. Read the dataset items and route each changeType to a different action — e.g. added rows go to a CRM import, removed rows trigger a deactivation workflow, modified rows trigger a price update.
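The same routing idea can be expressed in plain Python for custom workflows. The sketch below groups dataset items by `changeType` so each group can be handed to a different downstream step; `group_by_change_type` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_change_type(items: list[dict]) -> dict[str, list[str]]:
    """Group row keys by changeType so each group can be routed separately."""
    groups = defaultdict(list)
    for item in items:
        groups[item["changeType"]].append(item["rowKey"])
    return dict(groups)


items = [
    {"changeType": "added", "rowKey": "id=5"},
    {"changeType": "modified", "rowKey": "id=2"},
    {"changeType": "removed", "rowKey": "id=4"},
    {"changeType": "added", "rowKey": "id=6"},
]
routes = group_by_change_type(items)
print(routes)
```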
📅 Scheduled monitoring of recurring exports
Use Apify's scheduler to run the diff daily or weekly against the latest export from your ERP, PIM, or data warehouse. Compare against the previous run's output (stored in a dataset) to build an automated audit trail.
🔗 Webhook-triggered diff in CI/CD pipelines
Call the Apify API from your CI/CD pipeline after running a data migration. Diff the pre-migration and post-migration exports to verify the transform produced the expected changes. Fail the pipeline if unexpected rows were modified.
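A pipeline gate along these lines might look like the sketch below. It assumes the migration is only expected to change a known set of columns and treats added or removed rows as unexpected; `find_unexpected_changes` is a hypothetical helper, and the policy should be adapted to your own migration.

```python
def find_unexpected_changes(items: list[dict], allowed_columns: list[str]) -> list[str]:
    """Return rowKeys of changes the migration was not expected to make."""
    unexpected = []
    for item in items:
        if item["changeType"] != "modified":
            # In this sketch, added or removed rows always count as unexpected.
            unexpected.append(item["rowKey"])
            continue
        changed = {name.strip() for name in item["changedColumns"].split(",")}
        if not changed <= set(allowed_columns):
            unexpected.append(item["rowKey"])
    return unexpected


items = [
    {"changeType": "modified", "rowKey": "id=2", "changedColumns": "price"},
    {"changeType": "modified", "rowKey": "id=3", "changedColumns": "price, name"},
    {"changeType": "removed", "rowKey": "id=4", "changedColumns": "id, name, price"},
]
problems = find_unexpected_changes(items, allowed_columns=["price"])
exit_code = 1 if problems else 0  # a CI step could pass this to sys.exit()
print(problems, exit_code)
```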
Using the Apify API
Node.js
    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

    // Start the actor and wait for the run to finish.
    const run = await client.actor('automation-lab/csv-diff-tool').call({
        csvA: 'id,name,price\n1,Alpha,9.99\n2,Beta,14.99',
        csvB: 'id,name,price\n1,Alpha,9.99\n2,Beta,19.99',
        keyColumns: ['id'],
        outputFormat: 'all-changes',
    });

    // Each dataset item describes one added, removed, or modified row.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);
Python
    from apify_client import ApifyClient

    client = ApifyClient(token='YOUR_APIFY_TOKEN')

    # Start the actor and wait for the run to finish.
    run = client.actor('automation-lab/csv-diff-tool').call(run_input={
        'csvA': 'id,name,price\n1,Alpha,9.99\n2,Beta,14.99',
        'csvB': 'id,name,price\n1,Alpha,9.99\n2,Beta,19.99',
        'keyColumns': ['id'],
        'outputFormat': 'all-changes',
    })

    # Each dataset item describes one added, removed, or modified row.
    items = client.dataset(run['defaultDatasetId']).list_items().items
    print(items)
cURL
    curl -X POST \
      "https://api.apify.com/v2/acts/automation-lab~csv-diff-tool/runs?token=YOUR_APIFY_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "csvA": "id,name,price\n1,Alpha,9.99\n2,Beta,14.99",
        "csvB": "id,name,price\n1,Alpha,9.99\n2,Beta,19.99",
        "keyColumns": ["id"],
        "outputFormat": "all-changes"
      }'
Use with AI agents via MCP
CSV Diff Tool is available as a tool for AI assistants that support the Model Context Protocol (MCP).
Add the Apify MCP server to your AI client — this gives you access to all Apify actors, including this one:
Setup for Claude Code
    claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/csv-diff-tool"
Setup for Claude Desktop, Cursor, or VS Code
Add this to your MCP config file:
    {
        "mcpServers": {
            "apify": {
                "type": "http",
                "url": "https://mcp.apify.com?tools=automation-lab/csv-diff-tool",
                "headers": {
                    "Authorization": "Bearer YOUR_APIFY_TOKEN"
                }
            }
        }
    }
Example prompts for AI agents:
- "Compare these two CSV exports and tell me which product prices changed."
- "I have a customer list from January and one from February — find who was added or removed."
- "Diff these two CSVs using 'email' as the key column and show me only the modified rows."
Is it legal to compare CSV data?
Yes — this actor performs local computation only on data you provide. It makes no external HTTP requests, accesses no third-party websites, and stores nothing beyond the run's dataset and key-value store (which you control).
Your data, your responsibility: Ensure you have the right to process any personal data contained in the CSVs you compare (GDPR, CCPA, etc.). The actor does not transmit your data to any third parties.
FAQ
How fast is the comparison?
For typical datasets (up to 10,000 rows), the comparison completes in under 2 seconds. Even 100,000-row CSVs typically finish in under 10 seconds since the algorithm runs in O(n) time with hash-based key lookups.
How much does it cost?
The start fee is $0.005, which covers the first 1,000 rows compared. Additional rows cost $0.001 per 1,000. Two 500-row CSVs cost $0.005 total. See the Pricing section for a full table.
What if my CSV has duplicate key values?
The last row with a given key value wins in the lookup map. If your data has genuine duplicates (e.g., multiple orders with the same order_id), use a composite key that makes rows unique, or leave keyColumns empty to use row-position matching.
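The "last row wins" behaviour follows naturally from building a key lookup with a dictionary. The sketch below shows the effect and a simple pre-check for duplicate keys; `find_duplicate_keys` is a hypothetical helper you could run on your own data before diffing.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(csv_text: str, key_columns: list[str]) -> list[tuple]:
    """Return key tuples that occur more than once in the CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[col] for col in key_columns) for row in reader)
    return [key for key, count in counts.items() if count > 1]


orders = "order_id,item\n100,Pen\n100,Ink\n101,Pad"

# A dict keyed by order_id keeps only the last row seen for each key.
lookup = {row["order_id"]: row for row in csv.DictReader(io.StringIO(orders))}
print(lookup["100"]["item"])

duplicates = find_duplicate_keys(orders, ["order_id"])
print(duplicates)
```

With the composite key `["order_id", "item"]`, the same data has no duplicates.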
How does it differ from a plain text diff (e.g. diff command)?
A text diff treats each line as a unit and is sensitive to row order — adding one row at the top makes every subsequent row appear as "changed." CSV Diff Tool understands column structure and uses key-based matching, so reordered rows are correctly identified as unchanged, and you see column-level before/after values for modified rows.
Why are some rows showing as modified when they look identical?
The most common cause is invisible whitespace: extra spaces before or after cell values. Make sure `trimWhitespace` is `true` (the default). The second most common cause is case differences, such as `Apple` vs `apple`. Set `caseSensitive` to `false` to ignore case.
Why are all rows showing as modified when I expect 0 changes?
Check that both CSVs use the same delimiter. If one is comma-separated and the other is semicolon-separated, the parser will misread column boundaries and everything will look different. Also verify that hasHeader matches — if one CSV has a header and you set hasHeader: false, the header row becomes a data row and shifts all comparisons.
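A quick pre-check for a delimiter mismatch is to guess each file's delimiter from its first line. The sketch below uses a simple counting heuristic; `guess_delimiter` is a hypothetical helper that can be fooled by delimiters inside quoted fields, so treat its result as a hint only.

```python
CANDIDATES = [",", ";", "\t", "|"]


def guess_delimiter(csv_text: str) -> str:
    """Guess the delimiter as the candidate that occurs most in the first line."""
    first_line = csv_text.splitlines()[0] if csv_text else ""
    # On ties (including no matches), max() returns the earliest candidate, a comma.
    return max(CANDIDATES, key=first_line.count)


csv_a = "id,name,price\n1,Alpha,9.99"
csv_b = "id;name;price\n1;Alpha;9,99"
if guess_delimiter(csv_a) != guess_delimiter(csv_b):
    print("Delimiters differ:", repr(guess_delimiter(csv_a)), repr(guess_delimiter(csv_b)))
```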
Can I compare CSVs with different columns?
Yes. The actor takes the union of all column names from both CSVs. Columns present in one but not the other will have empty values for the dataset they're missing from.
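The column-union behaviour can be pictured as follows. Both helpers are hypothetical, and the column order shown (A's columns first, then any new ones from B) is an assumption rather than documented behaviour.

```python
def union_columns(header_a: list[str], header_b: list[str]) -> list[str]:
    """Union of column names, keeping first-seen order and dropping repeats."""
    return list(dict.fromkeys([*header_a, *header_b]))


def align_row(row: dict, columns: list[str]) -> dict:
    """Fill columns missing from a row with empty strings."""
    return {col: row.get(col, "") for col in columns}


columns = union_columns(["id", "name", "price"], ["id", "name", "stock"])
aligned = align_row({"id": "1", "name": "Alpha", "price": "9.99"}, columns)
print(columns)
print(aligned)
```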
Other data tools from automation-lab
🔗 Apache Log Parser — Parse Apache and Nginx access logs into structured datasets with IP, path, status code, and response time fields.
🔗 Base64 Converter — Encode or decode Base64 strings and file content, with URL-safe variant support.
🔗 Barcode Generator — Generate QR codes and 1D barcodes (EAN-13, Code128, etc.) in bulk from a list of values.
🔗 Accessibility Checker — Audit web pages for WCAG accessibility violations using automated axe-core analysis.
🔗 Broken Link Checker — Crawl a website and find all broken internal and external links with their HTTP status codes.
🔗 Fake Test Data Generator — Generate bulk fake/test data with realistic names, addresses, emails, and more using Faker.js.
🔗 Bulk Image Optimizer — Compress and resize images in bulk from URLs, with WebP conversion and quality controls.