Pricing

Pay per event

CSV Diff Tool

Compare two CSV datasets and find added, removed, and modified rows. Supports key-column matching, configurable delimiters, case-insensitive comparison, and whitespace trimming. Exports a structured change report with before/after values.

Developer

Stas Persiianenko

Last modified: 11 days ago

CSV Diff Tool — Compare Two CSV Files & Find Changes

🔍 Paste two CSV datasets and instantly see what changed — which rows were added, removed, or modified. Supports key-column matching, configurable delimiters, case-insensitive comparison, and exports a detailed change report in JSON, CSV, or Excel.

What does CSV Diff Tool do?

CSV Diff Tool compares two CSV datasets row by row and outputs the exact differences: added rows, removed rows, and modified rows with before/after values for each changed column.

Unlike simple text-diff tools, this actor understands CSV structure — it matches rows by their key column values (e.g. id, email, or a composite [country, city]), not by line position. This means rows that were reordered between versions are correctly identified as unchanged, not as "removed then added."

The actor handles common real-world CSV quirks: quoted fields, commas inside values, semicolon-delimited European CSVs, TSV files, files with or without headers, leading/trailing whitespace, and case differences between values.
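
To make the idea of key-based matching concrete, here is a minimal Python sketch. It is not the actor's source code. The function names `parse_csv` and `diff_by_key` are hypothetical, and it assumes a header row and unique keys. It shows why reordered rows are not reported as changes when rows are indexed by key rather than compared by position.

```python
import csv
import io


def parse_csv(text, delimiter=","):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def diff_by_key(rows_a, rows_b, key_columns):
    """Return (added, removed, modified) using key-column matching."""
    def key_of(row):
        return tuple(row[col] for col in key_columns)

    # Index both sides by key; a later duplicate key overwrites an earlier one.
    index_a = {key_of(r): r for r in rows_a}
    index_b = {key_of(r): r for r in rows_b}

    added = [row for key, row in index_b.items() if key not in index_a]
    removed = [row for key, row in index_a.items() if key not in index_b]
    modified = []
    for key, before in index_a.items():
        after = index_b.get(key)
        if after is not None and after != before:
            changed = sorted(c for c in before if before[c] != after.get(c))
            modified.append({"key": key, "changed": changed})
    return added, removed, modified


csv_a = "id,name,price\n1,Alpha,9.99\n2,Beta,14.99\n3,Gamma,4.99"
# Same data in a different order, with one price change, one removal, one addition.
csv_b = "id,name,price\n5,Epsilon,34.99\n2,Beta,19.99\n1,Alpha,9.99"

added, removed, modified = diff_by_key(parse_csv(csv_a), parse_csv(csv_b), ["id"])
print([r["id"] for r in added])    # ['5']
print([r["id"] for r in removed])  # ['3']
print(modified)                    # [{'key': ('2',), 'changed': ['price']}]
```

Row 1 moved from the first line to the last, yet it does not appear in any of the three lists, because its key and values are unchanged.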

Who is it for?

📊 Data engineers and analysts who receive periodic CSV exports from databases, ERP systems, or third-party APIs and need to audit what changed between snapshots. Instead of manually scrolling through two spreadsheets, paste both and get a structured change report in seconds.

🛒 E-commerce operators comparing product catalog exports — tracking price changes, stock level drops, or product removals between weekly supplier feeds. The actor flags exactly which SKUs changed and shows old vs. new values side by side.

📋 Compliance and audit teams verifying that data migrations, ETL transforms, or database imports produced the expected changes. Use the summary stats (added/removed/modified counts) as an audit checkpoint, and drill into modified rows to verify specific field changes.

🔧 Developers and QA engineers running regression tests on data pipelines or API exports. Compare baseline and updated snapshots automatically in CI workflows via the Apify API.

📈 Operations and business intelligence teams who track KPIs exported as CSVs (sales reports, lead lists, inventory) and need a fast way to see week-over-week deltas without writing custom scripts.

Why use CSV Diff Tool?

✅ Intelligent row matching — matches rows by key column values, not line position. Rows that moved keep their identity.
✅ Before/after column diffs — modified rows show price_before: 14.99, price_after: 19.99 so you see exactly what changed.
✅ No code required — paste CSV, click Run. No Python, no SQL, no scripting.
✅ Any delimiter — comma, semicolon, tab, pipe. Works with European CSVs out of the box.
✅ Handles quoted fields — commas inside quoted values are correctly parsed, not split.
✅ Composite key support — match on multiple columns (["country", "city"]) for datasets without a single unique identifier.
✅ Case-insensitive mode — treat APPLE and apple as the same value.
✅ Whitespace trimming — ignore invisible whitespace differences.
✅ Export anywhere — results in JSON, CSV, Excel, or NDJSON. Schedule via API for automated monitoring.
✅ Pure computation — no proxy, no browser. Runs in under a second for typical datasets.

What data can you extract?

The actor outputs one row per change to the Apify dataset, plus a summary in the key-value store.

Per-change row output:

| Field | Description |
| --- | --- |
| `changeType` | `added`, `removed`, or `modified` |
| `rowKey` | Human-readable key, e.g. `id=42` or `country=UK\|city=London` |
| `changedColumns` | Comma-separated list of columns that changed. For added and removed rows, the output examples list every column |
| `columnCount` | Number of columns listed in `changedColumns` |
| `data` | Full row data. Modified rows show `field_before` and `field_after` for each changed column |
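
As an illustration of the `rowKey` format, the sketch below builds the same kind of string from a row and its key columns. The helper name `row_key` is hypothetical, and the `column=value` pairs joined by `|` are inferred from the two examples in the table, not taken from the actor's code.

```python
def row_key(row, key_columns):
    """Join column=value pairs with '|' in key-column order."""
    return "|".join(f"{column}={row[column]}" for column in key_columns)


print(row_key({"id": "42", "name": "Widget"}, ["id"]))  # id=42
print(row_key({"country": "UK", "city": "London"}, ["country", "city"]))  # country=UK|city=London
```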

Summary object (saved to key-value store as DIFF_SUMMARY):

| Field | Description |
| --- | --- |
| `totalRowsA` | Total rows in CSV A (baseline) |
| `totalRowsB` | Total rows in CSV B (updated) |
| `addedRows` | Rows present in B but not A |
| `removedRows` | Rows present in A but not B |
| `modifiedRows` | Rows present in both but with different values |
| `unchangedRows` | Rows identical in both datasets |
| `totalChanges` | Sum of added + removed + modified |
| `keyColumns` | Key columns used for matching |
| `hasHeader` | Whether the first row was treated as column names |
| `delimiter` | Field separator used for parsing |
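
If you use the summary as an audit checkpoint, a small consistency check can catch surprises early. The sketch below is an assumption-laden example: `check_summary` is a hypothetical helper, and the row-count identities only hold when key values are unique in both files (duplicate keys collapse into one entry).

```python
def check_summary(summary):
    """Return a list of problems found in a DIFF_SUMMARY-shaped dict."""
    problems = []
    added = summary["addedRows"]
    removed = summary["removedRows"]
    modified = summary["modifiedRows"]
    unchanged = summary["unchangedRows"]

    if summary["totalChanges"] != added + removed + modified:
        problems.append("totalChanges != added + removed + modified")
    # Assumption: with unique keys, every row of A is unchanged, modified, or removed.
    if summary["totalRowsA"] != unchanged + modified + removed:
        problems.append("totalRowsA does not match unchanged + modified + removed")
    # Assumption: with unique keys, every row of B is unchanged, modified, or added.
    if summary["totalRowsB"] != unchanged + modified + added:
        problems.append("totalRowsB does not match unchanged + modified + added")
    return problems


summary = {
    "totalRowsA": 4, "totalRowsB": 4,
    "addedRows": 1, "removedRows": 1, "modifiedRows": 1,
    "unchangedRows": 2, "totalChanges": 3,
}
print(check_summary(summary))  # []
```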

How much does it cost to compare CSV files?

This actor uses pay-per-event (PPE) pricing — you only pay for what you use.

| Event | Price |
| --- | --- |
| Actor start (covers first 1,000 rows compared) | $0.005 |
| Additional 1,000 rows compared (first 100 units) | $0.001 |
| Additional 1,000 rows compared (100–1,000 units) | $0.0008 |
| Additional 1,000 rows compared (1,000+ units) | $0.0006 |

Real-world cost examples:

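| Dataset size | Cost |
| --- | --- |
| Two 50-row product catalogs | $0.005 (start fee only) |
| Two 500-row customer lists | $0.005 (start fee only) |
| Two 2,000-row inventory files | ~$0.008 (start + 3 extra units) |
| Two 10,000-row transaction logs | ~$0.024 (start + 19 extra units) |
| Two 50,000-row data exports | ~$0.104 (start + 99 extra units) |
| Two 500,000-row data exports | ~$0.824 (start + 999 extra units across two price tiers) |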

💡 Free plan estimate: Apify gives new users $5 in free credits. That covers ~500 diff runs on typical datasets, or a single comparison of two 500,000-row files.

All pricing uses volume tiers — the per-unit price automatically decreases as you compare more rows in a single run.
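
For a quick estimate before a large run, the tier table can be turned into a few lines of Python. This is a sketch under stated assumptions, not an official calculator: it assumes rows are counted across both files, that one unit is 1,000 rows (rounded up), that the start fee covers the first unit, and that the tier boundaries fall at 100 and 1,000 extra units. The Apify Console shows the authoritative charge for each run.

```python
import math

START_FEE = 0.005
# (number of extra units in the tier, price per unit); boundaries are assumptions.
TIERS = [(100, 0.001), (900, 0.0008), (math.inf, 0.0006)]


def estimate_cost(rows_a, rows_b):
    """Estimate the cost in USD of one run comparing rows_a against rows_b."""
    total_rows = rows_a + rows_b
    remaining = max(0, math.ceil(total_rows / 1000) - 1)  # start fee covers unit 1
    cost = START_FEE
    for size, price in TIERS:
        used = min(remaining, size)
        cost += used * price
        remaining -= used
        if remaining == 0:
            break
    return round(cost, 4)


print(estimate_cost(500, 500))        # 0.005
print(estimate_cost(10_000, 10_000))  # 0.024
```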

How to compare two CSV files

1. Open CSV Diff Tool on Apify Store.
2. Click Try for free to open the actor in Apify Console.
3. Paste your baseline CSV (the "before" snapshot) into the CSV A field.
4. Paste your updated CSV (the "after" snapshot) into the CSV B field.
5. Set Key columns to the column name(s) that uniquely identify each row (e.g. ["id"] or ["email"]). Leave empty to match by row position.
6. Configure Delimiter if you're using a non-comma separator (semicolon, tab, pipe).
7. Click Start. The actor typically completes in under 2 seconds.
8. View results in the Dataset tab. Each change is a separate row with changeType, rowKey, and full data.
9. Export as CSV or Excel using the Export button.

Example input (JSON format):

{
    "csvA": "id,name,price,stock\n1,Widget Alpha,9.99,100\n2,Widget Beta,14.99,50\n3,Widget Gamma,4.99,200",
    "csvB": "id,name,price,stock\n1,Widget Alpha,9.99,100\n2,Widget Beta,19.99,45\n5,Widget Epsilon,34.99,10",
    "keyColumns": ["id"],
    "delimiter": ",",
    "hasHeader": true,
    "outputFormat": "all-changes"
}

Example for TSV files:

{
    "csvA": "id\tname\tvalue\n1\tAlpha\t10\n2\tBeta\t20",
    "csvB": "id\tname\tvalue\n1\tAlpha\t10\n2\tBeta\t25",
    "keyColumns": ["id"],
    "delimiter": "\t",
    "hasHeader": true
}

Example with composite key:

{
    "csvA": "country,city,population\nUK,London,8900000\nDE,Berlin,3700000",
    "csvB": "country,city,population\nUK,London,9100000\nFR,Paris,2100000",
    "keyColumns": ["country", "city"],
    "hasHeader": true
}

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `csvA` | String | (none) | Required. The baseline (original) CSV content. |
| `csvB` | String | (none) | Required. The updated (new) CSV content. |
| `keyColumns` | Array | `[]` | Column names that uniquely identify rows. Empty = use row position. |
| `delimiter` | String | `,` | Field separator: `,` `;` `\t` or `\|` |
| `hasHeader` | Boolean | `true` | Whether the first row contains column names. |
| `caseSensitive` | Boolean | `true` | Whether value comparison is case-sensitive. |
| `trimWhitespace` | Boolean | `true` | Trim leading/trailing spaces from cell values. |
| `outputFormat` | String | `all-changes` | What to output: `all-changes`, `added-only`, `removed-only`, `modified-only`, `summary-only` |
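
When you build the input programmatically, it can help to validate it against the table above before starting a run. The helper below is a hypothetical sketch (`build_input` is not part of any Apify library). It uses the parameter names and defaults from the table and rejects delimiters or output formats the table does not list.

```python
import json

ALLOWED_DELIMITERS = {",", ";", "\t", "|"}
ALLOWED_OUTPUT_FORMATS = {
    "all-changes", "added-only", "removed-only", "modified-only", "summary-only",
}


def build_input(csv_a, csv_b, key_columns=(), delimiter=",", has_header=True,
                case_sensitive=True, trim_whitespace=True,
                output_format="all-changes"):
    """Build the actor input dict, rejecting values the table does not list."""
    if not csv_a or not csv_b:
        raise ValueError("csvA and csvB are both required")
    if delimiter not in ALLOWED_DELIMITERS:
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported outputFormat: {output_format!r}")
    return {
        "csvA": csv_a,
        "csvB": csv_b,
        "keyColumns": list(key_columns),
        "delimiter": delimiter,
        "hasHeader": has_header,
        "caseSensitive": case_sensitive,
        "trimWhitespace": trim_whitespace,
        "outputFormat": output_format,
    }


payload = build_input("id;name\n1;Alpha", "id;name\n1;Beta",
                      key_columns=["id"], delimiter=";",
                      output_format="modified-only")
print(json.dumps(payload, indent=4))
```

The resulting dict can be passed as `run_input` in the Python API example, or serialized with `json.dumps` for the cURL call.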

Output examples

Modified row (price and stock changed):

{
    "changeType": "modified",
    "rowKey": "id=2",
    "changedColumns": "price, stock",
    "columnCount": 2,
    "data": {
        "id": "2",
        "name": "Widget Beta",
        "price_before": "14.99",
        "price_after": "19.99",
        "stock_before": "50",
        "stock_after": "45"
    }
}

Added row:

{
    "changeType": "added",
    "rowKey": "id=5",
    "changedColumns": "id, name, price, stock",
    "columnCount": 4,
    "data": {
        "id": "5",
        "name": "Widget Epsilon",
        "price": "34.99",
        "stock": "10"
    }
}

Removed row:

{
    "changeType": "removed",
    "rowKey": "id=4",
    "changedColumns": "id, name, price, stock",
    "columnCount": 4,
    "data": {
        "id": "4",
        "name": "Widget Delta",
        "price": "24.99",
        "stock": "25"
    }
}

Summary (from key-value store, key DIFF_SUMMARY):

{
    "totalRowsA": 4,
    "totalRowsB": 4,
    "addedRows": 1,
    "removedRows": 1,
    "modifiedRows": 1,
    "unchangedRows": 2,
    "totalChanges": 3,
    "keyColumns": ["id"],
    "hasHeader": true,
    "delimiter": ","
}
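
Once the dataset items are downloaded, the before/after pairs are easy to pull apart. The sketch below assumes the item shape shown in the examples above: `changedColumns` is a comma-separated string, and each changed column appears in `data` with `_before` and `_after` suffixes. The helper name `column_changes` is hypothetical.

```python
from collections import Counter


def column_changes(item):
    """Yield (column, before, after) for a dataset item with changeType 'modified'."""
    data = item["data"]
    for column in (name.strip() for name in item["changedColumns"].split(",")):
        yield column, data[f"{column}_before"], data[f"{column}_after"]


items = [
    {
        "changeType": "modified",
        "rowKey": "id=2",
        "changedColumns": "price, stock",
        "data": {
            "id": "2", "name": "Widget Beta",
            "price_before": "14.99", "price_after": "19.99",
            "stock_before": "50", "stock_after": "45",
        },
    },
    {"changeType": "added", "rowKey": "id=5", "changedColumns": "id, name, price, stock",
     "data": {"id": "5", "name": "Widget Epsilon", "price": "34.99", "stock": "10"}},
]

# Count changes per type, then list column-level changes for modified rows only.
print(Counter(item["changeType"] for item in items))
for item in items:
    if item["changeType"] == "modified":
        for column, before, after in column_changes(item):
            print(f"{item['rowKey']}: {column} {before} -> {after}")
```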

Tips for best results

🔑 Always specify key columns for datasets that have a unique identifier. Without key columns, the actor uses row position — which means adding a row at the top makes every subsequent row appear as "modified."
📋 Use composite keys when no single column is unique. E.g. ["country", "city"] for geographic data or ["year", "month", "product_id"] for time-series exports.
✂️ Keep trimWhitespace enabled (the default). Spreadsheet exports often add invisible trailing spaces that would otherwise cause false "modified" detections.
🔠 Use case-insensitive mode when comparing data from different systems that may use different capitalization conventions.
📤 Use outputFormat: modified-only when you only care about value changes, not structural additions/removals. This keeps datasets small and focused.
📊 Check DIFF_SUMMARY in the key-value store for aggregate counts before diving into individual rows. If totalChanges is 0, both CSVs are identical.
🔄 Schedule weekly runs to automate monitoring of recurring CSV exports. Combine with webhooks to alert you when changes exceed a threshold.
📦 Large files: CSVs up to hundreds of thousands of rows work fine. The actor uses batched processing to handle large datasets efficiently.
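
The effect of the trimWhitespace and caseSensitive options on a single cell comparison can be sketched as follows. This illustrates the behavior described in the tips and is not the actor's implementation. The function names are hypothetical, and using `str.casefold` for case-insensitive matching is an assumption.

```python
def normalize(value, trim_whitespace=True, case_sensitive=True):
    """Normalize one cell value before comparison."""
    if trim_whitespace:
        value = value.strip()
    if not case_sensitive:
        value = value.casefold()
    return value


def cells_equal(a, b, **options):
    """Compare two cell values after applying the same normalization to both."""
    return normalize(a, **options) == normalize(b, **options)


print(cells_equal("Apple ", "Apple"))                       # True
print(cells_equal("Apple ", "Apple", trim_whitespace=False))  # False
print(cells_equal("APPLE", "apple"))                        # False
print(cells_equal("APPLE", "apple", case_sensitive=False))  # True
```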

Integrations

📊 CSV Diff Tool → Google Sheets: Export diff results directly to a Google Sheets spreadsheet using Apify's Google Sheets integration. Create a live "change log" sheet that updates every time you re-run the diff. Useful for sharing change reports with non-technical stakeholders without manual copy-paste.

🔔 CSV Diff Tool → Slack/Discord alerts: Combine with an Apify webhook that fires when the run completes. If totalChanges > 0, send a Slack message with the counts: "Product catalog updated: 3 prices changed, 1 item removed." Keeps your team informed without manual monitoring.

🔁 CSV Diff Tool → Make (formerly Integromat) or Zapier: Trigger a Make scenario when the Apify run completes. Read the dataset items and route each changeType to a different action: for example, added rows go to a CRM import, removed rows trigger a deactivation workflow, and modified rows trigger a price update.

📅 Scheduled monitoring of recurring exports: Use Apify's scheduler to run the diff daily or weekly against the latest export from your ERP, PIM, or data warehouse. Compare against the previous run's output (stored in a dataset) to build an automated audit trail.

🔗 Webhook-triggered diff in CI/CD pipelines: Call the Apify API from your CI/CD pipeline after running a data migration. Diff the pre-migration and post-migration exports to verify the transform produced the expected changes. Fail the pipeline if unexpected rows were modified.
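
For the CI/CD case, the "fail the pipeline if unexpected rows were modified" check might look like the sketch below. It works on dataset items that have already been downloaded. The function name `unexpected_changes` and the policy it encodes (only listed columns may change, and no rows may be added or removed) are assumptions you would adapt to your own migration.

```python
def unexpected_changes(items, allowed_columns):
    """Return the dataset items that the migration was not expected to produce."""
    allowed = set(allowed_columns)
    unexpected = []
    for item in items:
        if item["changeType"] != "modified":
            # Policy assumption: added or removed rows are never expected.
            unexpected.append(item)
            continue
        changed = {name.strip() for name in item["changedColumns"].split(",")}
        if not changed <= allowed:
            unexpected.append(item)
    return unexpected


items = [
    {"changeType": "modified", "rowKey": "id=1", "changedColumns": "price"},
    {"changeType": "modified", "rowKey": "id=2", "changedColumns": "price, stock"},
    {"changeType": "removed", "rowKey": "id=4", "changedColumns": "id, name, price, stock"},
]

violations = unexpected_changes(items, allowed_columns=["price"])
print([item["rowKey"] for item in violations])  # ['id=2', 'id=4']
# In a pipeline step you could then exit non-zero when violations is not empty.
```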

API usage

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('automation-lab/csv-diff-tool').call({
    csvA: 'id,name,price\n1,Alpha,9.99\n2,Beta,14.99',
    csvB: 'id,name,price\n1,Alpha,9.99\n2,Beta,19.99',
    keyColumns: ['id'],
    outputFormat: 'all-changes',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

client = ApifyClient(token='YOUR_APIFY_TOKEN')

run = client.actor('automation-lab/csv-diff-tool').call(run_input={
    'csvA': 'id,name,price\n1,Alpha,9.99\n2,Beta,14.99',
    'csvB': 'id,name,price\n1,Alpha,9.99\n2,Beta,19.99',
    'keyColumns': ['id'],
    'outputFormat': 'all-changes',
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
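
The summary lives in the run's key-value store, not in the dataset, so it needs a separate call. The helper below is a sketch that takes the `client` and `run` objects from the Python example above. It assumes the run dict exposes `defaultKeyValueStoreId` and that `get_record` returns a dict with the stored object under `value`, as current `apify-client` releases do. The helper name is hypothetical.

```python
def fetch_diff_summary(client, run):
    """Return the DIFF_SUMMARY value for a finished run, or None if it is missing."""
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("DIFF_SUMMARY")
    return record["value"] if record else None


# Usage with the client and run from the example above:
#   summary = fetch_diff_summary(client, run)
#   if summary and summary["totalChanges"] == 0:
#       print("No differences found")
```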

cURL

curl -X POST \
  "https://api.apify.com/v2/acts/automation-lab~csv-diff-tool/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "csvA": "id,name,price\n1,Alpha,9.99\n2,Beta,14.99",
    "csvB": "id,name,price\n1,Alpha,9.99\n2,Beta,19.99",
    "keyColumns": ["id"],
    "outputFormat": "all-changes"
  }'

Use with AI agents via MCP

CSV Diff Tool is available as a tool for AI assistants that support the Model Context Protocol (MCP).

Add the Apify MCP server to your AI client — this gives you access to all Apify actors, including this one:

Setup for Claude Code

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/csv-diff-tool"

Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

{
    "mcpServers": {
        "apify": {
            "type": "http",
            "url": "https://mcp.apify.com?tools=automation-lab/csv-diff-tool",
            "headers": {
                "Authorization": "Bearer YOUR_APIFY_TOKEN"
            }
        }
    }
}

Example prompts for AI agents:

"Compare these two CSV exports and tell me which product prices changed."
"I have a customer list from January and one from February — find who was added or removed."
"Diff these two CSVs using 'email' as the key column and show me only the modified rows."

Legality: is it legal to compare CSV data?

Yes — this actor performs local computation only on data you provide. It makes no external HTTP requests, accesses no third-party websites, and stores nothing beyond the run's dataset and key-value store (which you control).

Your data, your responsibility: Ensure you have the right to process any personal data contained in the CSVs you compare (GDPR, CCPA, etc.). The actor does not transmit your data to any third parties.

FAQ

How fast is the comparison? For typical datasets (up to 10,000 rows), the comparison completes in under 2 seconds. Even 100,000-row CSVs typically finish in under 10 seconds since the algorithm runs in O(n) time with hash-based key lookups.

How much does it cost? The start fee is $0.005, which covers the first 1,000 rows compared. Additional rows cost $0.001 per 1,000. Two 500-row CSVs cost $0.005 total. See the Pricing section for a full table.

What if my CSV has duplicate key values? The last row with a given key value wins in the lookup map. If your data has genuine duplicates (e.g., multiple orders with the same order_id), use a composite key that makes rows unique, or leave keyColumns empty to use row-position matching.
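
To find out whether your data has duplicate keys before running the diff, a pre-check along these lines can help. It is a local sketch using Python's standard library, with a hypothetical function name, and it assumes the CSV has a header row.

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_text, key_columns, delimiter=","):
    """Return {key_tuple: count} for every key that occurs more than once."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    counts = Counter(tuple(row[col] for col in key_columns) for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


orders = "order_id,sku,qty\n100,A,1\n100,B,2\n101,A,1"
print(duplicate_keys(orders, ["order_id"]))         # {('100',): 2}
print(duplicate_keys(orders, ["order_id", "sku"]))  # {}
```

In this example `order_id` alone is not unique, while the composite key `["order_id", "sku"]` is.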

How does it differ from a plain text diff (e.g. diff command)? A text diff treats each line as a unit and is sensitive to row order — adding one row at the top makes every subsequent row appear as "changed." CSV Diff Tool understands column structure and uses key-based matching, so reordered rows are correctly identified as unchanged, and you see column-level before/after values for modified rows.

Why are some rows showing as modified when they look identical? The most common cause is invisible whitespace — extra spaces before or after cell values. Enable trimWhitespace: true (it's on by default). The second most common cause is case differences — Apple vs apple. Enable caseSensitive: false to ignore case.

Why are all rows showing as modified when I expect 0 changes? Check that both CSVs use the same delimiter. If one is comma-separated and the other is semicolon-separated, the parser will misread column boundaries and everything will look different. Also verify that hasHeader matches — if one CSV has a header and you set hasHeader: false, the header row becomes a data row and shifts all comparisons.
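
A rough way to spot a delimiter mismatch before a run is to count candidate separators in each file's first line. The heuristic below is an illustrative sketch with hypothetical function names. It ignores quoting, so a header that contains quoted commas can fool it.

```python
CANDIDATES = [",", ";", "\t", "|"]


def guess_delimiter(csv_text):
    """Guess the delimiter as the candidate that occurs most often in line one."""
    lines = csv_text.splitlines()
    first_line = lines[0] if lines else ""
    counts = {d: first_line.count(d) for d in CANDIDATES}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def same_delimiter(csv_a, csv_b):
    """Return True when both files appear to use the same delimiter."""
    return guess_delimiter(csv_a) == guess_delimiter(csv_b)


print(guess_delimiter("id;name;price\n1;Alpha;9,99"))      # ;
print(same_delimiter("id,name\n1,Alpha", "id;name\n1;Alpha"))  # False
```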

Can I compare CSVs with different columns? Yes. The actor takes the union of all column names from both CSVs. Columns present in one but not the other will have empty values for the dataset they're missing from.
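
The column-union behavior can be pictured with the sketch below. The function names are hypothetical and the column ordering (A's columns first, then any new columns from B) is an assumption. Only the "union with empty values for missing columns" behavior comes from the description above.

```python
def union_columns(header_a, header_b):
    """Return the ordered union of two header lists without duplicates."""
    seen = dict.fromkeys(header_a)
    seen.update(dict.fromkeys(header_b))
    return list(seen)


def align(row, columns):
    """Fill columns that are missing from a row with empty strings."""
    return {column: row.get(column, "") for column in columns}


columns = union_columns(["id", "name"], ["id", "price"])
print(columns)                                    # ['id', 'name', 'price']
print(align({"id": "1", "name": "Alpha"}, columns))  # {'id': '1', 'name': 'Alpha', 'price': ''}
```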

🔗 Apache Log Parser — Parse Apache and Nginx access logs into structured datasets with IP, path, status code, and response time fields.

🔗 Base64 Converter — Encode or decode Base64 strings and file content, with URL-safe variant support.

🔗 Barcode Generator — Generate QR codes and 1D barcodes (EAN-13, Code128, etc.) in bulk from a list of values.

🔗 Accessibility Checker — Audit web pages for WCAG accessibility violations using automated axe-core analysis.

🔗 Broken Link Checker — Crawl a website and find all broken internal and external links with their HTTP status codes.

🔗 Fake Test Data Generator — Generate bulk fake/test data with realistic names, addresses, emails, and more using Faker.js.

🔗 Bulk Image Optimizer — Compress and resize images in bulk from URLs, with WebP conversion and quality controls.
