# CSV Diff Tool (`automation-lab/csv-diff-tool`) Actor

Compare two CSV datasets and find added, removed, and modified rows. Supports key-column matching, configurable delimiters, case-insensitive comparison, and whitespace trimming. Exports a structured change report with before/after values.

- **URL**: https://apify.com/automation-lab/csv-diff-tool.md
- **Developed by:** [Stas Persiianenko](https://apify.com/automation-lab) (community)
- **Categories:** Developer tools
- **Stats:** 1 total user, 0 monthly users, 0.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the price gets lower the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## CSV Diff Tool — Compare Two CSV Files & Find Changes

🔍 **Paste two CSV datasets and instantly see what changed** — which rows were added, removed, or modified. Supports key-column matching, configurable delimiters, case-insensitive comparison, and exports a detailed change report in JSON, CSV, or Excel.

---

### What does CSV Diff Tool do?

**CSV Diff Tool** compares two CSV datasets row by row and outputs the exact differences: added rows, removed rows, and modified rows with before/after values for each changed column.

Unlike simple text-diff tools, this actor understands **CSV structure** — it matches rows by their key column values (e.g. `id`, `email`, or a composite `[country, city]`), not by line position. This means rows that were reordered between versions are correctly identified as unchanged, not as "removed then added."

The Actor handles common real-world CSV quirks: quoted fields, commas inside values, semicolon-delimited European CSVs, TSV files, files with or without a header row, leading/trailing whitespace, and differences in letter case.

---

### Who is CSV Diff Tool for?

**📊 Data engineers and analysts** who receive periodic CSV exports from databases, ERP systems, or third-party APIs and need to audit what changed between snapshots. Instead of manually scrolling through two spreadsheets, paste both and get a structured change report in seconds.

**🛒 E-commerce operators** comparing product catalog exports — tracking price changes, stock level drops, or product removals between weekly supplier feeds. The actor flags exactly which SKUs changed and shows old vs. new values side by side.

**📋 Compliance and audit teams** verifying that data migrations, ETL transforms, or database imports produced the expected changes. Use the summary stats (added/removed/modified counts) as an audit checkpoint, and drill into modified rows to verify specific field changes.

**🔧 Developers and QA engineers** running regression tests on data pipelines or API exports. Compare baseline and updated snapshots automatically in CI workflows via the Apify API.

**📈 Operations and business intelligence teams** who track KPIs exported as CSVs (sales reports, lead lists, inventory) and need a fast way to see week-over-week deltas without writing custom scripts.

---

### Why use CSV Diff Tool?

- ✅ **Intelligent row matching** — matches rows by key column values, not line position. Rows that moved keep their identity.
- ✅ **Before/after column diffs** — modified rows show `price_before: 14.99, price_after: 19.99` so you see exactly what changed.
- ✅ **No code required** — paste CSV, click Run. No Python, no SQL, no scripting.
- ✅ **Any delimiter** — comma, semicolon, tab, pipe. Works with European CSVs out of the box.
- ✅ **Handles quoted fields** — commas inside quoted values are correctly parsed, not split.
- ✅ **Composite key support** — match on multiple columns (`["country", "city"]`) for datasets without a single unique identifier.
- ✅ **Case-insensitive mode** — treat `APPLE` and `apple` as the same value.
- ✅ **Whitespace trimming** — ignore invisible whitespace differences.
- ✅ **Export anywhere** — results in JSON, CSV, Excel, or NDJSON. Schedule via API for automated monitoring.
- ✅ **Pure computation** — no proxy, no browser. Runs in under a second for typical datasets.

---

### What data can you extract?

The actor outputs one row per change to the Apify dataset, plus a summary in the key-value store.

**Per-change row output:**

| Field | Description |
|-------|-------------|
| `changeType` | `added`, `removed`, or `modified` |
| `rowKey` | Human-readable key, e.g. `id=42` or `country=UK\|city=London` |
| `changedColumns` | Comma-separated list of columns that changed (for added and removed rows, every column is listed) |
| `columnCount` | Number of columns that changed |
| `data` | Full row data — modified rows show `field_before` and `field_after` for each changed column |

**Summary object (saved to key-value store as `DIFF_SUMMARY`):**

| Field | Description |
|-------|-------------|
| `totalRowsA` | Total rows in CSV A (baseline) |
| `totalRowsB` | Total rows in CSV B (updated) |
| `addedRows` | Rows present in B but not A |
| `removedRows` | Rows present in A but not B |
| `modifiedRows` | Rows present in both but with different values |
| `unchangedRows` | Rows identical in both datasets |
| `totalChanges` | Sum of added + removed + modified |
| `keyColumns` | Key columns used for matching |

---

### How much does it cost to compare CSV files?

This actor uses **pay-per-event (PPE) pricing** — you only pay for what you use.

| Event | Price |
|-------|-------|
| Actor start (covers first 1,000 rows compared) | $0.005 |
| Additional 1,000 rows compared | $0.001 |

**Real-world cost examples:**

| Dataset size | Cost |
|-------------|------|
| Two 50-row product catalogs | $0.005 (start fee only) |
| Two 500-row customer lists | $0.005 (start fee only) |
| Two 2,000-row inventory files | ~$0.008 (start + 3 extra units) |
| Two 10,000-row transaction logs | ~$0.024 |
| Two 50,000-row data exports | ~$0.104 |

💡 **Free plan estimate**: Apify gives new users $5 in free credits. Going by the table above, that covers up to 1,000 runs that stay within the start fee (1,000 rows compared or fewer). A single comparison of two 500,000-row files costs about $1.00.

All pricing uses **tiered discounts**: the higher your Apify subscription plan, the lower the price per event.

---

### How to compare two CSV files

1. **Open [CSV Diff Tool](https://apify.com/automation-lab/csv-diff-tool)** on Apify Store.
2. Click **Try for free** to open the actor in Apify Console.
3. Paste your **baseline CSV** (the "before" snapshot) into the **CSV A** field.
4. Paste your **updated CSV** (the "after" snapshot) into the **CSV B** field.
5. Set **Key columns** to the column name(s) that uniquely identify each row (e.g. `["id"]` or `["email"]`). Leave empty to match by row position.
6. Configure **Delimiter** if you're using a non-comma separator (semicolon, tab, pipe).
7. Click **Start**. The actor typically completes in under 2 seconds.
8. View results in the **Dataset** tab — each change is a separate row with `changeType`, `rowKey`, and full `data`.
9. Export as CSV or Excel using the **Export** button.

**Example input (JSON format):**

```json
{
    "csvA": "id,name,price,stock\n1,Widget Alpha,9.99,100\n2,Widget Beta,14.99,50\n3,Widget Gamma,4.99,200",
    "csvB": "id,name,price,stock\n1,Widget Alpha,9.99,100\n2,Widget Beta,19.99,45\n5,Widget Epsilon,34.99,10",
    "keyColumns": ["id"],
    "delimiter": ",",
    "hasHeader": true,
    "outputFormat": "all-changes"
}
````

**Example for TSV files:**

```json
{
    "csvA": "id\tname\tvalue\n1\tAlpha\t10\n2\tBeta\t20",
    "csvB": "id\tname\tvalue\n1\tAlpha\t10\n2\tBeta\t25",
    "keyColumns": ["id"],
    "delimiter": "\t",
    "hasHeader": true
}
```

**Example with composite key:**

```json
{
    "csvA": "country,city,population\nUK,London,8900000\nDE,Berlin,3700000",
    "csvB": "country,city,population\nUK,London,9100000\nFR,Paris,2100000",
    "keyColumns": ["country", "city"],
    "hasHeader": true
}
```
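
With a composite key, the `rowKey` field in the output joins the key columns as `name=value` pairs separated by `|` (for example `country=UK|city=London`). The sketch below reproduces that format locally; `format_row_key` is a hypothetical helper written for illustration, not part of the Actor.

```python
def format_row_key(row: dict, key_columns: list) -> str:
    """Join key columns as name=value pairs separated by '|'."""
    return "|".join(f"{col}={row[col]}" for col in key_columns)


row = {"country": "UK", "city": "London", "population": "9100000"}
print(format_row_key(row, ["country", "city"]))  # country=UK|city=London
```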

***

### Input parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `csvA` | String | — | **Required.** The baseline (original) CSV content. |
| `csvB` | String | — | **Required.** The updated (new) CSV content. |
| `keyColumns` | Array | `[]` | Column names that uniquely identify rows. Empty = use row position. |
| `delimiter` | String | `,` | Field separator: `,` `;` `\t` or `\|` |
| `hasHeader` | Boolean | `true` | Whether the first row contains column names. |
| `caseSensitive` | Boolean | `true` | Whether value comparison is case-sensitive. |
| `trimWhitespace` | Boolean | `true` | Trim leading/trailing spaces from cell values. |
| `outputFormat` | String | `all-changes` | What to output: `all-changes`, `added-only`, `removed-only`, `modified-only`, `summary-only` |
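
If you build the input programmatically, it can help to validate it against the table above before starting a run. The following is a minimal sketch; `build_input` is a hypothetical helper, and the allowed values are copied from the table rather than read from the Actor itself.

```python
ALLOWED_DELIMITERS = {",", ";", "\t", "|"}
ALLOWED_OUTPUT_FORMATS = {
    "all-changes", "added-only", "removed-only", "modified-only", "summary-only",
}


def build_input(csv_a, csv_b, key_columns=None, delimiter=",",
                output_format="all-changes"):
    """Return an input dict for the Actor, or raise ValueError on bad values."""
    if not csv_a or not csv_b:
        raise ValueError("csvA and csvB are both required")
    if delimiter not in ALLOWED_DELIMITERS:
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported outputFormat: {output_format!r}")
    return {
        "csvA": csv_a,
        "csvB": csv_b,
        "keyColumns": list(key_columns or []),
        "delimiter": delimiter,
        "outputFormat": output_format,
    }


run_input = build_input("id,name\n1,Alpha", "id,name\n1,Beta", key_columns=["id"])
print(run_input["keyColumns"])  # ['id']
```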

***

### Output examples

**Modified row (price and stock changed):**

```json
{
    "changeType": "modified",
    "rowKey": "id=2",
    "changedColumns": "price, stock",
    "columnCount": 2,
    "data": {
        "id": "2",
        "name": "Widget Beta",
        "price_before": "14.99",
        "price_after": "19.99",
        "stock_before": "50",
        "stock_after": "45"
    }
}
```
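
The `data` object of a modified row keeps unchanged columns as they are and splits each changed column into `_before` and `_after` entries. A minimal sketch of that shape, assuming both rows share the same columns (`build_modified_data` is a hypothetical name, not the Actor's code):

```python
def build_modified_data(row_a: dict, row_b: dict) -> dict:
    """Merge two versions of a row into one dict with before/after values."""
    data = {}
    for col in row_b:
        if row_a.get(col) == row_b[col]:
            data[col] = row_b[col]                    # unchanged column
        else:
            data[f"{col}_before"] = row_a.get(col, "")
            data[f"{col}_after"] = row_b[col]
    return data


before = {"id": "2", "name": "Widget Beta", "price": "14.99", "stock": "50"}
after = {"id": "2", "name": "Widget Beta", "price": "19.99", "stock": "45"}
print(build_modified_data(before, after))
```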

**Added row:**

```json
{
    "changeType": "added",
    "rowKey": "id=5",
    "changedColumns": "id, name, price, stock",
    "columnCount": 4,
    "data": {
        "id": "5",
        "name": "Widget Epsilon",
        "price": "34.99",
        "stock": "10"
    }
}
```

**Removed row:**

```json
{
    "changeType": "removed",
    "rowKey": "id=4",
    "changedColumns": "id, name, price, stock",
    "columnCount": 4,
    "data": {
        "id": "4",
        "name": "Widget Delta",
        "price": "24.99",
        "stock": "25"
    }
}
```

**Summary (from key-value store, key `DIFF_SUMMARY`):**

```json
{
    "totalRowsA": 4,
    "totalRowsB": 4,
    "addedRows": 1,
    "removedRows": 1,
    "modifiedRows": 1,
    "unchangedRows": 2,
    "totalChanges": 3,
    "keyColumns": ["id"],
    "hasHeader": true,
    "delimiter": ","
}
```
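
To read this summary from code, one option is the Python client's key-value store interface. The helper below is a sketch: `read_diff_summary` is a hypothetical name, and it assumes the run object exposes `defaultKeyValueStoreId` and that the client's `key_value_store(...).get_record(...)` returns a record with a `value` field, as the official `apify-client` package does.

```python
def read_diff_summary(client, run: dict):
    """Return the DIFF_SUMMARY value for a finished run, or None if absent."""
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("DIFF_SUMMARY")
    return record["value"] if record else None


# Usage (needs network access and a real token):
#   from apify_client import ApifyClient
#   client = ApifyClient(token="YOUR_APIFY_TOKEN")
#   run = client.actor("automation-lab/csv-diff-tool").call(run_input={...})
#   summary = read_diff_summary(client, run)
#   if summary and summary["totalChanges"] == 0:
#       print("Both CSVs are identical")
```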

***

### Tips for best results

- 🔑 **Always specify key columns** for datasets that have a unique identifier. Without key columns, the actor uses row position — which means adding a row at the top makes every subsequent row appear as "modified."
- 📋 **Use composite keys** when no single column is unique. E.g. `["country", "city"]` for geographic data or `["year", "month", "product_id"]` for time-series exports.
- ✂️ **Keep trimWhitespace enabled** (the default). Spreadsheet exports often add invisible trailing spaces that would otherwise cause false "modified" detections.
- 🔠 **Use case-insensitive mode** when comparing data from different systems that may use different capitalization conventions.
- 📤 **Use `outputFormat: modified-only`** when you only care about value changes, not structural additions/removals. This keeps datasets small and focused.
- 📊 **Check `DIFF_SUMMARY` in the key-value store** for aggregate counts before diving into individual rows. If `totalChanges` is 0, both CSVs are identical.
- 🔄 **Schedule weekly runs** to automate monitoring of recurring CSV exports. Combine with webhooks to alert you when changes exceed a threshold.
- 📦 **Large files**: CSVs up to hundreds of thousands of rows work fine. The actor uses batched processing to handle large datasets efficiently.
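
The first tip is easy to verify locally. The sketch below matches rows purely by position, which is an illustration of the pitfall rather than the Actor's implementation: inserting one row at the top makes every original row look modified.

```python
def positional_modified_count(rows_a, rows_b):
    """Count rows that differ when matched by position (index) only."""
    return sum(1 for a, b in zip(rows_a, rows_b) if a != b)


baseline = [["1", "Alpha"], ["2", "Beta"], ["3", "Gamma"]]
updated = [["0", "Zero"]] + baseline  # one new row inserted at the top

print(positional_modified_count(baseline, updated))  # 3
```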

***

### Integrations

**📊 CSV Diff Tool → Google Sheets**
Export diff results directly to a Google Sheets spreadsheet using Apify's Google Sheets integration. Create a live "change log" sheet that updates every time you re-run the diff. Useful for sharing change reports with non-technical stakeholders without manual copy-paste.

**🔔 CSV Diff Tool → Slack/Discord alerts**
Combine with an Apify webhook that fires when the run completes. If `totalChanges > 0`, send a Slack message with the counts: "Product catalog updated: 3 prices changed, 1 item removed." Keeps your team informed without manual monitoring.

**🔁 CSV Diff Tool → Make (formerly Integromat) or Zapier**
Trigger a Make scenario when the Apify run completes. Read the dataset items and route each `changeType` to a different action — e.g. `added` rows go to a CRM import, `removed` rows trigger a deactivation workflow, `modified` rows trigger a price update.

**📅 Scheduled monitoring of recurring exports**
Use Apify's scheduler to run the diff daily or weekly against the latest export from your ERP, PIM, or data warehouse. Compare against the previous run's output (stored in a dataset) to build an automated audit trail.

**🔗 Webhook-triggered diff in CI/CD pipelines**
Call the Apify API from your CI/CD pipeline after running a data migration. Diff the pre-migration and post-migration exports to verify the transform produced the expected changes. Fail the pipeline if unexpected rows were modified.
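
A CI check of this kind could look roughly like the sketch below. It works on dataset items shaped like the output examples above; `unexpected_modifications` and the list of expected row keys are hypothetical and would come from your own migration plan.

```python
def unexpected_modifications(items, expected_row_keys):
    """Return rowKey values of modified rows that were not expected to change."""
    expected = set(expected_row_keys)
    return [
        item["rowKey"]
        for item in items
        if item["changeType"] == "modified" and item["rowKey"] not in expected
    ]


items = [
    {"changeType": "modified", "rowKey": "id=2"},
    {"changeType": "modified", "rowKey": "id=7"},
    {"changeType": "added", "rowKey": "id=9"},
]
surprises = unexpected_modifications(items, expected_row_keys={"id=2"})
print(surprises)  # ['id=7']
# In a CI job, exit with a non-zero status when this list is not empty.
```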

***

### Using the Apify API

#### Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('automation-lab/csv-diff-tool').call({
    csvA: 'id,name,price\n1,Alpha,9.99\n2,Beta,14.99',
    csvB: 'id,name,price\n1,Alpha,9.99\n2,Beta,19.99',
    keyColumns: ['id'],
    outputFormat: 'all-changes',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

#### Python

```python
from apify_client import ApifyClient

client = ApifyClient(token='YOUR_APIFY_TOKEN')

run = client.actor('automation-lab/csv-diff-tool').call(run_input={
    'csvA': 'id,name,price\n1,Alpha,9.99\n2,Beta,14.99',
    'csvB': 'id,name,price\n1,Alpha,9.99\n2,Beta,19.99',
    'keyColumns': ['id'],
    'outputFormat': 'all-changes',
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```

#### cURL

```bash
curl -X POST \
  "https://api.apify.com/v2/acts/automation-lab~csv-diff-tool/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "csvA": "id,name,price\n1,Alpha,9.99\n2,Beta,14.99",
    "csvB": "id,name,price\n1,Alpha,9.99\n2,Beta,19.99",
    "keyColumns": ["id"],
    "outputFormat": "all-changes"
  }'
```

***

### Use with AI agents via MCP

CSV Diff Tool is available as a tool for AI assistants that support the [Model Context Protocol (MCP)](https://docs.apify.com/platform/integrations/mcp).

Add the Apify MCP server to your AI client — this gives you access to all Apify actors, including this one:

#### Setup for Claude Code

```bash
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/csv-diff-tool"
```

#### Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

```json
{
    "mcpServers": {
        "apify": {
            "type": "http",
            "url": "https://mcp.apify.com?tools=automation-lab/csv-diff-tool",
            "headers": {
                "Authorization": "Bearer YOUR_APIFY_TOKEN"
            }
        }
    }
}
```

**Example prompts for AI agents:**

- *"Compare these two CSV exports and tell me which product prices changed."*
- *"I have a customer list from January and one from February — find who was added or removed."*
- *"Diff these two CSVs using 'email' as the key column and show me only the modified rows."*

***

### Is it legal to compare CSV data?

Yes — this actor performs **local computation only** on data you provide. It makes no external HTTP requests, accesses no third-party websites, and stores nothing beyond the run's dataset and key-value store (which you control).

**Your data, your responsibility**: Ensure you have the right to process any personal data contained in the CSVs you compare (GDPR, CCPA, etc.). The actor does not transmit your data to any third parties.

***

### FAQ

**How fast is the comparison?**
For typical datasets (up to 10,000 rows), the comparison completes in under 2 seconds. Even 100,000-row CSVs typically finish in under 10 seconds since the algorithm runs in O(n) time with hash-based key lookups.
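
To illustrate why hash-based lookups keep the comparison linear, here is a minimal keyed diff built on Python dictionaries. It is a sketch of the general technique, not the Actor's source code; `parse_csv` and `keyed_diff` are hypothetical names, and a header row is assumed.

```python
import csv
import io


def parse_csv(text, delimiter=","):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def keyed_diff(rows_a, rows_b, key_columns):
    """Classify rows as added, removed, or modified using dict lookups."""
    def key_of(row):
        return tuple(row[c] for c in key_columns)

    index_a = {key_of(r): r for r in rows_a}  # one pass over A
    index_b = {key_of(r): r for r in rows_b}  # one pass over B

    changes = []
    for key, row_b in index_b.items():
        row_a = index_a.get(key)              # constant-time lookup on average
        if row_a is None:
            changes.append(("added", key, row_b))
        elif row_a != row_b:
            changed = [c for c in row_b if row_a.get(c) != row_b[c]]
            changes.append(("modified", key, changed))
    for key, row_a in index_a.items():
        if key not in index_b:
            changes.append(("removed", key, row_a))
    return changes


csv_a = "id,name,price,stock\n1,Widget Alpha,9.99,100\n2,Widget Beta,14.99,50\n3,Widget Gamma,4.99,200\n4,Widget Delta,24.99,25"
csv_b = "id,name,price,stock\n1,Widget Alpha,9.99,100\n2,Widget Beta,19.99,45\n3,Widget Gamma,4.99,200\n5,Widget Epsilon,34.99,10"
changes = keyed_diff(parse_csv(csv_a), parse_csv(csv_b), ["id"])
print([(kind, key) for kind, key, _ in changes])
# [('modified', ('2',)), ('added', ('5',)), ('removed', ('4',))]
```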

**How much does it cost?**
The start fee is $0.005, which covers the first 1,000 rows compared. Additional rows cost $0.001 per 1,000. Two 500-row CSVs cost $0.005 total. See the [Pricing section](#how-much-does-it-cost-to-compare-csv-files) for a full table.
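
The pricing rule can be turned into a quick estimate. This sketch assumes that "rows compared" means the rows of CSV A plus the rows of CSV B, which is what the pricing table's larger examples imply, and it ignores subscription discounts. `estimate_cost_usd` is a hypothetical helper.

```python
START_FEE_MILLS = 5    # $0.005 Actor start, covers the first 1,000 rows
EXTRA_UNIT_MILLS = 1   # $0.001 per additional 1,000 rows
ROWS_PER_UNIT = 1000


def estimate_cost_usd(rows_a: int, rows_b: int) -> float:
    """Estimate the run cost in USD, assuming rows compared = rows_a + rows_b."""
    total_rows = rows_a + rows_b
    extra_rows = max(0, total_rows - ROWS_PER_UNIT)
    # Round partial units up using integer arithmetic.
    extra_units = (extra_rows + ROWS_PER_UNIT - 1) // ROWS_PER_UNIT
    return (START_FEE_MILLS + extra_units * EXTRA_UNIT_MILLS) / 1000


print(estimate_cost_usd(500, 500))        # 0.005
print(estimate_cost_usd(10_000, 10_000))  # 0.024
```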

**What if my CSV has duplicate key values?**
The last row with a given key value wins in the lookup map. If your data has genuine duplicates (e.g., multiple orders with the same `order_id`), use a composite key that makes rows unique, or leave `keyColumns` empty to use row-position matching.
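
You can check for duplicate keys before running a diff. The sketch below shows both the "last row wins" effect of a dictionary lookup and a simple duplicate detector; `duplicate_keys` is a hypothetical helper, not part of the Actor.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


rows = [
    {"order_id": "A1", "item": "pen"},
    {"order_id": "A1", "item": "ink"},
    {"order_id": "B2", "item": "pad"},
]
lookup = {row["order_id"]: row for row in rows}  # later rows overwrite earlier ones
print(lookup["A1"]["item"])                # ink
print(duplicate_keys(rows, ["order_id"]))  # [('A1',)]
```

Adding `item` to the key in this example (`["order_id", "item"]`) makes every row unique again.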

**How does it differ from a plain text diff (e.g. `diff` command)?**
A text diff treats each line as a unit and is sensitive to row order — adding one row at the top makes every subsequent row appear as "changed." CSV Diff Tool understands column structure and uses key-based matching, so reordered rows are correctly identified as unchanged, and you see column-level before/after values for modified rows.

**Why are some rows showing as modified when they look identical?**
The most common cause is invisible whitespace, such as extra spaces before or after cell values. Keep `trimWhitespace: true` (the default). The second most common cause is a difference in letter case, such as `Apple` vs `apple`. Set `caseSensitive: false` to ignore case.
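
The effect of the two options can be pictured as a normalization step applied to each cell before comparison. This is an assumption about the behavior, sketched for illustration; `normalize` is a hypothetical function.

```python
def normalize(value: str, trim_whitespace: bool = True,
              case_sensitive: bool = True) -> str:
    """Normalize a cell value the way the two options are described."""
    if trim_whitespace:
        value = value.strip()
    if not case_sensitive:
        value = value.casefold()
    return value


print(normalize(" Apple ") == normalize("Apple"))  # True
print(normalize("Apple", case_sensitive=False)
      == normalize("apple", case_sensitive=False))  # True
```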

**Why are all rows showing as modified when I expect 0 changes?**
Check that both CSVs use the same delimiter. If one is comma-separated and the other is semicolon-separated, the parser will misread column boundaries and everything will look different. Also verify that `hasHeader` matches — if one CSV has a header and you set `hasHeader: false`, the header row becomes a data row and shifts all comparisons.
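
A delimiter mismatch is easy to reproduce with Python's standard `csv` module. The example below is only an illustration of the failure mode, using a hypothetical `parse_header` helper: parsing semicolon-separated data with a comma delimiter collapses the header into a single column.

```python
import csv
import io


def parse_header(text, delimiter):
    """Return the first row of the CSV text as a list of column names."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


semicolon_csv = "id;name;price\n1;Alpha;9,99"
print(parse_header(semicolon_csv, ";"))  # ['id', 'name', 'price']
print(parse_header(semicolon_csv, ","))  # ['id;name;price']
```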

**Can I compare CSVs with different columns?**
Yes. The actor takes the union of all column names from both CSVs. Columns present in one but not the other will have empty values for the dataset they're missing from.
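
As a rough picture of what a column union means, the sketch below merges two headers in first-seen order and fills missing columns with empty strings. The column order and the helper names `column_union` and `align` are assumptions made for illustration.

```python
def column_union(header_a, header_b):
    """Union of column names, keeping first-seen order."""
    return list(dict.fromkeys([*header_a, *header_b]))


def align(row, columns):
    """Return the row with every column present; missing ones become ''."""
    return {col: row.get(col, "") for col in columns}


columns = column_union(["id", "name", "price"], ["id", "name", "stock"])
print(columns)                                       # ['id', 'name', 'price', 'stock']
print(align({"id": "1", "name": "Alpha", "price": "9.99"}, columns))
# {'id': '1', 'name': 'Alpha', 'price': '9.99', 'stock': ''}
```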

***

### Other data tools from automation-lab

🔗 [**Apache Log Parser**](https://apify.com/automation-lab/apache-log-parser) — Parse Apache and Nginx access logs into structured datasets with IP, path, status code, and response time fields.

🔗 [**Base64 Converter**](https://apify.com/automation-lab/base64-converter) — Encode or decode Base64 strings and file content, with URL-safe variant support.

🔗 [**Barcode Generator**](https://apify.com/automation-lab/barcode-generator) — Generate QR codes and 1D barcodes (EAN-13, Code128, etc.) in bulk from a list of values.

🔗 [**Accessibility Checker**](https://apify.com/automation-lab/accessibility-checker) — Audit web pages for WCAG accessibility violations using automated axe-core analysis.

🔗 [**Broken Link Checker**](https://apify.com/automation-lab/broken-link-checker) — Crawl a website and find all broken internal and external links with their HTTP status codes.

🔗 [**Fake Test Data Generator**](https://apify.com/automation-lab/fake-test-data-generator) — Generate bulk fake/test data with realistic names, addresses, emails, and more using Faker.js.

🔗 [**Bulk Image Optimizer**](https://apify.com/automation-lab/bulk-image-optimizer) — Compress and resize images in bulk from URLs, with WebP conversion and quality controls.

# Actor input Schema

## `csvA` (type: `string`):

Paste the baseline (original) CSV content here. This is the 'before' dataset.

## `csvB` (type: `string`):

Paste the updated (new) CSV content here. This is the 'after' dataset.

## `keyColumns` (type: `array`):

Column names that uniquely identify each row (e.g. 'id', or ['country', 'city']). If empty, row position (1, 2, 3...) is used for matching.

## `delimiter` (type: `string`):

Field separator character. Use comma for standard CSV, tab (\t) for TSV, semicolon for European formats.

## `hasHeader` (type: `boolean`):

Enable if the first row contains column names. Disable if the CSV has no header row (columns will be named col1, col2, etc.).

## `caseSensitive` (type: `boolean`):

Enable to treat 'Apple' and 'apple' as different values. Disable for case-insensitive matching.

## `trimWhitespace` (type: `boolean`):

Trim leading/trailing spaces from cell values before comparison. Recommended to avoid false positives from invisible whitespace.

## `outputFormat` (type: `string`):

Controls what gets written to the dataset. 'all-changes' outputs one row per changed record. 'summary-only' outputs only the summary statistics.

## Actor input object example

```json
{
  "csvA": "id,name,price,stock\n1,Widget Alpha,9.99,100\n2,Widget Beta,14.99,50\n3,Widget Gamma,4.99,200\n4,Widget Delta,24.99,25",
  "csvB": "id,name,price,stock\n1,Widget Alpha,9.99,100\n2,Widget Beta,19.99,45\n3,Widget Gamma,4.99,200\n5,Widget Epsilon,34.99,10",
  "keyColumns": [
    "id"
  ],
  "delimiter": ",",
  "hasHeader": true,
  "caseSensitive": true,
  "trimWhitespace": true,
  "outputFormat": "all-changes"
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "csvA": `id,name,price,stock
1,Widget Alpha,9.99,100
2,Widget Beta,14.99,50
3,Widget Gamma,4.99,200
4,Widget Delta,24.99,25`,
    "csvB": `id,name,price,stock
1,Widget Alpha,9.99,100
2,Widget Beta,19.99,45
3,Widget Gamma,4.99,200
5,Widget Epsilon,34.99,10`,
    "keyColumns": [
        "id"
    ],
    "delimiter": ",",
    "hasHeader": true,
    "caseSensitive": true,
    "trimWhitespace": true,
    "outputFormat": "all-changes"
};

// Run the Actor and wait for it to finish
const run = await client.actor("automation-lab/csv-diff-tool").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "csvA": """id,name,price,stock
1,Widget Alpha,9.99,100
2,Widget Beta,14.99,50
3,Widget Gamma,4.99,200
4,Widget Delta,24.99,25""",
    "csvB": """id,name,price,stock
1,Widget Alpha,9.99,100
2,Widget Beta,19.99,45
3,Widget Gamma,4.99,200
5,Widget Epsilon,34.99,10""",
    "keyColumns": ["id"],
    "delimiter": ",",
    "hasHeader": True,
    "caseSensitive": True,
    "trimWhitespace": True,
    "outputFormat": "all-changes",
}

# Run the Actor and wait for it to finish
run = client.actor("automation-lab/csv-diff-tool").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "csvA": "id,name,price,stock\\n1,Widget Alpha,9.99,100\\n2,Widget Beta,14.99,50\\n3,Widget Gamma,4.99,200\\n4,Widget Delta,24.99,25",
  "csvB": "id,name,price,stock\\n1,Widget Alpha,9.99,100\\n2,Widget Beta,19.99,45\\n3,Widget Gamma,4.99,200\\n5,Widget Epsilon,34.99,10",
  "keyColumns": [
    "id"
  ],
  "delimiter": ",",
  "hasHeader": true,
  "caseSensitive": true,
  "trimWhitespace": true,
  "outputFormat": "all-changes"
}' |
apify call automation-lab/csv-diff-tool --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=automation-lab/csv-diff-tool",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "CSV Diff Tool",
        "description": "Compare two CSV datasets and find added, removed, and modified rows. Supports key-column matching, configurable delimiters, case-insensitive comparison, and whitespace trimming. Exports a structured change report with before/after values.",
        "version": "0.1",
        "x-build-id": "EReANMQyjZUxiDPLK"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/automation-lab~csv-diff-tool/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-automation-lab-csv-diff-tool",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/automation-lab~csv-diff-tool/runs": {
            "post": {
                "operationId": "runs-sync-automation-lab-csv-diff-tool",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/automation-lab~csv-diff-tool/run-sync": {
            "post": {
                "operationId": "run-sync-automation-lab-csv-diff-tool",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "csvA",
                    "csvB"
                ],
                "properties": {
                    "csvA": {
                        "title": "📄 CSV A (baseline)",
                        "type": "string",
                        "description": "Paste the baseline (original) CSV content here. This is the 'before' dataset."
                    },
                    "csvB": {
                        "title": "📄 CSV B (updated)",
                        "type": "string",
                        "description": "Paste the updated (new) CSV content here. This is the 'after' dataset."
                    },
                    "keyColumns": {
                        "title": "🔑 Key columns",
                        "type": "array",
                        "description": "Column names that uniquely identify each row (e.g. 'id', or ['country', 'city']). If empty, row position (1, 2, 3...) is used for matching.",
                        "items": {
                            "type": "string"
                        },
                        "default": []
                    },
                    "delimiter": {
                        "title": "🔣 Delimiter",
                        "enum": [
                            ",",
                            ";",
                            "\t",
                            "|"
                        ],
                        "type": "string",
                        "description": "Field separator character. Use comma for standard CSV, tab (\\t) for TSV, semicolon for European formats.",
                        "default": ","
                    },
                    "hasHeader": {
                        "title": "📋 First row is header",
                        "type": "boolean",
                        "description": "Enable if the first row contains column names. Disable if the CSV has no header row (columns will be named col1, col2, etc.).",
                        "default": true
                    },
                    "caseSensitive": {
                        "title": "🔠 Case-sensitive comparison",
                        "type": "boolean",
                        "description": "Enable to treat 'Apple' and 'apple' as different values. Disable for case-insensitive matching.",
                        "default": true
                    },
                    "trimWhitespace": {
                        "title": "✂️ Trim whitespace",
                        "type": "boolean",
                        "description": "Trim leading/trailing spaces from cell values before comparison. Recommended to avoid false positives from invisible whitespace.",
                        "default": true
                    },
                    "outputFormat": {
                        "title": "📊 Output format",
                        "enum": [
                            "all-changes",
                            "added-only",
                            "removed-only",
                            "modified-only",
                            "summary-only"
                        ],
                        "type": "string",
                        "description": "Controls what gets written to the dataset. 'all-changes' outputs one row per changed record. 'summary-only' outputs only the summary statistics.",
                        "default": "all-changes"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
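
The `inputSchema` above can be mirrored client-side before starting a run. The following is a minimal sketch, not part of the Actor or of any Apify client library: `build_input` and `dataset_id_of` are hypothetical helper names, and the defaults, required fields, and enum values are copied from the schema above. It assumes only the Python standard library.

```python
import copy
from typing import Any

# Values copied from the inputSchema in the OpenAPI definition above.
REQUIRED = ("csvA", "csvB")
DEFAULTS: dict[str, Any] = {
    "keyColumns": [],
    "delimiter": ",",
    "hasHeader": True,
    "caseSensitive": True,
    "trimWhitespace": True,
    "outputFormat": "all-changes",
}
ENUMS = {
    "delimiter": {",", ";", "\t", "|"},
    "outputFormat": {
        "all-changes",
        "added-only",
        "removed-only",
        "modified-only",
        "summary-only",
    },
}


def build_input(**fields: Any) -> dict[str, Any]:
    """Hypothetical helper: apply schema defaults and check required and enum fields."""
    missing = [name for name in REQUIRED if not isinstance(fields.get(name), str)]
    if missing:
        raise ValueError(f"missing required string fields: {missing}")
    # Deep-copy the defaults so the shared empty keyColumns list is never mutated.
    merged = {**copy.deepcopy(DEFAULTS), **fields}
    for name, allowed in ENUMS.items():
        if merged[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    return merged


def dataset_id_of(run_response: dict[str, Any]) -> str:
    """Read data.defaultDatasetId from a body shaped like runsResponseSchema."""
    return run_response["data"]["defaultDatasetId"]


# Example: two tiny CSVs matched on the "id" column.
payload = build_input(
    csvA="id,name\n1,Ann\n",
    csvB="id,name\n1,Anna\n",
    keyColumns=["id"],
)
```

The resulting `payload` dictionary is what would be sent as the JSON request body. Checking the `delimiter` and `outputFormat` enums locally is a design choice in this sketch so that a typo surfaces before a run is started; the platform may perform its own validation as well.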
