# Lead List Deduplicator & Merger (`jurassic_jove/lead-deduplicator-merger`) Actor

Merge and deduplicate lead lists from multiple Apify datasets, CSV files and inline JSON into one clean, outreach-ready list. Pure data processor — no scraping, no proxies, no external APIs.

- **URL:** https://apify.com/jurassic_jove/lead-deduplicator-merger.md
- **Developed by:** [Data Runner](https://apify.com/jurassic_jove) (community)
- **Categories:** Lead generation, Automation, Developer tools
- **Stats:** 2 total users, 1 monthly users, 50.0% runs succeeded, 0 bookmarks
- **User rating:** No ratings yet

## Pricing

From $2.00 / 1,000 records processed

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Lead List Deduplicator & Merger — Clean & Combine Lead Lists

**Merge multiple lead lists into one clean, deduplicated, outreach-ready list — in seconds.** Combine the results of several scrapers (Google Maps, Instagram, TikTok, Facebook, YouTube, TripAdvisor, website email extraction and more), remove duplicate leads, and merge partial records into complete contacts. Built for **cold email outreach, lead generation agencies, and sales teams** who run many scrapers and end up with messy, overlapping data.

This is a **pure data processor**: no scraping, no browsers, no proxies, no external APIs. It can't be blocked, it can't break from website changes, and it runs fast on the smallest memory tier.

---

### Why deduplicate and merge your lead lists?

When you scrape leads from several sources, the same business or person shows up again and again — once from Google Maps, once from Instagram, once from a website crawl. Sending to a dirty list quietly costs you money and reputation:

- **Protect your sender reputation & deliverability.** Emailing the same contact twice (or hitting old, duplicated addresses) drives spam complaints and bounces. High bounce rates and duplicate sends are two of the fastest ways to wreck a sending domain. A clean, deduplicated list keeps your inbox placement healthy.
- **Stop wasting outreach credits.** Email verification tools, enrichment APIs and sending platforms usually charge per contact. Every duplicate is money spent twice. Deduplicating *before* you verify or send is the cheapest optimization in your funnel.
- **Keep your CRM clean.** Duplicate and fragmented records pollute your CRM, break reporting, and create awkward "didn't we already talk to them?" moments. Merge first, import once.
- **Get richer records.** One source has the email, another has the phone, a third has the website. Merging fragments field-by-field turns three thin rows into one complete, ready-to-contact lead.

---

### What it does

- ✅ **Merges three input types at once** — Apify datasets, public CSV file URLs, and inline JSON. Mix and match freely.
- ✅ **Deduplicates by email, website domain, phone, or name + city** — match on any combination of keys.
- ✅ **Smart matching across messy data through normalization** — case-insensitive emails, `mailto:` stripping, normalized domains (`https://www.Acme.com/contact` → `acme.com`), E.164 phone formatting, accent-folding for names.
- ✅ **Connected-component grouping** — if record A matches B by email and B matches C by domain, all three collapse into a single lead automatically.
- ✅ **Field-by-field merging** — keep the most complete record and fill the gaps from its duplicates. Conflicting values are preserved in a `_conflicts` field so nothing is silently lost.
- ✅ **Recognizes field variants across scrapers** — `email` / `emails[0]` / `businessEmail`, `phone` / `phoneNumber` / `phones[0]`, `website` / `url` / `websiteUrl`, `name` / `title` / `businessName` / `channelName`, and more.
- ✅ **Never crashes on bad data** — malformed CSV rows, empty datasets, mixed schemas and missing fields are skipped, counted, and reported. The run keeps going.
- ✅ **Export anywhere** — results land in a standard Apify dataset you can download as **CSV, JSON, Excel, or HTML**.
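
To make the domain and name normalization rules in the list above concrete, here is a minimal Python sketch. It is an illustration only, not the Actor's actual code: the helper names `normalize_domain`, `fold_accents` and `name_city_key` are hypothetical, and the exact rules the Actor applies may differ.

```python
import unicodedata
from urllib.parse import urlparse


def normalize_domain(url: str) -> str:
    """Reduce a website URL to a bare lowercase domain (assumed rule)."""
    url = url.strip()
    # urlparse only fills in the host when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def fold_accents(text: str) -> str:
    """Strip diacritics so 'Café' and 'Cafe' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_city_key(name: str, city: str) -> str:
    """Build a hypothetical comparison key for the name+city dedupe key."""
    parts = (fold_accents(part).casefold().split() for part in (name, city))
    return "|".join(" ".join(words) for words in parts)


print(normalize_domain("https://www.Acme.com/contact"))
print(name_city_key("Café  Zoë", "San Francisco"))
```

Keys built this way are only used for comparison; the original values stay available for the output record.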

---

### How it works (3 steps)

1. **Point it at your data.** Provide one or more Apify dataset IDs from previous scraper runs, public CSV URLs, and/or paste records as inline JSON.
2. **Choose how duplicates are matched and merged.** Pick your dedupe keys (email is the default and safest) and a merge strategy.
3. **Run it.** Get back one clean, merged record per unique lead in the output dataset, plus a summary of exactly what was combined and removed.

#### Input example 1 — Merge previous scraper runs (Apify datasets)

```json
{
  "datasetIds": ["aBcD1234efGh5678", "XyZ9876wVuT5432"],
  "dedupeKeys": ["email", "domain"],
  "mergeStrategy": "most_complete"
}
````

#### Input example 2 — Merge CSV files

```json
{
  "csvUrls": [
    "https://example.com/google-maps-leads.csv",
    "https://example.com/instagram-leads.csv"
  ],
  "dedupeKeys": ["email", "phone"],
  "normalizePhones": true,
  "defaultCountry": "US"
}
```

#### Input example 3 — Paste records directly (inline JSON)

```json
{
  "inlineRecords": [
    { "businessName": "Acme Inc", "email": "Info@Acme.com", "phone": "(415) 555-2671" },
    { "name": "Acme", "emails": ["info@acme.com"], "website": "https://www.acme.com" }
  ],
  "dedupeKeys": ["email", "domain", "name+city"],
  "mergeStrategy": "most_complete",
  "keepSourceInfo": true
}
```

> 💡 You can combine all three sources in a single run. Everything is pooled and deduplicated together.

#### Input options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `datasetIds` | string\[] | `[]` | Apify dataset IDs from previous runs. Invalid IDs are skipped with a warning. |
| `csvUrls` | string\[] | `[]` | Public CSV URLs. Tolerates BOM, `,` `;` tab `\|` delimiters, quoted fields. |
| `inlineRecords` | object\[] | `[]` | Raw records pasted as JSON. |
| `dedupeKeys` | string\[] | `["email"]` | Any of `email`, `domain`, `phone`, `name+city`. Match on ANY selected key. |
| `mergeStrategy` | string | `most_complete` | `most_complete`, `first_seen`, or `last_seen`. |
| `normalizePhones` | boolean | `true` | Format output phones as E.164 (e.g. `+14155552671`). |
| `defaultCountry` | string | `US` | ISO country code for phones without a prefix. |
| `stripPlusAliases` | boolean | `false` | Treat `john+tag@x.com` and `john@x.com` as the same lead. |
| `keepSourceInfo` | boolean | `true` | Add a `_sources` array to each merged record. |

***

### Output

Each unique lead becomes **one merged record** in the output dataset. Original fields are preserved; clean canonical `name` / `email` / `phone` / `website` / `city` fields are added; merge metadata is prefixed with `_`.

#### Example output record

```json
{
  "name": "Acme, Inc.",
  "email": "info@acme.com",
  "phone": "+14155552671",
  "website": "https://www.acme.com/contact",
  "city": "San Francisco",
  "industry": "SaaS",
  "_sources": ["dataset:aBcD1234efGh5678", "csv:example.com/instagram-leads.csv"],
  "_conflicts": { "name": ["Acme HQ"] },
  "_duplicateCount": 3
}
```

- **`_sources`** — which dataset(s)/file(s) this lead was merged from.
- **`_conflicts`** — alternate values that disagreed (e.g. two different phone numbers), so you never lose data.
- **`_duplicateCount`** — how many input records were combined into this one.

#### Run summary (key-value store → `OUTPUT`)

```json
{
  "totalInput": 22000,
  "totalOutput": 15840,
  "duplicatesRemoved": 6160,
  "duplicateRate": 0.28,
  "perSourceCounts": { "dataset:aBcD1234efGh5678": 12000, "csv:example.com/instagram-leads.csv": 10000 },
  "malformedSkipped": 12,
  "runtimeSeconds": 9
}
```
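
The numbers in this sample summary appear to be related by simple arithmetic. The sketch below checks those relationships in Python. The formulas are inferred from the sample values, not from a documented specification, and `summary_is_consistent` is a hypothetical helper. The sample does not show how `malformedSkipped` relates to `totalInput`, so the check leaves it out.

```python
summary = {
    "totalInput": 22000,
    "totalOutput": 15840,
    "duplicatesRemoved": 6160,
    "duplicateRate": 0.28,
    "perSourceCounts": {
        "dataset:aBcD1234efGh5678": 12000,
        "csv:example.com/instagram-leads.csv": 10000,
    },
}


def summary_is_consistent(s: dict) -> bool:
    """Check the relationships inferred from the sample summary."""
    removed = s["totalInput"] - s["totalOutput"]
    rate = removed / s["totalInput"] if s["totalInput"] else 0.0
    return (
        removed == s["duplicatesRemoved"]                # 22000 - 15840 = 6160
        and round(rate, 2) == s["duplicateRate"]         # 6160 / 22000 = 0.28
        and sum(s["perSourceCounts"].values()) == s["totalInput"]
    )


print(summary_is_consistent(summary))
```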

***

### Works perfectly with the rest of the suite

This Actor is the **glue** that ties your lead-generation stack together. Run any of these scrapers, then pipe their datasets straight into the Lead List Deduplicator & Merger:

- **Google Maps Lead Generator Pro** — local business leads with phone, website and address.
- **Instagram Email Scraper** — creator and business emails from Instagram.
- **TikTok Email Scraper** — contact emails from TikTok profiles.
- **Facebook Page Lead Scraper** — business contact details from Facebook.
- **YouTube Channel Email Scraper** — channel and business emails from YouTube.
- **TripAdvisor Leads Scraper** — hospitality and venue leads.
- **Website Email Extractor** — emails and contact data crawled from any website list.
- **Email Verifier & Enricher Pro** — verify and enrich your final list *after* deduplicating (dedupe first to save credits!).

**Recommended workflow:** scrape with several Actors → **deduplicate & merge here** → verify & enrich → import to your CRM or sending tool.
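
As a sketch of the "scrape, then deduplicate" step, the helper below builds an input object for this Actor from the results of earlier runs. It assumes each run is a dictionary with a `defaultDatasetId` key, as returned by the Python client's `call()` in the API section. `build_merger_input` is a hypothetical name.

```python
from typing import Optional


def build_merger_input(runs: list, dedupe_keys: Optional[list] = None) -> dict:
    """Collect the default dataset of each finished run into one Actor input."""
    return {
        "datasetIds": [run["defaultDatasetId"] for run in runs],
        "dedupeKeys": dedupe_keys or ["email"],  # email is the documented default
        "mergeStrategy": "most_complete",
    }


# Stand-ins for run objects returned by earlier scraper runs.
previous_runs = [
    {"defaultDatasetId": "aBcD1234efGh5678"},
    {"defaultDatasetId": "XyZ9876wVuT5432"},
]
print(build_merger_input(previous_runs, ["email", "domain"]))
```

The resulting dictionary can be passed as the run input when calling this Actor.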

***

### Pricing

This Actor uses simple, transparent **pay-per-event** pricing:

- **$2.00 per 1,000 input records processed** ($0.002 per record).

You are charged per **input** record ingested — the rows you feed in — *as they are processed*, so partial runs only bill for what was actually handled. **Malformed/skipped rows are never charged.** There is no per-run start fee. Deduplication typically shrinks your list, so every downstream tool (verification, enrichment, sending) costs less afterwards — this Actor usually pays for itself on the very first run.
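
For budgeting, the per-record price above translates into a simple estimate. The sketch below assumes the listed rate of $2.00 per 1,000 records applies to your plan and that you pass in only the records that were actually processed (skipped rows excluded). `estimate_cost` is a hypothetical helper, not part of any Apify API.

```python
from decimal import Decimal

PRICE_PER_1000_RECORDS = Decimal("2.00")  # USD, taken from the pricing text


def estimate_cost(records_processed: int) -> Decimal:
    """Estimated charge in USD for a number of processed input records."""
    cost = PRICE_PER_1000_RECORDS * records_processed / 1000
    return cost.quantize(Decimal("0.001"))


print(estimate_cost(22000))  # 22,000 records at $0.002 each
```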

***

### FAQ

**What counts as a duplicate?**
Two records are duplicates if they match on **any** of your selected dedupe keys — same email, same website domain, same phone number, or same name + city. Matching is transitive: if A matches B and B matches C, all three are merged into one lead.
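
Transitive matching of this kind is commonly implemented with a union-find (disjoint-set) structure. The following is a minimal sketch of that technique, not the Actor's actual implementation. It assumes the values under each key are already normalized strings, and `group_duplicates` is a hypothetical name.

```python
from collections import defaultdict


def group_duplicates(records: list, keys: list) -> list:
    """Group record indexes that share a value on ANY of the given keys."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Walk to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (key, value) -> index of the first record carrying it
    for index, record in enumerate(records):
        for key in keys:
            value = record.get(key)
            if not value:
                continue  # an empty value never matches anything
            owner = first_seen.setdefault((key, value), index)
            root_a, root_b = find(owner), find(index)
            if root_a != root_b:
                parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = defaultdict(list)
    for index in range(len(records)):
        groups[find(index)].append(index)
    return list(groups.values())


records = [
    {"email": "info@acme.com"},                        # A
    {"email": "info@acme.com", "domain": "acme.com"},  # B matches A by email
    {"domain": "acme.com"},                            # C matches B by domain
    {"email": "hello@other.io"},                       # unrelated
]
print(group_duplicates(records, ["email", "domain"]))  # [[0, 1, 2], [3]]
```

Records A and C share no key directly, yet they end up in the same group because both are linked to B.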

**Will it merge `john.doe@gmail.com` and `johndoe@gmail.com`?**
No. By design we do **not** apply Gmail-style dot-folding, because for business (B2B) addresses those can be different real inboxes. Emails are matched after lowercasing, trimming and `mailto:` removal only. (You can optionally treat `+tag` aliases as the same address with `stripPlusAliases`.)
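
The email rules in this answer can be sketched as a small comparison-key function. This is an illustration under the rules stated above (lowercasing, trimming, `mailto:` removal, optional `+tag` stripping); `email_match_key` is a hypothetical name, not the Actor's code.

```python
def email_match_key(raw: str, strip_plus_aliases: bool = False) -> str:
    """Return the key used to compare two emails."""
    email = raw.strip()
    if email.lower().startswith("mailto:"):
        email = email[len("mailto:"):]
    email = email.strip().lower()
    if strip_plus_aliases and "@" in email:
        local, _, domain = email.rpartition("@")
        email = local.split("+", 1)[0] + "@" + domain
    return email


# Dots are kept, so these two addresses stay distinct.
print(email_match_key("john.doe@gmail.com") == email_match_key("johndoe@gmail.com"))
# Plus aliases collapse only when the option is switched on.
print(email_match_key("john+news@gmail.com", strip_plus_aliases=True))
```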

**How does field mapping work across different scrapers?**
The Actor recognizes common field-name variants automatically — for example `email` / `emails[0]` / `businessEmail`, `phone` / `phoneNumber` / `phones[0]`, `website` / `url` / `websiteUrl`, and `name` / `title` / `businessName` / `channelName`. Unknown fields are passed through to the output untouched, so nothing is lost.
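
One way to picture this mapping is a lookup table of variants that is tried in order. The sketch below uses only the variants named in this answer; the order of precedence and the helper names (`FIELD_VARIANTS`, `lookup`, `canonical_value`) are assumptions made for illustration.

```python
from typing import Any, Optional

# Variants named in the text above; the Actor's real list may be longer.
FIELD_VARIANTS = {
    "email": ["email", "emails[0]", "businessEmail"],
    "phone": ["phone", "phoneNumber", "phones[0]"],
    "website": ["website", "url", "websiteUrl"],
    "name": ["name", "title", "businessName", "channelName"],
}


def lookup(record: dict, path: str) -> Any:
    """Read 'field' or 'field[0]' from a record, returning None when absent."""
    if path.endswith("[0]"):
        values = record.get(path[:-3])
        return values[0] if isinstance(values, list) and values else None
    return record.get(path)


def canonical_value(record: dict, field: str) -> Optional[Any]:
    """Return the first non-empty variant of a canonical field."""
    for path in FIELD_VARIANTS[field]:
        value = lookup(record, path)
        if value not in (None, ""):
            return value
    return None


record = {"name": "Acme", "emails": ["info@acme.com"], "website": "https://www.acme.com"}
print(canonical_value(record, "email"))
```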

**What happens to conflicting values when records are merged?**
The winning value (per your merge strategy) becomes the field value, and any differing alternates are stored in a `_conflicts` object on the record. You keep full visibility into every value that was seen.
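
A minimal sketch of the `most_complete` strategy with conflict tracking might look like the following. It is not the Actor's implementation: the tie-break between equally complete records and the helper name `merge_most_complete` are assumptions.

```python
def merge_most_complete(group: list) -> dict:
    """Merge a duplicate group: the richest record wins, others fill the gaps."""
    empty = (None, "", [])

    def filled(record: dict) -> int:
        return sum(1 for value in record.values() if value not in empty)

    # max() returns the first of equally complete records (assumed tie-break).
    base = max(group, key=filled)
    merged = dict(base)
    conflicts = {}
    for record in group:
        if record is base:
            continue
        for field, value in record.items():
            if value in empty:
                continue
            if merged.get(field) in empty:
                merged[field] = value              # fill a gap
            elif merged[field] != value:
                alternates = conflicts.setdefault(field, [])
                if value not in alternates:
                    alternates.append(value)       # keep the losing value
    if conflicts:
        merged["_conflicts"] = conflicts
    merged["_duplicateCount"] = len(group)
    return merged


group = [
    {"name": "Acme, Inc.", "email": "info@acme.com", "phone": "+14155552671"},
    {"name": "Acme HQ", "email": "info@acme.com"},
    {"email": "info@acme.com", "city": "San Francisco"},
]
print(merge_most_complete(group))
```

Here the first record is the most complete, `city` is filled from the third record, and the differing name `Acme HQ` is kept under `_conflicts`.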

**Is there a file size or record limit?**
The Actor streams data in batches and keeps only one record per unique lead in memory, so it scales to large lists. For very large jobs (hundreds of thousands of records) simply bump the memory in the run options. CSV inputs are fetched whole, so for massive files prefer pushing them through an Apify dataset.

**What if a dataset ID is wrong or a CSV fails to download?**
The Actor warns and continues with the remaining sources. It only fails the run if **zero** valid records were found across everything — with a clear, actionable message.

**What happens on a partial run (timeout, stop, charge limit)?**
Records are processed and billed incrementally, and output is written at the end of processing, so a partial run bills only for the records it actually handled.

**Do I need proxies or any special setup?**
No. This is a pure data tool — no scraping, no proxies, no anti-bot concerns. Point it at your data and run.

***

### Get started

1. Add this Actor to your account.
2. Paste your dataset IDs, CSV URLs, or inline records.
3. Pick your dedupe keys and merge strategy.
4. Run, then download your clean list as CSV, JSON or Excel.

One clean list. No duplicates. Ready for outreach.

# Actor input Schema

## `datasetIds` (type: `array`):

IDs of Apify datasets produced by previous scraper runs (e.g. Google Maps, Instagram, TikTok). All items are fetched with pagination. Invalid IDs are skipped with a warning.

## `csvUrls` (type: `array`):

Public URLs to CSV files (for example key-value store download links). The parser tolerates a BOM, comma/semicolon/tab/pipe delimiters and quoted fields. Malformed rows are skipped and counted, never crashing the run.
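
A tolerant parser along these lines can be sketched with Python's standard `csv` module. The delimiter guess (the candidate that appears most often in the header line) and the definition of "malformed" (a row whose column count differs from the header) are assumptions of this sketch, and `parse_csv_bytes` is a hypothetical name.

```python
import csv
import io


def parse_csv_bytes(data: bytes) -> tuple:
    """Parse CSV bytes and return (rows, malformed_row_count)."""
    text = data.decode("utf-8-sig", errors="replace")  # utf-8-sig drops a BOM
    lines = text.splitlines()
    if not lines:
        return [], 0
    # Guess the delimiter: the candidate seen most often in the header line.
    delimiter = max(",;\t|", key=lines[0].count)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader)
    rows, malformed = [], 0
    for fields in reader:
        if not fields:
            continue  # blank line
        if len(fields) != len(header):
            malformed += 1  # wrong column count: skip and count
            continue
        rows.append(dict(zip(header, fields)))
    return rows, malformed


sample = b'\xef\xbb\xbfname;email\nAcme;info@acme.com\nbroken-row\n"Smith; Co";s@x.com\n'
print(parse_csv_bytes(sample))
```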

## `inlineRecords` (type: `array`):

Raw lead records pasted directly as a JSON array of objects. Useful for quick tests or merging a small hand-built list. The example below is prefilled to show the format and how duplicates merge — replace it with your own records or clear it.

## `dedupeKeys` (type: `array`):

Which fields identify a duplicate. Records are merged if ANY selected key matches (e.g. same email OR same domain). Email is the primary, safest key. Add others to catch more duplicates.

## `mergeStrategy` (type: `string`):

How to combine the records inside a duplicate group. 'Most complete' keeps the record with the most filled-in fields and fills any gaps from its duplicates. 'First seen' / 'Last seen' prefer the earliest / latest record per field.

## `normalizePhones` (type: `boolean`):

Format output phone numbers as E.164 (e.g. +14155552671) when the number is plausible for the default country. Phones are always compared digits-only regardless of this setting.
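
As a rough illustration of digits-only comparison and E.164 formatting, the sketch below handles only the `US` default country with a simplified plausibility rule. It is not the Actor's logic, the helper names are hypothetical, and production code would normally rely on a dedicated phone-number library such as `phonenumbers` rather than a hand-written pattern.

```python
import re
from typing import Optional


def phone_match_key(raw: str) -> str:
    """Digits-only form of a phone number, used for comparison."""
    return re.sub(r"\D", "", raw)


def format_e164_us(raw: str) -> Optional[str]:
    """Format a US number as E.164, or return None when it looks implausible."""
    digits = phone_match_key(raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    # Simplified rule: 10 digits, area code and exchange both start with 2-9.
    if re.fullmatch(r"[2-9]\d{2}[2-9]\d{6}", digits):
        return "+1" + digits
    return None


print(format_e164_us("(415) 555-2671"))
print(format_e164_us("12345"))
```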

## `defaultCountry` (type: `string`):

Two-letter country code used to interpret phone numbers that have no country prefix. Defaults to US.

## `stripPlusAliases` (type: `boolean`):

When ON, john+news@gmail.com and john@gmail.com are treated as the SAME lead for deduplication. The original deliverable email is always preserved in the output. Default OFF (safer for B2B).

## `keepSourceInfo` (type: `boolean`):

Include a \_sources array on every output record showing which dataset(s) / CSV file(s) it was merged from.

## Actor input object example

```json
{
  "datasetIds": [],
  "csvUrls": [],
  "inlineRecords": [
    {
      "businessName": "Acme Coffee Roasters",
      "email": "Hello@AcmeCoffee.com",
      "phone": "(415) 555-2671",
      "city": "San Francisco"
    },
    {
      "name": "Acme Coffee",
      "emails": [
        "hello@acmecoffee.com"
      ],
      "website": "https://www.acmecoffee.com",
      "city": "San Francisco"
    },
    {
      "businessName": "Bluebird Bakery",
      "email": "info@bluebirdbakery.co",
      "phoneNumber": "+1 212 555 0188"
    },
    {
      "title": "Bluebird Bakery",
      "email": "INFO@bluebirdbakery.co",
      "website": "bluebirdbakery.co"
    },
    {
      "name": "Sunrise Yoga Studio",
      "email": "contact@sunriseyoga.com",
      "city": "Austin"
    }
  ],
  "dedupeKeys": [
    "email"
  ],
  "mergeStrategy": "most_complete",
  "normalizePhones": true,
  "defaultCountry": "US",
  "stripPlusAliases": false,
  "keepSourceInfo": true
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "datasetIds": [],
    "csvUrls": [],
    "inlineRecords": [
        {
            "businessName": "Acme Coffee Roasters",
            "email": "Hello@AcmeCoffee.com",
            "phone": "(415) 555-2671",
            "city": "San Francisco"
        },
        {
            "name": "Acme Coffee",
            "emails": [
                "hello@acmecoffee.com"
            ],
            "website": "https://www.acmecoffee.com",
            "city": "San Francisco"
        },
        {
            "businessName": "Bluebird Bakery",
            "email": "info@bluebirdbakery.co",
            "phoneNumber": "+1 212 555 0188"
        },
        {
            "title": "Bluebird Bakery",
            "email": "INFO@bluebirdbakery.co",
            "website": "bluebirdbakery.co"
        },
        {
            "name": "Sunrise Yoga Studio",
            "email": "contact@sunriseyoga.com",
            "city": "Austin"
        }
    ],
    "dedupeKeys": [
        "email"
    ],
    "mergeStrategy": "most_complete",
    "defaultCountry": "US"
};

// Run the Actor and wait for it to finish
const run = await client.actor("jurassic_jove/lead-deduplicator-merger").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "datasetIds": [],
    "csvUrls": [],
    "inlineRecords": [
        {
            "businessName": "Acme Coffee Roasters",
            "email": "Hello@AcmeCoffee.com",
            "phone": "(415) 555-2671",
            "city": "San Francisco",
        },
        {
            "name": "Acme Coffee",
            "emails": ["hello@acmecoffee.com"],
            "website": "https://www.acmecoffee.com",
            "city": "San Francisco",
        },
        {
            "businessName": "Bluebird Bakery",
            "email": "info@bluebirdbakery.co",
            "phoneNumber": "+1 212 555 0188",
        },
        {
            "title": "Bluebird Bakery",
            "email": "INFO@bluebirdbakery.co",
            "website": "bluebirdbakery.co",
        },
        {
            "name": "Sunrise Yoga Studio",
            "email": "contact@sunriseyoga.com",
            "city": "Austin",
        },
    ],
    "dedupeKeys": ["email"],
    "mergeStrategy": "most_complete",
    "defaultCountry": "US",
}

# Run the Actor and wait for it to finish
run = client.actor("jurassic_jove/lead-deduplicator-merger").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
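
The README states that the run summary is written to the key-value store under the `OUTPUT` key. Assuming the run object is a dictionary like the one in the example above and exposes `defaultKeyValueStoreId`, the summary could be read with the client's `key_value_store(...).get_record(...)` call. `fetch_run_summary` is a hypothetical helper.

```python
def fetch_run_summary(client, run: dict):
    """Return the run summary stored under the OUTPUT key, or None if absent."""
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("OUTPUT")
    # get_record returns None when the key does not exist.
    return record["value"] if record else None


# Usage with the `client` and `run` objects from the example above:
# summary = fetch_run_summary(client, run)
# print(summary["duplicatesRemoved"], "duplicates removed")
```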

## CLI example

```bash
echo '{
  "datasetIds": [],
  "csvUrls": [],
  "inlineRecords": [
    {
      "businessName": "Acme Coffee Roasters",
      "email": "Hello@AcmeCoffee.com",
      "phone": "(415) 555-2671",
      "city": "San Francisco"
    },
    {
      "name": "Acme Coffee",
      "emails": [
        "hello@acmecoffee.com"
      ],
      "website": "https://www.acmecoffee.com",
      "city": "San Francisco"
    },
    {
      "businessName": "Bluebird Bakery",
      "email": "info@bluebirdbakery.co",
      "phoneNumber": "+1 212 555 0188"
    },
    {
      "title": "Bluebird Bakery",
      "email": "INFO@bluebirdbakery.co",
      "website": "bluebirdbakery.co"
    },
    {
      "name": "Sunrise Yoga Studio",
      "email": "contact@sunriseyoga.com",
      "city": "Austin"
    }
  ],
  "dedupeKeys": [
    "email"
  ],
  "mergeStrategy": "most_complete",
  "defaultCountry": "US"
}' |
apify call jurassic_jove/lead-deduplicator-merger --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=jurassic_jove/lead-deduplicator-merger",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Lead List Deduplicator & Merger",
        "description": "Merge and deduplicate lead lists from multiple Apify datasets, CSV files and inline JSON into one clean, outreach-ready list. Pure data processor — no scraping, no proxies, no external APIs.",
        "version": "0.1",
        "x-build-id": "22dYRvQqgkvenIUow"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/jurassic_jove~lead-deduplicator-merger/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-jurassic_jove-lead-deduplicator-merger",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/jurassic_jove~lead-deduplicator-merger/runs": {
            "post": {
                "operationId": "runs-sync-jurassic_jove-lead-deduplicator-merger",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/jurassic_jove~lead-deduplicator-merger/run-sync": {
            "post": {
                "operationId": "run-sync-jurassic_jove-lead-deduplicator-merger",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "datasetIds": {
                        "title": "Apify dataset IDs",
                        "type": "array",
                        "description": "IDs of Apify datasets produced by previous scraper runs (e.g. Google Maps, Instagram, TikTok). All items are fetched with pagination. Invalid IDs are skipped with a warning.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "csvUrls": {
                        "title": "CSV file URLs",
                        "type": "array",
                        "description": "Public URLs to CSV files (for example key-value store download links). The parser tolerates a BOM, comma/semicolon/tab/pipe delimiters and quoted fields. Malformed rows are skipped and counted, never crashing the run.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "inlineRecords": {
                        "title": "Inline JSON records",
                        "type": "array",
                        "description": "Raw lead records pasted directly as a JSON array of objects. Useful for quick tests or merging a small hand-built list. The example below is prefilled to show the format and how duplicates merge — replace it with your own records or clear it."
                    },
                    "dedupeKeys": {
                        "title": "Deduplication keys",
                        "type": "array",
                        "description": "Which fields identify a duplicate. Records are merged if ANY selected key matches (e.g. same email OR same domain). Email is the primary, safest key. Add others to catch more duplicates.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "email",
                                "domain",
                                "phone",
                                "name+city"
                            ],
                            "enumTitles": [
                                "Email (recommended)",
                                "Website domain",
                                "Phone number",
                                "Name + City"
                            ]
                        },
                        "default": [
                            "email"
                        ]
                    },
                    "mergeStrategy": {
                        "title": "Merge strategy",
                        "enum": [
                            "most_complete",
                            "first_seen",
                            "last_seen"
                        ],
                        "type": "string",
                        "description": "How to combine the records inside a duplicate group. 'Most complete' keeps the record with the most filled-in fields and fills any gaps from its duplicates. 'First seen' / 'Last seen' prefer the earliest / latest record per field.",
                        "default": "most_complete"
                    },
                    "normalizePhones": {
                        "title": "Normalize phone numbers",
                        "type": "boolean",
                        "description": "Format output phone numbers as E.164 (e.g. +14155552671) when the number is plausible for the default country. Phones are always compared digits-only regardless of this setting.",
                        "default": true
                    },
                    "defaultCountry": {
                        "title": "Default country (ISO 3166-1 alpha-2)",
                        "type": "string",
                        "description": "Two-letter country code used to interpret phone numbers that have no country prefix. Defaults to US.",
                        "default": "US"
                    },
                    "stripPlusAliases": {
                        "title": "Strip + aliases from emails (for matching)",
                        "type": "boolean",
                        "description": "When ON, john+news@gmail.com and john@gmail.com are treated as the SAME lead for deduplication. The original deliverable email is always preserved in the output. Default OFF (safer for B2B).",
                        "default": false
                    },
                    "keepSourceInfo": {
                        "title": "Keep source info",
                        "type": "boolean",
                        "description": "Include a _sources array on every output record showing which dataset(s) / CSV file(s) it was merged from.",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
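
The `stripPlusAliases`, `normalizePhones` and `keepSourceInfo` descriptions in the schema above separate the key used for *matching* from the value kept in the *output*. The following is a minimal Python sketch of that idea, not the Actor's actual implementation. The function names (`email_match_key`, `phone_match_key`, `merge_leads`), the lowercasing of emails and the "first record wins" merge rule are all assumptions made for illustration.

```python
import re


def email_match_key(email: str, strip_plus_aliases: bool = False) -> str:
    """Build a comparison-only key for an email (hypothetical helper).

    Lowercasing is an assumption of this sketch; the original email
    is never modified, only the key derived from it.
    """
    local, sep, domain = email.strip().lower().partition("@")
    if strip_plus_aliases and sep:
        # Drop everything from the first "+" in the local part.
        local = local.split("+", 1)[0]
    return f"{local}{sep}{domain}"


def phone_match_key(phone: str) -> str:
    """Digits-only comparison key for a phone number (hypothetical helper)."""
    return re.sub(r"\D", "", phone)


def merge_leads(records, strip_plus_aliases: bool = False):
    """Deduplicate (source_name, lead) pairs by email match key.

    Assumed merge rule: the first record seen for a key is kept as-is,
    and later duplicates only contribute their source name to _sources.
    """
    merged = {}
    for source, lead in records:
        key = email_match_key(lead["email"], strip_plus_aliases)
        if key not in merged:
            merged[key] = {**lead, "_sources": [source]}
        elif source not in merged[key]["_sources"]:
            merged[key]["_sources"].append(source)
    return list(merged.values())


leads = [
    ("dataset-a", {"email": "john+news@gmail.com"}),
    ("leads.csv", {"email": "john@gmail.com"}),
]
result_on = merge_leads(leads, strip_plus_aliases=True)
result_off = merge_leads(leads, strip_plus_aliases=False)
```

With `strip_plus_aliases=True` the two sample records collapse into one whose `email` is still the original `john+news@gmail.com`; with the default `False` they remain two separate leads. Note that a digits-only key treats `+1 (415) 555-2671` and `415-555-2671` as different strings, so how the Actor reconciles numbers with and without a country prefix is not something this sketch covers.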
