# CRM Contact Cleanup & Dedupe Prep (`critd/contact-cleanup`) Actor

Clean supplied URL, email, and address fields for contact records, preserving one row per input with changed-field, review, dedupe-key, and cross-field signals. Does not scrape, find, verify, enrich, geocode, score confidence, choose survivors, or merge contacts.

- **URL**: https://apify.com/critd/contact-cleanup.md
- **Developed by:** [Critical Distinction](https://apify.com/critd) (community)
- **Categories:** Automation, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

Choose the integration path that matches your stack, and aim for an integration that is safe, well-documented, and production-ready.
The recommended options are as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Contact Cleanup & Dedupe Prep for Supplied Records

Prepare contact records you already have for CRM review, lead-list
quality checks, matching, or downstream cleanup. This Actor keeps one
dataset row per input record, normalizes supplied URL, email, and
U.S.-leaning address values, and returns the changed-field evidence,
review flags, invalid reasons, warnings, and match keys next to the
original row.

Use it when you need a deterministic cleanup pass before human review
or downstream matching decisions:

- Normalize supplied website, email, and address fields without live
  data sourcing.
- Keep invalid, partial, and no-actionable rows visible instead of
  dropping them from the output.
- Prepare dedupe keys and same-run candidate labels while preserving
  every input row.
- Route messy records to review with diagnostics, not with a confidence
  score or final truth verdict.

It does not scrape or find contacts, verify email deliverability,
certify postal addresses, geocode, enrich records, remove duplicates,
choose survivors, or merge CRM records.

Details below cover [what you get](#what-you-get),
[input parameters](#input-parameters), [output format](#output-format),
[fixture-backed output examples](#fixture-backed-output-examples),
[how to read common rows](#how-to-read-common-rows),
[limitations](#limitations), [permissions](#permissions),
[pricing](#pricing), and [release history](#release-history).

### What You Get

- Preserves one dataset row per supplied input record.
- Processes only the supported first-release fields:
  `recordId`, `url`, `email`, and `address`.
- Normalizes URLs into canonical `http` or `https` values with stable
  host/path/query behavior and tracking-fragment cleanup.
- Normalizes emails into stable lowercase and canonical mailbox/domain
  values, including deterministic Gmail alias handling.
- Normalizes U.S.-leaning address text into display and comparison
  forms through the shared `address-normalization` boundary.
- Emits required `recordStatus` values: `ready`, `normalized`,
  `review_needed`, `invalid`, and `no_actionable_input`.
- Emits `changedFields`, `reviewFlags`, `dedupeKeys`,
  `crossFieldSignals`, `warnings`, `invalidReasons`, and `processing`
  details so buyers can see why each row landed where it did.
- Writes a structured `OUTPUT` summary with selected controls, row
  counts, status distribution, diagnostic summaries, first-release
  limits, and explicit unsupported-capability booleans.

The output is meant to help a buyer decide what to review next. A clean
row stays clean, a changed row explains what changed, a messy row keeps
the original value next to diagnostics, and a possible same-run match
is shown as a candidate label instead of being removed or merged.

### Operating Boundaries

This README describes the product behavior of a single supplied-record
cleanup run. The Actor writes one dataset row per input record and a
structured `OUTPUT` summary; it does not configure recurrence, send
alerts, replace workflows, call sibling Actors, change Store pricing,
publish or unpublish itself, or mutate legacy Actor registrations.

Cost and billing are described in [pricing](#pricing). This README does
not claim a fixed per-record price, Pay Per Event readiness, Store launch
completion, live monitoring, workflow ownership, or legacy Actor
disposal authority.

### Input Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `records` | array of objects | *required* | Supplied contact-like records. Each record may include `recordId`, `url`, `email`, and `address`. The first-release limit is 1,000 records per run. |
| `fieldGroups` | array of strings | `["url", "email", "address"]` | Selects which supplied field families to process. Allowed values are `url`, `email`, and `address`. This is not a stage, child Actor, scraping, or live-enrichment control. |
| `reviewStrictness` | string | `standard` | Controls which deterministic review observations promote a row to `review_needed`. Allowed values are `minimal`, `standard`, and `strict`. It does not create confidence scoring, verification, survivor choice, or merge authority. |
| `dedupeKeyMode` | string | `keys_only` | Controls whether match-prep keys and same-run exact candidate labels are emitted. Allowed values are `off`, `keys_only`, and `keys_and_candidates`. It never removes rows, ranks records, chooses survivors, or merges contacts. |

`reviewStrictness` changes routing pressure only. It does not hide
changed fields, invalid reasons, warnings, dedupe keys, or cross-field
signals. `dedupeKeyMode` changes whether match-prep evidence is shown;
it does not change output cardinality or mutate any downstream system.

Example input:

```json
{
    "records": [
        {
            "recordId": "acme-hq",
            "url": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
            "email": "Sales+ops@Acme.com",
            "address": " 123 Main St., Suite 200, Austin, Texas 78701-1234 "
        },
        {
            "recordId": "partial-invalid",
            "url": "https://valid.example",
            "email": "not-an-email"
        }
    ],
    "fieldGroups": ["url", "email", "address"],
    "reviewStrictness": "standard",
    "dedupeKeyMode": "keys_only"
}
````
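
The allowed values and the 1,000-record limit from the parameter table can be checked locally before a run is started. The following is a minimal sketch, not part of the Actor: the function name `check_actor_input` and the wording of the messages are made up for illustration, and the Actor's own input validation may be stricter or report problems differently.

```python
# Allowed values and limit copied from the Input Parameters table.
ALLOWED_FIELD_GROUPS = {"url", "email", "address"}
ALLOWED_STRICTNESS = {"minimal", "standard", "strict"}
ALLOWED_DEDUPE_MODES = {"off", "keys_only", "keys_and_candidates"}
MAX_RECORDS = 1000  # first-release limit per run


def check_actor_input(actor_input: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []

    records = actor_input.get("records")
    if not isinstance(records, list):
        problems.append("records is required and must be an array of objects")
    else:
        if len(records) > MAX_RECORDS:
            problems.append(
                f"records has {len(records)} items; the limit is {MAX_RECORDS} per run"
            )
        if not all(isinstance(record, dict) for record in records):
            problems.append("every record must be an object")

    # Optional controls fall back to the documented defaults.
    field_groups = actor_input.get("fieldGroups", ["url", "email", "address"])
    unknown = set(field_groups) - ALLOWED_FIELD_GROUPS
    if unknown:
        problems.append(f"unsupported fieldGroups: {sorted(unknown)}")

    if actor_input.get("reviewStrictness", "standard") not in ALLOWED_STRICTNESS:
        problems.append("reviewStrictness must be minimal, standard, or strict")

    if actor_input.get("dedupeKeyMode", "keys_only") not in ALLOWED_DEDUPE_MODES:
        problems.append("dedupeKeyMode must be off, keys_only, or keys_and_candidates")

    return problems


example = {
    "records": [{"recordId": "acme-hq", "email": "Sales+ops@Acme.com"}],
    "dedupeKeyMode": "keys_only",
}
print(check_actor_input(example))
```

A pre-flight check like this only catches shape problems. Invalid-looking field values such as `not-an-email` are deliberately left alone, because the Actor keeps them for row diagnostics.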

### Output Format

This Actor writes two outputs:

- The default dataset contains one cleanup row per supplied record.
- The `OUTPUT` key-value record contains aggregate counts, selected
  controls, first-release limits, and explicit non-claim booleans.

Use the dataset when reviewing individual records. Use `OUTPUT` when
you need run-level counts, selected controls, diagnostic totals, and a
machine-readable reminder of unsupported capabilities.

Each dataset item includes:

```json
{
    "recordId": "acme-hq",
    "inputIndex": 0,
    "recordStatus": "review_needed",
    "input": {
        "recordId": "acme-hq",
        "url": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
        "email": "Sales+ops@Acme.com",
        "address": " 123 Main St., Suite 200, Austin, Texas 78701-1234 "
    },
    "normalized": {
        "url": {
            "canonicalUrl": "https://example.com/about?a=1&b=2",
            "scheme": "https",
            "host": "example.com",
            "path": "/about"
        },
        "email": {
            "normalizedEmail": "sales+ops@acme.com",
            "canonicalEmail": "sales+ops@acme.com",
            "domain": "acme.com"
        },
        "address": {
            "normalizedAddress": "123 main street ste 200 austin tx 78701",
            "comparisonAddress": "123 main street austin tx 78701",
            "addressType": "street",
            "state": "tx",
            "postalCode": "78701",
            "postalCodeExtension": "1234",
            "secondaryUnitDesignator": "ste",
            "secondaryUnitIdentifier": "200"
        }
    },
    "changedFields": [
        {
            "fieldGroup": "url",
            "sourceField": "input.url",
            "targetField": "normalized.url.canonicalUrl",
            "originalValue": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
            "normalizedValue": "https://example.com/about?a=1&b=2",
            "reasonCodes": [
                "trimmed_whitespace",
                "assumed_https",
                "removed_fragment",
                "removed_tracking_parameters",
                "sorted_query_parameters",
                "removed_trailing_slash"
            ]
        }
    ],
    "reviewFlags": [
        {
            "fieldGroup": "email",
            "sourceField": "derived.email.roleAccount",
            "sourceValue": true,
            "flagCode": "email_role_account",
            "severity": "medium",
            "strictnessThreshold": "standard",
            "message": "Email local part looks like a role or team mailbox."
        }
    ],
    "dedupeKeys": [
        {
            "fieldGroup": "email",
            "keyFamily": "email_domain",
            "keyValue": "acme.com",
            "matchScope": "organization",
            "keyStrength": "context",
            "sourceFields": ["normalized.email.domain"],
            "sourceValues": ["acme.com"],
            "candidate": null
        }
    ],
    "crossFieldSignals": [
        {
            "signalCode": "url_email_domain_mismatch",
            "fieldGroups": ["url", "email"],
            "severity": "medium",
            "sourceFields": ["normalized.url.host", "normalized.email.domain"],
            "sourceValues": ["example.com", "acme.com"],
            "reviewStrictnessThreshold": "standard",
            "statusImpact": "promotes_review_needed",
            "message": "Website host and email domain do not line up under the deterministic domain comparison rule; review before treating them as the same organization context."
        }
    ],
    "warnings": [],
    "invalidReasons": [],
    "processing": {
        "enabledFieldGroups": ["url", "email", "address"],
        "reviewStrictness": "standard",
        "dedupeKeyMode": "keys_only",
        "fieldStates": {
            "url": {
                "enabled": true,
                "inputState": "nonblank",
                "resultState": "usable"
            },
            "email": {
                "enabled": true,
                "inputState": "nonblank",
                "resultState": "usable"
            },
            "address": {
                "enabled": true,
                "inputState": "nonblank",
                "resultState": "usable"
            }
        }
    }
}
```
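
The reason codes in the `changedFields` entry above can be read as a sequence of small steps. As a rough local approximation (not the Actor's implementation), the sketch below reproduces that one documented example with Python's standard library. The function name `approximate_canonical_url` and the rule that tracking parameters are those whose names start with `utm_` are assumptions for illustration; the Actor's actual tracking-parameter list and edge-case behavior are not documented in this README.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

_HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def approximate_canonical_url(raw: str) -> str:
    """Approximate the documented URL cleanup steps for simple inputs."""
    text = raw.strip()                      # trimmed_whitespace
    if not _HAS_SCHEME.match(text):
        text = "https://" + text            # assumed_https
    parts = urlsplit(text)

    # removed_tracking_parameters (assumption: names starting with "utm_")
    pairs = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith("utm_")
    ]
    query = urlencode(sorted(pairs))        # sorted_query_parameters
    path = parts.path.rstrip("/")           # removed_trailing_slash

    # An empty fragment drops "#team"     # removed_fragment
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


raw_url = " Example.com/about/?utm_source=newsletter&b=2&a=1#team "
print(approximate_canonical_url(raw_url))
```

This is useful for spot-checking a few rows by hand. For anything beyond that, rely on the `canonicalUrl` and `reasonCodes` the Actor returns rather than on a local re-implementation.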

The `OUTPUT` summary includes:

```json
{
    "schemaVersion": "contact-cleanup-output-v1",
    "selectedControls": {
        "fieldGroups": ["url", "email", "address"],
        "reviewStrictness": "standard",
        "dedupeKeyMode": "keys_only"
    },
    "inputCount": 2,
    "emittedCount": 2,
    "emittedEqualsInputCount": true,
    "statusCounts": {
        "ready": 0,
        "normalized": 0,
        "review_needed": 2,
        "invalid": 0,
        "no_actionable_input": 0
    },
    "rowOutcomeCounts": {
        "rowsWithUsableOutput": 2,
        "rowsWithChangedFields": 2,
        "rowsWithReviewFlags": 2,
        "rowsWithDedupeKeys": 2,
        "rowsWithDuplicateCandidates": 0,
        "rowsWithCrossFieldSignals": 1,
        "rowsWithWarnings": 0,
        "rowsWithInvalidReasons": 1
    },
    "firstReleaseLimits": {
        "recordsMaxItems": 1000,
        "supportedInputFields": ["recordId", "url", "email", "address"],
        "supportedFieldGroups": ["url", "email", "address"],
        "defaultFieldGroups": ["url", "email", "address"],
        "reviewStrictnessValues": ["minimal", "standard", "strict"],
        "defaultReviewStrictness": "standard",
        "dedupeKeyModeValues": ["off", "keys_only", "keys_and_candidates"],
        "defaultDedupeKeyMode": "keys_only",
        "cleanupProfileExposed": false,
        "defaultOutputCardinality": "one_row_per_input_record"
    },
    "nonClaimSummary": {
        "emailDeliverabilityVerified": false,
        "inboxExistenceVerified": false,
        "inboxOwnershipVerified": false,
        "missingContactsFound": false,
        "sourceScrapingPerformed": false,
        "externalEnrichmentPerformed": false,
        "websiteReachabilityChecked": false,
        "postalDeliverabilityCertified": false,
        "geocodingPerformed": false,
        "demographicEnrichmentPerformed": false,
        "confidenceScoringPerformed": false,
        "crmMergePerformed": false,
        "automaticSurvivorshipSelected": false,
        "liveFreshnessChecked": false
    },
    "runDurationSeconds": 0.0
}
```

The real `OUTPUT` object also includes detailed field-group activity,
changed-field, review-flag, dedupe-key, cross-field-signal, warning,
and invalid-reason summaries.
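
Because the summary is machine-readable, a downstream job can assert the one-row-per-input contract and the non-claim booleans before it consumes the dataset. The sketch below is one possible check under stated assumptions: the function name `check_output_summary` is hypothetical, and the expectation that `statusCounts` sums to `emittedCount` is inferred from each row carrying exactly one `recordStatus`, not stated explicitly in this README.

```python
def check_output_summary(summary: dict) -> list[str]:
    """Return a list of contract problems found in an OUTPUT summary."""
    problems = []

    if summary["emittedCount"] != summary["inputCount"]:
        problems.append("emittedCount differs from inputCount")

    # Inferred invariant: every emitted row has exactly one recordStatus.
    status_total = sum(summary["statusCounts"].values())
    if status_total != summary["emittedCount"]:
        problems.append("statusCounts do not add up to emittedCount")

    # Every non-claim flag is expected to be false for this Actor.
    claimed = sorted(name for name, value in summary["nonClaimSummary"].items() if value)
    if claimed:
        problems.append(f"unexpected capability claims: {claimed}")

    return problems


# Trimmed copy of the OUTPUT example shown above.
sample_summary = {
    "inputCount": 2,
    "emittedCount": 2,
    "statusCounts": {
        "ready": 0,
        "normalized": 0,
        "review_needed": 2,
        "invalid": 0,
        "no_actionable_input": 0,
    },
    "nonClaimSummary": {
        "emailDeliverabilityVerified": False,
        "geocodingPerformed": False,
        "crmMergePerformed": False,
    },
}
print(check_output_summary(sample_summary))
```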

#### Row Statuses

| Status | Meaning | How to use it |
|--------|---------|---------------|
| `ready` | Supported supplied values were already usable under the deterministic rules. | Treat as cleanup confirmation, not live verification. |
| `normalized` | At least one supplied value changed into a usable normalized value without review pressure. | Review `changedFields` if you need to explain the transformation. |
| `review_needed` | The row has usable output plus review pressure such as a partial invalid field, role/disposable email observation, duplicate candidate, or cross-field signal. | Review before matching, importing, or trusting the row in another system. |
| `invalid` | Enabled nonblank supplied values could not be normalized into usable output. | Use `invalidReasons` as diagnostics; the row is preserved on purpose. |
| `no_actionable_input` | The row had no enabled nonblank `url`, `email`, or `address` value to process. | Keep or remove it according to your own source-system rules. |
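
The table above maps naturally onto a small routing step on the consumer side. The following sketch is only an illustration: the queue names (`use`, `human_review`, `fix_source`, `source_system_decision`) and the function name `route_rows` are hypothetical and not produced by the Actor. Only `recordStatus` and `inputIndex` come from the documented row shape.

```python
from collections import defaultdict

# Hypothetical queue names chosen for this example.
ROUTES = {
    "ready": "use",
    "normalized": "use",
    "review_needed": "human_review",
    "invalid": "fix_source",
    "no_actionable_input": "source_system_decision",
}


def route_rows(items: list[dict]) -> dict[str, list[int]]:
    """Group dataset rows into queues keyed by their recordStatus."""
    queues = defaultdict(list)
    for item in items:
        # Unknown statuses go to human review rather than being dropped.
        queue = ROUTES.get(item["recordStatus"], "human_review")
        queues[queue].append(item["inputIndex"])
    return dict(queues)


rows = [
    {"inputIndex": 0, "recordStatus": "review_needed"},
    {"inputIndex": 1, "recordStatus": "ready"},
    {"inputIndex": 2, "recordStatus": "invalid"},
    {"inputIndex": 3, "recordStatus": "normalized"},
]
print(route_rows(rows))
```

Routing on `inputIndex` rather than `recordId` keeps the step working for records that were supplied without an identifier.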

#### Key Output Fields

| Field | What it tells you | Boundary |
|-------|-------------------|----------|
| `normalized` | Canonical URL, canonical email/domain, and U.S.-leaning address display/comparison values when available. | Deterministic cleanup only; no reachability, deliverability, ownership, postal, or geocoding proof. |
| `changedFields` | Which supported values changed and which reason codes explain the change. | Explains transformations; it is not a correctness score. |
| `reviewFlags` | Deterministic observations that can make a row worth human review. | Review routing only; not a confidence score or truth verdict. |
| `dedupeKeys` | URL, email, domain, and address comparison keys, plus optional same-run candidate labels. | Match preparation only; no duplicate removal, ranking, survivor choice, or CRM merge. |
| `crossFieldSignals` | Deterministic prompts from relationships between supplied fields, such as URL/email domain mismatch. | Review prompts only; not identity, ownership, fraud, or legal proof. |
| `warnings` | Non-blocking diagnostics such as unknown address shape or disabled field group context. | Keeps uncertainty visible without claiming the row is wrong. |
| `invalidReasons` | Field-level reasons why supplied nonblank values could not be normalized. | Diagnostics only; invalid output is not a failed run by itself. |
| `processing` | Controls and per-field input/result states used for the row. | Helps explain routing; not a pricing or support guarantee. |
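
For a review queue, it is often enough to condense each row into the codes that explain its status. The sketch below is a hypothetical helper (`summarize_row` is not part of the Actor) that reads only fields shown in the dataset item example: `changedFields[].reasonCodes`, `reviewFlags[].flagCode`, and `crossFieldSignals[].signalCode`.

```python
def summarize_row(item: dict) -> dict:
    """Condense one dataset row into the codes that explain its status."""
    changes: dict[str, list[str]] = {}
    for change in item.get("changedFields", []):
        changes.setdefault(change["fieldGroup"], []).extend(change["reasonCodes"])

    return {
        "inputIndex": item["inputIndex"],
        "recordStatus": item["recordStatus"],
        "changes": changes,
        "reviewFlagCodes": [flag["flagCode"] for flag in item.get("reviewFlags", [])],
        "signalCodes": [signal["signalCode"] for signal in item.get("crossFieldSignals", [])],
    }


# Trimmed version of the dataset item shown earlier in this section.
sample_row = {
    "inputIndex": 0,
    "recordStatus": "review_needed",
    "changedFields": [
        {"fieldGroup": "url", "reasonCodes": ["trimmed_whitespace", "assumed_https"]}
    ],
    "reviewFlags": [{"flagCode": "email_role_account"}],
    "crossFieldSignals": [{"signalCode": "url_email_domain_mismatch"}],
}
print(summarize_row(sample_row))
```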

### Deterministic Smoke Behavior

The committed smoke input at `.actor/smoke_input.json` uses only
deterministic `.example` records. It covers ready, normalized,
no-actionable, invalid, mixed valid/invalid, duplicate-candidate,
cross-field, review, same-address, and warning-shaped rows without
depending on third-party network state.

### Fixture-Backed Output Examples

The repo includes a detailed
[output examples and support interpretation pack](./docs/output-examples.md)
for the committed 12-record smoke fixture. That pack traces examples to
the smoke input, aggregate contract test, and saved smoke/cost-matrix
run summaries.

The fixture summary is:

| Example surface | Count |
|-----------------|------:|
| Input records | 12 |
| Dataset rows emitted | 12 |
| `ready` rows | 2 |
| `normalized` rows | 1 |
| `review_needed` rows | 7 |
| `invalid` rows | 1 |
| `no_actionable_input` rows | 1 |
| Rows with duplicate candidates | 4 |
| Rows with cross-field signals | 3 |
| Rows with warnings | 1 |
| Rows with invalid reasons | 2 |

Use the examples as support guidance for reading output. They show how
duplicate candidates, domain mismatch signals, invalid diagnostics,
no-actionable rows, and address warnings preserve review evidence
without removing rows or claiming verification, enrichment, address
authority, confidence scoring, automatic dedupe, or CRM merge.

### How To Read Common Rows

**Ready or normalized rows**
Use the normalized values and changed-field ledger as cleanup evidence.
Do not treat the row as proof that a website is reachable, an inbox
exists, an address is deliverable, or a contact is current.

**Invalid rows**
Invalid rows are emitted intentionally when supplied nonblank values
cannot be normalized. The row keeps the original input and explains the
problem in `invalidReasons` so you can fix the source record or route
it for manual review.

**Partial rows**
A row can contain usable output for one field group and invalid
diagnostics for another. Keep reading the full row before discarding
it; the useful field groups remain available.
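
One way to keep the useful part of a partial row is to read `processing.fieldStates` and pick out the field groups whose `resultState` is `usable`. This is a sketch under assumptions: `usable` is the only `resultState` value shown in this README, so the helper (hypothetically named `usable_normalized_values`) treats every other value as not usable without naming it.

```python
def usable_normalized_values(item: dict) -> dict:
    """Return normalized values only for field groups marked usable."""
    states = item["processing"]["fieldStates"]
    normalized = item.get("normalized", {})
    return {
        group: normalized.get(group)
        for group, state in states.items()
        if state.get("resultState") == "usable"
    }


# Hypothetical partial row: the URL normalized, the email did not.
# "not_usable" is a placeholder, not a documented resultState value.
partial_row = {
    "normalized": {"url": {"canonicalUrl": "https://valid.example"}},
    "processing": {
        "fieldStates": {
            "url": {"enabled": True, "inputState": "nonblank", "resultState": "usable"},
            "email": {"enabled": True, "inputState": "nonblank", "resultState": "not_usable"},
        }
    },
}
print(usable_normalized_values(partial_row))
```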

**No-actionable rows**
Rows whose `url`, `email`, and `address` values are all blank, null,
missing, or in disabled field groups produce `no_actionable_input`.
These rows preserve input cardinality. With pay-per-usage billing, the
run still consumes platform resources for them even though they contain
nothing to process.

**Duplicate-candidate rows**
When `dedupeKeyMode` is `keys_and_candidates`, rows can point at
same-run peers with the same canonical URL, canonical email, or address
comparison key. That is a review queue, not an automatic dedupe result.
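
A review queue of this kind can also be built on the consumer side from the `dedupeKeys` array. The sketch below groups rows that share a `keyFamily` and `keyValue`. It deliberately does not read the `candidate` field, because this README shows it only as `null` and its populated shape is not documented here; the function name `group_by_dedupe_key` is hypothetical.

```python
from collections import defaultdict


def group_by_dedupe_key(items: list[dict]) -> dict[tuple[str, str], list[int]]:
    """Map (keyFamily, keyValue) to the inputIndex of every row sharing that key."""
    groups = defaultdict(list)
    for item in items:
        for key in item.get("dedupeKeys", []):
            groups[(key["keyFamily"], key["keyValue"])].append(item["inputIndex"])
    # Keep only keys shared by more than one row; nothing is removed or merged.
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


rows_with_keys = [
    {"inputIndex": 0, "dedupeKeys": [{"keyFamily": "email_domain", "keyValue": "duplicate.example"}]},
    {"inputIndex": 1, "dedupeKeys": [{"keyFamily": "email_domain", "keyValue": "duplicate.example"}]},
    {"inputIndex": 2, "dedupeKeys": [{"keyFamily": "email_domain", "keyValue": "other.example"}]},
]
print(group_by_dedupe_key(rows_with_keys))
```

Keys such as `email_domain` carry a `keyStrength` of `context` in the example row, so a shared key of that family suggests a common organization rather than a duplicate person. Treat each group as a prompt for review.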

**Cross-field signal rows**
Signals such as URL/email domain mismatch or same-address context are
deterministic prompts from supplied values. They should guide review,
not be used as verified identity, ownership, or fraud findings.

**Warning rows**
Warnings keep uncertain context visible, such as an address shape that
could not be confidently parsed into common components. A warning can
coexist with usable output.

### Example Use Cases

**CRM intake review**
Normalize supplied website, email, and address values while preserving
invalid input and row-level review pressure for human triage.

**Lead-list cleanup before matching**
Create stable URL, mailbox, domain, and address comparison keys before
joining records against another system.

**Dedupe preparation without merge authority**
Emit deterministic keys and same-run exact candidate labels while
keeping every original row and avoiding automatic survivor selection.

### Limitations

- This Actor only processes supplied records. It does not scrape
  websites, crawl pages, find missing contacts, or fetch external
  enrichment from providers.
- URL cleanup does not prove website reachability, safety, ownership,
  live freshness, redirect equivalence, or page content.
- Email cleanup does not verify deliverability, inbox existence, inbox
  ownership, or sender compliance.
- Address cleanup is U.S.-leaning heuristic normalization and
  comparison-key preparation. It does not certify postal
  deliverability, geocode addresses, add demographics, or prove that an
  address belongs to a contact.
- Dedupe keys and same-run candidate labels are preparation signals.
  They do not cluster records, rank matches, choose survivors, remove
  duplicates, merge CRM records, or mutate downstream systems.
- `recordStatus` is deterministic routing evidence, not a confidence
  score, CRM truth verdict, legal identity claim, pricing signal, or
  support guarantee.
- High `review_needed` counts can reflect messy supplied input or
  strict review settings. They do not mean the run failed or that the
  Actor verified those rows as bad.
- No-actionable rows preserve cardinality. They are useful for audit
  trails, but they still make pay-per-usage cost harder to estimate
  from record count alone.
- Phone, company-name cleanup, CSV/CRM import, arbitrary metadata,
  fuzzy scoring, live verification, enrichment, and automatic merge
  controls are outside the first-release contract.

### Disclaimer

This Actor performs deterministic cleanup and match-preparation only.
Use its output as structured evidence for review, matching, or
downstream workflow decisions, not as proof that a contact is current,
reachable, deliverable, enriched, owned, merged, or CRM-true.

### Permissions

This Actor is designed to run with **limited permissions**. It writes
only to its default dataset and default key-value store. The current
runtime does not require access to other Apify storages, account
resources, sibling Actors, proxy groups, or third-party network
resources.

### Pricing

Recommended pricing model: **Pay per usage**. Under this model, users
pay Apify platform resource costs for the run, and there is no custom
developer charge from this Actor.

Pay-per-usage is less predictable before a run than a fixed per-record
quote. Start with a limited-scope trial, review the run cost and
`OUTPUT` counts, then scale only if the review density and platform
usage match your workflow.

This README does not configure Store pricing, claim a fixed per-record
price, or promise custom Pay Per Event charging. Any event names that
appear in internal design material are not buyer billing terms unless a
live pricing surface says so.

### Release History

See [CHANGELOG.md](./CHANGELOG.md) for version-by-version release
notes and migration guidance.

# Actor input Schema

## `records` (type: `array`):

Supplied contact-like records. Each record may include recordId, url, email, and address. Invalid-looking field values are kept for row diagnostics instead of being silently dropped.

## `fieldGroups` (type: `array`):

Choose which supplied field families to process. This is a field selector, not a child Actor, stage, scraping, or live-enrichment control.

## `reviewStrictness` (type: `string`):

Controls how strongly deterministic review signals affect row status. It does not add verification, enrichment, confidence scoring, survivor choice, or merge authority.

## `dedupeKeyMode` (type: `string`):

Controls whether match-prep keys and same-run candidate labels are emitted. It never deletes rows, ranks records, chooses survivors, or merges contacts.

## Actor input object example

```json
{
  "records": [
    {
      "recordId": "valid-all-field",
      "url": "https://ready.example/contact",
      "email": "person@ready.example",
      "address": "900 cedar street chicago il 60601"
    },
    {
      "recordId": "partial-normalized",
      "url": " partial.example/about/?utm_source=newsletter#team ",
      "email": "person@partial.example"
    },
    {
      "recordId": "no-actionable",
      "url": "   ",
      "email": null,
      "address": ""
    },
    {
      "recordId": "invalid-only",
      "url": "ftp://invalid.example",
      "email": "missing-at-sign",
      "address": "###"
    },
    {
      "recordId": "mixed-valid-invalid",
      "url": "https://mixed.example/contact",
      "email": "missing-at-sign"
    },
    {
      "recordId": "duplicate-url-a",
      "url": "https://duplicate.example/contact",
      "email": "alpha@duplicate.example"
    },
    {
      "recordId": "duplicate-url-b",
      "url": "https://duplicate.example/contact",
      "email": "beta@duplicate.example"
    },
    {
      "recordId": "domain-mismatch",
      "url": "https://mismatch.example/contact",
      "email": "owner@other.example"
    },
    {
      "recordId": "email-review",
      "email": "support@mailinator.com"
    },
    {
      "recordId": "same-address-a",
      "url": "https://north.example",
      "email": "owner@north.example",
      "address": "123 Main Street Suite 100, Austin, Texas 78701"
    },
    {
      "recordId": "same-address-b",
      "url": "https://south.example",
      "email": "owner@south.example",
      "address": "123 Main Street Suite 200, Austin, Texas 78701"
    },
    {
      "recordId": "address-warning",
      "address": "warehouse behind blue door"
    }
  ],
  "fieldGroups": [
    "url",
    "email",
    "address"
  ],
  "reviewStrictness": "standard",
  "dedupeKeyMode": "keys_and_candidates"
}
```

# Actor output Schema

## `results` (type: `string`):

One dataset row per supplied record with preserved input, grouped normalized values, changed-field entries, review flags, dedupe keys, cross-field signals, warnings, invalid reasons, and processing metadata.

## `runSummary` (type: `string`):

Structured OUTPUT summary with selected controls, emitted/input counts, status distribution, diagnostic counts, first-release limits, and explicit unsupported-capability booleans.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "records": [
        {
            "recordId": "valid-all-field",
            "url": "https://ready.example/contact",
            "email": "person@ready.example",
            "address": "900 cedar street chicago il 60601"
        },
        {
            "recordId": "partial-normalized",
            "url": " partial.example/about/?utm_source=newsletter#team ",
            "email": "person@partial.example"
        },
        {
            "recordId": "no-actionable",
            "url": "   ",
            "email": null,
            "address": ""
        },
        {
            "recordId": "invalid-only",
            "url": "ftp://invalid.example",
            "email": "missing-at-sign",
            "address": "###"
        },
        {
            "recordId": "mixed-valid-invalid",
            "url": "https://mixed.example/contact",
            "email": "missing-at-sign"
        },
        {
            "recordId": "duplicate-url-a",
            "url": "https://duplicate.example/contact",
            "email": "alpha@duplicate.example"
        },
        {
            "recordId": "duplicate-url-b",
            "url": "https://duplicate.example/contact",
            "email": "beta@duplicate.example"
        },
        {
            "recordId": "domain-mismatch",
            "url": "https://mismatch.example/contact",
            "email": "owner@other.example"
        },
        {
            "recordId": "email-review",
            "email": "support@mailinator.com"
        },
        {
            "recordId": "same-address-a",
            "url": "https://north.example",
            "email": "owner@north.example",
            "address": "123 Main Street Suite 100, Austin, Texas 78701"
        },
        {
            "recordId": "same-address-b",
            "url": "https://south.example",
            "email": "owner@south.example",
            "address": "123 Main Street Suite 200, Austin, Texas 78701"
        },
        {
            "recordId": "address-warning",
            "address": "warehouse behind blue door"
        }
    ],
    "fieldGroups": [
        "url",
        "email",
        "address"
    ],
    "reviewStrictness": "standard",
    "dedupeKeyMode": "keys_and_candidates"
};

// Run the Actor and wait for it to finish
const run = await client.actor("critd/contact-cleanup").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "records": [
        {
            "recordId": "valid-all-field",
            "url": "https://ready.example/contact",
            "email": "person@ready.example",
            "address": "900 cedar street chicago il 60601",
        },
        {
            "recordId": "partial-normalized",
            "url": " partial.example/about/?utm_source=newsletter#team ",
            "email": "person@partial.example",
        },
        {
            "recordId": "no-actionable",
            "url": "   ",
            "email": None,
            "address": "",
        },
        {
            "recordId": "invalid-only",
            "url": "ftp://invalid.example",
            "email": "missing-at-sign",
            "address": "###",
        },
        {
            "recordId": "mixed-valid-invalid",
            "url": "https://mixed.example/contact",
            "email": "missing-at-sign",
        },
        {
            "recordId": "duplicate-url-a",
            "url": "https://duplicate.example/contact",
            "email": "alpha@duplicate.example",
        },
        {
            "recordId": "duplicate-url-b",
            "url": "https://duplicate.example/contact",
            "email": "beta@duplicate.example",
        },
        {
            "recordId": "domain-mismatch",
            "url": "https://mismatch.example/contact",
            "email": "owner@other.example",
        },
        {
            "recordId": "email-review",
            "email": "support@mailinator.com",
        },
        {
            "recordId": "same-address-a",
            "url": "https://north.example",
            "email": "owner@north.example",
            "address": "123 Main Street Suite 100, Austin, Texas 78701",
        },
        {
            "recordId": "same-address-b",
            "url": "https://south.example",
            "email": "owner@south.example",
            "address": "123 Main Street Suite 200, Austin, Texas 78701",
        },
        {
            "recordId": "address-warning",
            "address": "warehouse behind blue door",
        },
    ],
    "fieldGroups": [
        "url",
        "email",
        "address",
    ],
    "reviewStrictness": "standard",
    "dedupeKeyMode": "keys_and_candidates",
}

# Run the Actor and wait for it to finish
run = client.actor("critd/contact-cleanup").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# Learn more: https://docs.apify.com/api/client/python/docs/quick-start
```
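
The input schema for this Actor caps `records` at 1,000 items per run (`maxItems: 1000`). A minimal sketch of splitting a larger list into batches that fit that limit follows. `chunk_records` and `MAX_RECORDS_PER_RUN` are hypothetical names introduced for this illustration and are not part of the Apify client. Since candidate labels are described as same-run, records placed in different batches would presumably not be labeled as candidates for each other, so it may help to group likely duplicates into the same batch.

```python
from typing import Iterator

# Upper bound taken from the Actor's input schema (records.maxItems).
MAX_RECORDS_PER_RUN = 1000


def chunk_records(records: list[dict], size: int = MAX_RECORDS_PER_RUN) -> Iterator[list[dict]]:
    """Yield consecutive slices of `records`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Illustrative data only: 2,500 synthetic records.
sample = [{"recordId": f"row-{i}", "email": f"user{i}@sample.example"} for i in range(2500)]
batches = list(chunk_records(sample))
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

Each batch could then be sent as its own run by passing `{"records": batch, ...}` as `run_input` to `client.actor("critd/contact-cleanup").call(...)`, as in the example above.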

## CLI example

```bash
echo '{
  "records": [
    {
      "recordId": "valid-all-field",
      "url": "https://ready.example/contact",
      "email": "person@ready.example",
      "address": "900 cedar street chicago il 60601"
    },
    {
      "recordId": "partial-normalized",
      "url": " partial.example/about/?utm_source=newsletter#team ",
      "email": "person@partial.example"
    },
    {
      "recordId": "no-actionable",
      "url": "   ",
      "email": null,
      "address": ""
    },
    {
      "recordId": "invalid-only",
      "url": "ftp://invalid.example",
      "email": "missing-at-sign",
      "address": "###"
    },
    {
      "recordId": "mixed-valid-invalid",
      "url": "https://mixed.example/contact",
      "email": "missing-at-sign"
    },
    {
      "recordId": "duplicate-url-a",
      "url": "https://duplicate.example/contact",
      "email": "alpha@duplicate.example"
    },
    {
      "recordId": "duplicate-url-b",
      "url": "https://duplicate.example/contact",
      "email": "beta@duplicate.example"
    },
    {
      "recordId": "domain-mismatch",
      "url": "https://mismatch.example/contact",
      "email": "owner@other.example"
    },
    {
      "recordId": "email-review",
      "email": "support@mailinator.com"
    },
    {
      "recordId": "same-address-a",
      "url": "https://north.example",
      "email": "owner@north.example",
      "address": "123 Main Street Suite 100, Austin, Texas 78701"
    },
    {
      "recordId": "same-address-b",
      "url": "https://south.example",
      "email": "owner@south.example",
      "address": "123 Main Street Suite 200, Austin, Texas 78701"
    },
    {
      "recordId": "address-warning",
      "address": "warehouse behind blue door"
    }
  ],
  "fieldGroups": [
    "url",
    "email",
    "address"
  ],
  "reviewStrictness": "standard",
  "dedupeKeyMode": "keys_and_candidates"
}' |
apify call critd/contact-cleanup --silent --output-dataset
```
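
Inline JSON inside a shell string is easy to break with quoting. One alternative, sketched here, is to write the input to a file with Python and feed that file to the same command. The file name `input.json` and the shortened record list are choices made for this illustration only.

```python
import json
from pathlib import Path

# A shortened input; extend "records" with your own rows.
run_input = {
    "records": [
        {
            "recordId": "valid-all-field",
            "url": "https://ready.example/contact",
            "email": "person@ready.example",
            "address": "900 cedar street chicago il 60601",
        },
        {"recordId": "no-actionable", "url": "   ", "email": None, "address": ""},
    ],
    "fieldGroups": ["url", "email", "address"],
    "reviewStrictness": "standard",
    "dedupeKeyMode": "keys_and_candidates",
}

# json.dumps converts Python None to JSON null and escapes quotes safely.
input_path = Path("input.json")
input_path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
print(f"Wrote {len(run_input['records'])} records to {input_path}")
```

The file can then stand in for the `echo` in the command above, for example `apify call critd/contact-cleanup --silent --output-dataset < input.json`, which should behave the same as piping the JSON on standard input.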

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=critd/contact-cleanup",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}
```
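
To keep the token out of a committed config file, the same structure can be generated from an environment variable. This is only a sketch: `build_mcp_config` is a hypothetical helper, and `APIFY_TOKEN` is an assumed variable name that your environment may or may not define.

```python
import json
import os


def build_mcp_config(token: str) -> dict:
    """Return the MCP client configuration shown above, with the token filled in."""
    if not token:
        raise ValueError("an Apify API token is required")
    return {
        "mcpServers": {
            "apify": {
                "command": "npx",
                "args": [
                    "mcp-remote",
                    "https://mcp.apify.com/?tools=critd/contact-cleanup",
                    "--header",
                    f"Authorization: Bearer {token}",
                ],
            }
        }
    }


# APIFY_TOKEN is an assumed variable name; the placeholder keeps the sketch runnable.
token = os.environ.get("APIFY_TOKEN", "<YOUR_API_TOKEN>")
print(json.dumps(build_mcp_config(token), indent=4))
```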

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "CRM Contact Cleanup & Dedupe Prep",
        "description": "Clean supplied URL, email, and address fields for contact records, preserving one row per input with changed-field, review, dedupe-key, and cross-field signals. Does not scrape, find, verify, enrich, geocode, score confidence, choose survivors, or merge contacts.",
        "version": "0.1",
        "x-build-id": "m4eKm40uqRInKNDXB"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/critd~contact-cleanup/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-critd-contact-cleanup",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/critd~contact-cleanup/runs": {
            "post": {
                "operationId": "runs-sync-critd-contact-cleanup",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/critd~contact-cleanup/run-sync": {
            "post": {
                "operationId": "run-sync-critd-contact-cleanup",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "records"
                ],
                "properties": {
                    "records": {
                        "title": "Records",
                        "minItems": 1,
                        "maxItems": 1000,
                        "type": "array",
                        "description": "Supplied contact-like records. Each record may include recordId, url, email, and address. Invalid-looking field values are kept for row diagnostics instead of being silently dropped.",
                        "items": {
                            "type": "object",
                            "additionalProperties": false,
                            "properties": {
                                "recordId": {
                                    "title": "Record ID",
                                    "type": "string",
                                    "description": "Optional source identifier to map output rows back to your system. It is not required to be unique and does not authorize automatic dedupe or merge.",
                                    "editor": "textfield",
                                    "nullable": true,
                                    "maxLength": 256
                                },
                                "url": {
                                    "title": "URL",
                                    "type": "string",
                                    "description": "Optional supplied website URL or bare domain to normalize. Malformed values remain row diagnostics; this Actor does not crawl the site or prove reachability.",
                                    "editor": "textfield",
                                    "nullable": true,
                                    "maxLength": 4096
                                },
                                "email": {
                                    "title": "Email",
                                    "type": "string",
                                    "description": "Optional supplied email address to normalize for cleanup and matching signals. This Actor does not verify deliverability, inbox existence, or inbox ownership.",
                                    "editor": "textfield",
                                    "nullable": true,
                                    "maxLength": 320
                                },
                                "address": {
                                    "title": "Address",
                                    "type": "string",
                                    "description": "Optional supplied U.S.-leaning address text to normalize for cleanup and comparison-key signals. This Actor does not certify postal deliverability or geocode addresses.",
                                    "editor": "textarea",
                                    "nullable": true,
                                    "maxLength": 1024
                                }
                            }
                        }
                    },
                    "fieldGroups": {
                        "title": "Field Groups",
                        "minItems": 1,
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Choose which supplied field families to process. This is a field selector, not a child Actor, stage, scraping, or live-enrichment control.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "url",
                                "email",
                                "address"
                            ],
                            "enumTitles": [
                                "URL",
                                "Email",
                                "Address"
                            ]
                        },
                        "default": [
                            "url",
                            "email",
                            "address"
                        ]
                    },
                    "reviewStrictness": {
                        "title": "Review Strictness",
                        "enum": [
                            "minimal",
                            "standard",
                            "strict"
                        ],
                        "type": "string",
                        "description": "Controls how strongly deterministic review signals affect row status. It does not add verification, enrichment, confidence scoring, survivor choice, or merge authority.",
                        "default": "standard"
                    },
                    "dedupeKeyMode": {
                        "title": "Dedupe Key Mode",
                        "enum": [
                            "off",
                            "keys_only",
                            "keys_and_candidates"
                        ],
                        "type": "string",
                        "description": "Controls whether match-prep keys and same-run candidate labels are emitted. It never deletes rows, ranks records, chooses survivors, or merges contacts.",
                        "default": "keys_only"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
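
The `inputSchema` above implies a few checks that can run locally before a request is sent. The following is a minimal sketch of such a pre-flight check, covering only the constraints visible in the schema (item counts, allowed fields, maximum lengths, and enum values). `validate_input` and the constant names are hypothetical, and the Actor's own validation remains authoritative.

```python
# Limits and enums copied from the inputSchema in the specification above.
FIELD_MAX_LENGTH = {"recordId": 256, "url": 4096, "email": 320, "address": 1024}
FIELD_GROUPS = ("url", "email", "address")
REVIEW_STRICTNESS = ("minimal", "standard", "strict")
DEDUPE_KEY_MODES = ("off", "keys_only", "keys_and_candidates")


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []

    records = run_input.get("records")
    if not isinstance(records, list) or not 1 <= len(records) <= 1000:
        problems.append("records must be a list of 1 to 1000 items")
        records = records if isinstance(records, list) else []

    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"records[{index}] must be an object")
            continue
        for key, value in record.items():
            if key not in FIELD_MAX_LENGTH:
                # additionalProperties is false in the schema.
                problems.append(f"records[{index}].{key} is not an allowed field")
            elif value is None:
                continue  # every record field is nullable
            elif not isinstance(value, str):
                problems.append(f"records[{index}].{key} must be a string or null")
            elif len(value) > FIELD_MAX_LENGTH[key]:
                problems.append(
                    f"records[{index}].{key} exceeds {FIELD_MAX_LENGTH[key]} characters"
                )

    groups = run_input.get("fieldGroups")
    if groups is not None:
        valid = (
            isinstance(groups, list)
            and len(groups) >= 1
            and all(isinstance(g, str) and g in FIELD_GROUPS for g in groups)
            and len(set(groups)) == len(groups)
        )
        if not valid:
            problems.append(
                "fieldGroups must be a non-empty list of unique values from "
                + ", ".join(FIELD_GROUPS)
            )

    if run_input.get("reviewStrictness", "standard") not in REVIEW_STRICTNESS:
        problems.append("reviewStrictness must be one of " + ", ".join(REVIEW_STRICTNESS))
    if run_input.get("dedupeKeyMode", "keys_only") not in DEDUPE_KEY_MODES:
        problems.append("dedupeKeyMode must be one of " + ", ".join(DEDUPE_KEY_MODES))

    return problems


# Illustrative input with two deliberate mistakes: an unknown field and a bad enum value.
sample = {
    "records": [{"recordId": "r1", "email": "a@b.example", "phone": "555"}],
    "dedupeKeyMode": "merge",
}
for problem in validate_input(sample):
    print(problem)
```

Running a check like this before calling any of the three endpoints can surface obvious input mistakes without spending platform usage on a run that would be rejected.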
