CRM Contact Cleanup & Dedupe Prep

Pricing

$0.10 / 1,000 processed rows

Clean supplied URL, email, and address fields for contact records, preserving one row per input with changed-field, review, dedupe-key, and cross-field signals. Does not scrape, find, verify, enrich, geocode, score confidence, choose survivors, or merge contacts.

Rating: 0.0 (0)

Developer: Critical Distinction (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 15 days ago

Prepare contact records you already have for CRM review, lead-list quality checks, matching, or downstream cleanup. This Actor accepts small inline record sets or high-volume source routes, keeps one dataset row per accepted contact record, normalizes supplied URL, email, and U.S.-leaning address values, and returns the changed-field evidence, review flags, invalid reasons, warnings, and match keys next to the original row.

Use it when you need a deterministic cleanup pass before human review or downstream matching decisions:

  • Normalize supplied website, email, and address fields without live data sourcing.
  • Keep invalid, partial, and no-actionable rows visible instead of dropping them from the output.
  • Preview CSV, JSONL, JSON-array, Dataset, or key-value-store sources before execution, with row counts and billing estimates but zero default dataset rows.
  • Prepare dedupe keys and same-run candidate labels while preserving every input row.
  • Route messy records to review with diagnostics, not with a confidence score or final truth verdict.

It does not scrape or find contacts, verify email deliverability, certify postal addresses, geocode, enrich records, remove duplicates, choose survivors, or merge CRM records.

Details below cover what you get, input parameters, input routes and source examples, output format, fixture-backed output examples, integration recipes, how to read common rows, limitations, permissions, pricing, and release history.

What You Get

  • Preserves one dataset row per supplied inline record or accepted source record.
  • Processes only the supported first-release fields: recordId, url, email, and address.
  • Keeps inline records capped at 1,000 rows while source routes can preflight and execute up to 10,000 accepted source records.
  • Supports source input from file upload strings backed by Apify KVS records, Apify Dataset rows, and selected key-value-store records.
  • Parses CSV with a header row, JSONL object lines, and JSON arrays of objects for file and key-value-store routes.
  • Normalizes URLs into canonical http or https values with stable host/path/query behavior and tracking-fragment cleanup.
  • Normalizes emails into stable lowercase and canonical mailbox/domain values, including deterministic Gmail alias handling.
  • Normalizes U.S.-leaning address text into display and comparison forms through the shared address-normalization boundary.
  • Emits required recordStatus values: ready, normalized, review_needed, invalid, and no_actionable_input.
  • Emits changedFields, reviewFlags, dedupeKeys, crossFieldSignals, warnings, invalidReasons, and processing details so buyers can see why each row landed where it did.
  • Writes a structured OUTPUT v2 summary with selected controls, ingestion diagnostics, billing preview, row counts, status distribution, diagnostic summaries, first-release limits, and explicit unsupported-capability booleans.

The output is meant to help a buyer decide what to review next. A clean row stays clean, a changed row explains what changed, a messy row keeps the original value next to diagnostics, and a possible same-run match is shown as a candidate label instead of being removed or merged.

Operating Boundaries

This README describes the product behavior of supplied-record cleanup runs, whether records arrive inline or through a supported source route. The Actor writes one dataset row per supplied inline record or accepted source record and a structured OUTPUT summary; it does not configure recurrence, send alerts, replace workflows, call sibling Actors, change Store pricing, publish or unpublish itself, or mutate legacy Actor registrations.

Cost and billing are covered in the Pricing section below. This README does not establish recurrence, live monitoring, workflow ownership, support response guarantees, or authority to dispose of legacy Actors.

Agentic And API Use

Contact Cleanup is suitable for agents and API consumers that need a bounded cleanup step for records they already control. A public Store readback on June 4, 2026 listed critd/contact-cleanup under the allowsAgenticUsers=true filter, with isWhiteListedForAgenticPayments = true and a Pay Per Event entry for apify-default-dataset-item. Treat that only as evidence of Store discovery and payment eligibility; it does not establish support readiness, guarantee result quality, prove live high-volume runs, or serve as a charged-event counter.

For source-route work, agents should run previewOnly = true first and read the OUTPUT summary before execution:

  • OUTPUT.ingestion shows the route, row counts, and mapping diagnostics without raw source values.
  • OUTPUT.billingPreview estimates default-dataset-row events and spend-cap posture without calling the custom charge API.
  • OUTPUT.sourcePreview adds route-specific parser and preflight diagnostics for file, Dataset, and key-value-store sources.

Execution should use a buyer-owned spend cap when appropriate. The Actor still performs deterministic cleanup only; it does not verify, enrich, score, scrape, choose survivors, or merge contacts.

Input Parameters

Parameter | Type | Default | Description
records | array of objects | required when source is absent | Inline supplied contact-like records. Use this route for smaller jobs. Each record may include recordId, url, email, and address. The inline first-release limit remains 1,000 records per run.
source | object | required when records is absent | High-volume source route. Choose exactly one route: file, dataset, or keyValueStore. Source routes reject mixed inline records, route-field mismatches, unsupported source fields, and arbitrary CRM passthrough columns.
previewOnly | boolean | false | Source-route control. When true, the Actor reads and diagnoses the selected source, writes OUTPUT diagnostics, and writes zero default dataset rows.
sourceMaxRows | integer | 10000 | Maximum accepted source records for source routes. Sources over the cap fail before output; silent truncation is not supported.
rowLimitBehavior | string | fail_over_limit | First-release source row-limit behavior. Over-limit sources fail before default dataset rows or OUTPUT.
fieldGroups | array of strings | ["url", "email", "address"] | Selects which supplied field families to process. Allowed values are url, email, and address. This is not a stage, child Actor, scraping, or live-enrichment control.
reviewStrictness | string | standard | Controls which deterministic review observations promote a row to review_needed. Allowed values are minimal, standard, and strict. It does not create confidence scoring, verification, survivor choice, or merge authority.
dedupeKeyMode | string | keys_only | Controls whether match-prep keys and same-run exact candidate labels are emitted. Allowed values are off, keys_only, and keys_and_candidates. It never removes rows, ranks records, chooses survivors, or merges contacts.

reviewStrictness changes routing pressure only. It does not hide changed fields, invalid reasons, warnings, dedupe keys, or cross-field signals. dedupeKeyMode changes whether match-prep evidence is shown; it does not change output cardinality or mutate any downstream system.
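Before submitting a run, a client can check an input object against the allowed values in the parameter table. The sketch below is hypothetical: check_run_input is not part of the Actor or any Apify library, and the Actor performs its own validation, which may reject inputs this sketch accepts (for example, route-specific source errors).

```python
# Allowed values and limits copied from the parameter table.
ALLOWED_FIELD_GROUPS = {"url", "email", "address"}
ALLOWED_STRICTNESS = {"minimal", "standard", "strict"}
ALLOWED_DEDUPE_MODES = {"off", "keys_only", "keys_and_candidates"}
INLINE_MAX_RECORDS = 1000


def check_run_input(run_input: dict) -> list:
    """Return human-readable problems; an empty list means no obvious issues."""
    problems = []
    has_records = "records" in run_input
    has_source = "source" in run_input
    if has_records == has_source:  # neither or both routes supplied
        problems.append("provide exactly one of 'records' or 'source'")
    if len(run_input.get("records", [])) > INLINE_MAX_RECORDS:
        problems.append(f"inline records exceed {INLINE_MAX_RECORDS}; use a source route")
    groups = set(run_input.get("fieldGroups", ALLOWED_FIELD_GROUPS))
    unknown_groups = groups - ALLOWED_FIELD_GROUPS
    if unknown_groups:
        problems.append(f"unsupported fieldGroups: {sorted(unknown_groups)}")
    if run_input.get("reviewStrictness", "standard") not in ALLOWED_STRICTNESS:
        problems.append("reviewStrictness must be minimal, standard, or strict")
    if run_input.get("dedupeKeyMode", "keys_only") not in ALLOWED_DEDUPE_MODES:
        problems.append("dedupeKeyMode must be off, keys_only, or keys_and_candidates")
    return problems
```

Omitted controls fall back to the documented defaults, so an input with only records passes.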

Example input:

{
  "records": [
    {
      "recordId": "acme-hq",
      "url": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
      "email": "Sales+ops@Acme.com",
      "address": " 123 Main St., Suite 200, Austin, Texas 78701-1234 "
    },
    {
      "recordId": "partial-invalid",
      "url": "https://valid.example",
      "email": "not-an-email"
    }
  ],
  "fieldGroups": ["url", "email", "address"],
  "reviewStrictness": "standard",
  "dedupeKeyMode": "keys_only"
}

Input Routes And Source Examples

Choose exactly one input route per run:

Route | Use it for | Output behavior
Inline records | Up to 1,000 supplied records pasted into the input. | One default dataset row per input record.
source.type = "file" | A file-upload value that resolves to an Apify KVS record URL, apify://key-value-stores/<store>/records/<key>, or kvs://<store>/<key>. | Preview or execute CSV, JSONL, or JSON-array source bodies after parser and row-policy preflight.
source.type = "dataset" | A selected Apify Dataset ID or unique name that contains object rows. | Reads selected mapped fields with paging and offset support, then previews or executes accepted records. Dataset metadata counts are advisory.
source.type = "keyValueStore" | A selected KVS record containing CSV, JSONL, or a JSON array. | Reads one selected text record with byte limits and format guardrails, then previews or executes accepted records.

Source rows only map to recordId, url, email, and address. Ignored fields are counted but not preserved in dataset rows or OUTPUT. Physical blank source rows are skipped before accepted-record counting. Accepted records with no enabled nonblank supported values still emit no_actionable_input rows during execution.
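These mapping rules can be mirrored on the client to predict preview counts. The sketch below is hypothetical, not the Actor's parser. It assumes rows are already parsed into dicts (for example by csv.DictReader), treats a row as physically blank when every cell is empty or whitespace, and counts ignored columns per accepted row. Per-row counting matches the ignoredFieldCount shown in sourceMetadata but is otherwise an assumption.

```python
SUPPORTED_FIELDS = ("recordId", "url", "email", "address")


def map_source_rows(rows, field_map):
    """Apply a fieldMap (target field -> source column) to parsed source rows.

    Returns (accepted_records, blank_rows_skipped, ignored_field_count).
    """
    unsupported = set(field_map) - set(SUPPORTED_FIELDS)
    if unsupported:  # the README says unsupported source fields are rejected
        raise ValueError(f"unsupported fieldMap targets: {sorted(unsupported)}")
    mapped_columns = set(field_map.values())
    accepted, blank_rows_skipped, ignored_field_count = [], 0, 0
    for row in rows:
        # A physically blank row is skipped before accepted-record counting.
        if all(value is None or str(value).strip() == "" for value in row.values()):
            blank_rows_skipped += 1
            continue
        # Columns outside the map are counted, never copied into the record.
        ignored_field_count += len(set(row) - mapped_columns)
        accepted.append({target: row.get(column) for target, column in field_map.items()})
    return accepted, blank_rows_skipped, ignored_field_count
```

An accepted record whose mapped values are all blank is still returned here; during execution it would become a no_actionable_input row rather than being skipped.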

File Preview

Use preview first when testing a buyer file mapping. This writes a v2 OUTPUT summary with ingestion, billingPreview, and sourcePreview, but zero default dataset rows.

{
  "source": {
    "type": "file",
    "file": "kvs://source-store/contacts.csv",
    "format": "csv",
    "fieldMap": {
      "recordId": "id",
      "url": "website",
      "email": "email",
      "address": "mailing_address"
    }
  },
  "previewOnly": true,
  "sourceMaxRows": 10,
  "rowLimitBehavior": "fail_over_limit",
  "fieldGroups": ["url", "email", "address"],
  "reviewStrictness": "standard",
  "dedupeKeyMode": "keys_and_candidates"
}

Fixture-backed behavior for the committed CSV source fixture:

Preview metric | Value
Physical source rows seen | 4
Physical blank rows skipped | 1
Accepted records | 3
Accepted actionable records | 2
No-actionable mapped records | 1
Default dataset rows written | 0

File Execution

Run the same file route with previewOnly omitted or false after the preview looks right. Execution runs the same preflight and spend-cap checks, then emits one default dataset row per accepted source record and writes OUTPUT after dataset rows.

{
  "source": {
    "type": "file",
    "file": "kvs://source-store/contacts.csv",
    "format": "csv",
    "fieldMap": {
      "recordId": "id",
      "url": "website",
      "email": "email",
      "address": "mailing_address"
    }
  },
  "sourceMaxRows": 10,
  "dedupeKeyMode": "keys_and_candidates"
}

For the committed CSV fixture, execution emits rows for alpha-1, empty-contact, and beta-2; the physical blank row is skipped, and empty-contact is preserved as a no_actionable_input row.

Dataset Source

Use a Dataset route when another Actor or task has already produced object rows in Apify storage. The Actor reads only the mapped fields it needs and records the requested offset plus advisory metadata counts in source diagnostics.

{
  "source": {
    "type": "dataset",
    "datasetId": "source-dataset",
    "offset": 0,
    "fieldMap": {
      "recordId": "contact_id",
      "url": "website",
      "email": "email_address",
      "address": "street"
    }
  },
  "previewOnly": true,
  "sourceMaxRows": 300
}

Key-Value-Store Source

Use a key-value-store route when a workflow stores a CSV, JSONL, or JSON array under a known KVS key.

{
  "source": {
    "type": "keyValueStore",
    "storeId": "source-store",
    "recordKey": "contacts.jsonl",
    "format": "jsonl"
  },
  "previewOnly": true,
  "sourceMaxRows": 10000
}

format may be csv, jsonl, json_array, or auto for file and KVS routes. Dataset routes already provide object rows and should leave format empty.

Source Preflight And Spend Caps

Malformed source structure, row-cap failures, unsupported formats, over-large KVS bodies, and execution spend-cap failures stop before default dataset rows, OUTPUT, or custom charge calls. Preview remains available even when the estimated charge would exceed ACTOR_MAX_TOTAL_CHARGE_USD; in that case billingPreview reports spendCapStatus = "would_exceed_cap" and spendCapEnforced = false because preview writes zero default dataset rows.

Execution enforces the spend cap before output. If three accepted source records estimate three emitted default dataset rows at $0.0001 each and ACTOR_MAX_TOTAL_CHARGE_USD = 0.0002, execution fails before dataset rows or OUTPUT.
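The spend-cap arithmetic above can be reproduced with exact decimal math. This is a minimal sketch that assumes the $0.0001 per-row event price from the Pricing section. execution_allowed is a hypothetical helper. Treating an estimate exactly equal to the cap as allowed is an assumption; the README does not state it.

```python
from decimal import Decimal
from typing import Optional

# Event price per default dataset row ($0.10 per 1,000 rows).
EVENT_PRICE_USD = Decimal("0.0001")


def estimate_charge(emitted_rows: int) -> Decimal:
    """Estimated charge for an execution that emits this many dataset rows."""
    return EVENT_PRICE_USD * emitted_rows


def execution_allowed(emitted_rows: int, max_total_charge_usd: Optional[str]) -> bool:
    """Mirror the documented pre-output spend-cap check (hypothetical helper)."""
    if max_total_charge_usd is None:
        return True  # no cap configured ("not_configured")
    # Assumption: an estimate equal to the cap is allowed.
    return estimate_charge(emitted_rows) <= Decimal(max_total_charge_usd)


print(estimate_charge(3), execution_allowed(3, "0.0002"))  # prints 0.0003 False
```

Decimal avoids binary floating-point rounding when summing many $0.0001 events.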

Output Format

This Actor writes two outputs:

  • The default dataset contains one cleanup row per supplied inline record or accepted source record.
  • The OUTPUT key-value record contains aggregate counts, selected controls, source ingestion diagnostics, billing preview, first-release limits, and explicit non-claim booleans.

Use the dataset when reviewing individual records. Use OUTPUT when you need run-level counts, selected controls, diagnostic totals, and a machine-readable reminder of unsupported capabilities.

Each dataset item includes:

{
  "recordId": "acme-hq",
  "inputIndex": 0,
  "recordStatus": "review_needed",
  "input": {
    "recordId": "acme-hq",
    "url": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
    "email": "Sales+ops@Acme.com",
    "address": " 123 Main St., Suite 200, Austin, Texas 78701-1234 "
  },
  "normalized": {
    "url": {
      "canonicalUrl": "https://example.com/about?a=1&b=2",
      "scheme": "https",
      "host": "example.com",
      "path": "/about"
    },
    "email": {
      "normalizedEmail": "sales+ops@acme.com",
      "canonicalEmail": "sales+ops@acme.com",
      "domain": "acme.com"
    },
    "address": {
      "normalizedAddress": "123 main street ste 200 austin tx 78701",
      "comparisonAddress": "123 main street austin tx 78701",
      "addressType": "street",
      "state": "tx",
      "postalCode": "78701",
      "postalCodeExtension": "1234",
      "secondaryUnitDesignator": "ste",
      "secondaryUnitIdentifier": "200"
    }
  },
  "changedFields": [
    {
      "fieldGroup": "url",
      "sourceField": "input.url",
      "targetField": "normalized.url.canonicalUrl",
      "originalValue": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
      "normalizedValue": "https://example.com/about?a=1&b=2",
      "reasonCodes": [
        "trimmed_whitespace",
        "assumed_https",
        "removed_fragment",
        "removed_tracking_parameters",
        "sorted_query_parameters",
        "removed_trailing_slash"
      ]
    }
  ],
  "reviewFlags": [
    {
      "fieldGroup": "email",
      "sourceField": "derived.email.roleAccount",
      "sourceValue": true,
      "flagCode": "email_role_account",
      "severity": "medium",
      "strictnessThreshold": "standard",
      "message": "Email local part looks like a role or team mailbox."
    }
  ],
  "dedupeKeys": [
    {
      "fieldGroup": "email",
      "keyFamily": "email_domain",
      "keyValue": "acme.com",
      "matchScope": "organization",
      "keyStrength": "context",
      "sourceFields": ["normalized.email.domain"],
      "sourceValues": ["acme.com"],
      "candidate": null
    }
  ],
  "crossFieldSignals": [
    {
      "signalCode": "url_email_domain_mismatch",
      "fieldGroups": ["url", "email"],
      "severity": "medium",
      "sourceFields": ["normalized.url.host", "normalized.email.domain"],
      "sourceValues": ["example.com", "acme.com"],
      "reviewStrictnessThreshold": "standard",
      "statusImpact": "promotes_review_needed",
      "message": "Website host and email domain do not line up under the deterministic domain comparison rule; review before treating them as the same organization context."
    }
  ],
  "warnings": [],
  "invalidReasons": [],
  "processing": {
    "enabledFieldGroups": ["url", "email", "address"],
    "reviewStrictness": "standard",
    "dedupeKeyMode": "keys_only",
    "fieldStates": {
      "url": {
        "enabled": true,
        "inputState": "nonblank",
        "resultState": "usable"
      },
      "email": {
        "enabled": true,
        "inputState": "nonblank",
        "resultState": "usable"
      },
      "address": {
        "enabled": true,
        "inputState": "nonblank",
        "resultState": "usable"
      }
    }
  }
}

Source-route execution rows also include a bounded sourceMetadata object. It identifies the source row for audit and debugging; it is not CRM-column passthrough:

{
  "sourceMetadata": {
    "sourceType": "keyValueStore",
    "sourceFormat": "csv",
    "acceptedRecordIndex": 0,
    "sourceRowIndex": 2,
    "mappedFields": ["recordId", "url", "email", "address"],
    "ignoredFieldCount": 2
  }
}
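When reviewing individual rows, it can help to flatten each row's evidence into a short list of reasons. The sketch below is hypothetical and reads only fields shown in the examples above. invalidReasons and warnings are counted rather than parsed because their entry shape is not shown in this README, and sourceRowIndex is present only on source-route rows.

```python
def review_summary(row: dict) -> dict:
    """Summarize why a dataset row landed in its recordStatus."""
    reasons = [f"flag:{flag['flagCode']}" for flag in row.get("reviewFlags", [])]
    reasons += [f"signal:{sig['signalCode']}" for sig in row.get("crossFieldSignals", [])]
    if row.get("invalidReasons"):
        reasons.append(f"invalid_reasons:{len(row['invalidReasons'])}")
    if row.get("warnings"):
        reasons.append(f"warnings:{len(row['warnings'])}")
    source = row.get("sourceMetadata") or {}
    return {
        "recordId": row.get("recordId"),
        "recordStatus": row.get("recordStatus"),
        "sourceRowIndex": source.get("sourceRowIndex"),  # None for inline rows
        "reasons": reasons,
    }
```

For the acme-hq row above, the reasons would be the email_role_account flag and the url_email_domain_mismatch signal.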

The OUTPUT summary includes:

{
  "schemaVersion": "contact-cleanup-output-v2",
  "selectedControls": {
    "fieldGroups": ["url", "email", "address"],
    "reviewStrictness": "standard",
    "dedupeKeyMode": "keys_only"
  },
  "ingestion": {
    "inputRoute": "inline",
    "previewOnly": false,
    "sourceType": null,
    "sourceFormat": null,
    "sourceMaxRows": null,
    "rowLimitBehavior": null,
    "defaultDatasetRowsWritten": null,
    "rowCounts": null,
    "mappingDiagnostics": null
  },
  "billingPreview": {
    "eventName": "apify-default-dataset-item",
    "eventUnit": "default_dataset_item",
    "eventPriceUsd": 0.0001,
    "syntheticDefaultDatasetItemEvent": true,
    "customChargeApiCalled": false,
    "previewOnly": false,
    "acceptedRecordCount": 2,
    "estimatedEmittedRows": 2,
    "estimatedEventCount": 2,
    "estimatedChargeUsd": 0.0002,
    "spendCapStatus": "not_configured",
    "spendCapEnforced": false,
    "pricingEffectiveAtUtc": "2026-06-03T18:20:34.542Z",
    "pricingProofState": "pre_effective_or_current_unproved",
    "currentPpeProved": false,
    "currentPpeProofRequired": true
  },
  "inputCount": 2,
  "emittedCount": 2,
  "emittedEqualsInputCount": true,
  "statusCounts": {
    "ready": 0,
    "normalized": 0,
    "review_needed": 2,
    "invalid": 0,
    "no_actionable_input": 0
  },
  "rowOutcomeCounts": {
    "rowsWithUsableOutput": 2,
    "rowsWithChangedFields": 2,
    "rowsWithReviewFlags": 2,
    "rowsWithDedupeKeys": 2,
    "rowsWithDuplicateCandidates": 0,
    "rowsWithCrossFieldSignals": 1,
    "rowsWithWarnings": 0,
    "rowsWithInvalidReasons": 1
  },
  "firstReleaseLimits": {
    "recordsMaxItems": 1000,
    "supportedInputFields": ["recordId", "url", "email", "address"],
    "supportedFieldGroups": ["url", "email", "address"],
    "defaultFieldGroups": ["url", "email", "address"],
    "reviewStrictnessValues": ["minimal", "standard", "strict"],
    "defaultReviewStrictness": "standard",
    "dedupeKeyModeValues": ["off", "keys_only", "keys_and_candidates"],
    "defaultDedupeKeyMode": "keys_only",
    "cleanupProfileExposed": false,
    "defaultOutputCardinality": "one_row_per_input_record"
  },
  "nonClaimSummary": {
    "emailDeliverabilityVerified": false,
    "inboxExistenceVerified": false,
    "inboxOwnershipVerified": false,
    "missingContactsFound": false,
    "sourceScrapingPerformed": false,
    "externalEnrichmentPerformed": false,
    "websiteReachabilityChecked": false,
    "postalDeliverabilityCertified": false,
    "geocodingPerformed": false,
    "demographicEnrichmentPerformed": false,
    "confidenceScoringPerformed": false,
    "crmMergePerformed": false,
    "automaticSurvivorshipSelected": false,
    "liveFreshnessChecked": false
  },
  "runDurationSeconds": 0.0
}

The real OUTPUT object also includes detailed field-group activity, changed-field, review-flag, dedupe-key, cross-field-signal, warning, and invalid-reason summaries. Source runs also include sourcePreview with route-specific preview or execution diagnostics; top-level ingestion and billingPreview are the stable machine-readable fields to consume first.
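A downstream job can sanity-check an execution run's OUTPUT summary before consuming its rows. The checks below are assumptions drawn from the field names in the inline example above (for instance, that statusCounts sums to emittedCount on an execution run); they are not a published contract, and source runs may report counts differently. check_output_summary is a hypothetical helper.

```python
def check_output_summary(output: dict) -> list:
    """Return problems found in an execution-run OUTPUT summary."""
    problems = []
    status_total = sum(output["statusCounts"].values())
    if status_total != output["emittedCount"]:
        problems.append(f"statusCounts sum {status_total} != emittedCount {output['emittedCount']}")
    counts_match = output["inputCount"] == output["emittedCount"]
    if output["emittedEqualsInputCount"] != counts_match:
        problems.append("emittedEqualsInputCount disagrees with inputCount/emittedCount")
    if not output["emittedEqualsInputCount"]:
        problems.append("row cardinality was not preserved")
    # Every nonClaimSummary flag is expected to stay false.
    claimed = sorted(key for key, value in output.get("nonClaimSummary", {}).items() if value)
    if claimed:
        problems.append(f"unexpected true non-claim flags: {claimed}")
    return problems
```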

Row Statuses

Status | Meaning | How to use it
ready | Supported supplied values were already usable under the deterministic rules. | Treat as cleanup confirmation, not live verification.
normalized | At least one supplied value changed into a usable normalized value without review pressure. | Review changedFields if you need to explain the transformation.
review_needed | The row has usable output plus review pressure such as a partial invalid field, role/disposable email observation, duplicate candidate, or cross-field signal. | Review before matching, importing, or trusting the row in another system.
invalid | Enabled nonblank supplied values could not be normalized into usable output. | Use invalidReasons as diagnostics; the row is preserved on purpose.
no_actionable_input | The row had no enabled nonblank url, email, or address value to process. | Keep or remove it according to your own source-system rules.
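The "How to use it" guidance can be turned into a simple routing step. The queue names below are hypothetical placeholders for whatever a buyer's workflow uses; the Actor itself only emits recordStatus and never removes rows.

```python
from collections import defaultdict

# Hypothetical downstream queues keyed by the documented recordStatus values.
QUEUE_BY_STATUS = {
    "ready": "accept",
    "normalized": "accept",
    "review_needed": "human_review",
    "invalid": "fix_source",
    "no_actionable_input": "source_policy",
}


def route_rows(rows):
    """Group recordIds by downstream queue; unknown statuses stay visible."""
    queues = defaultdict(list)
    for row in rows:
        queue = QUEUE_BY_STATUS.get(row.get("recordStatus"), "unrecognized_status")
        queues[queue].append(row.get("recordId"))
    return dict(queues)
```

Routing every row, including no_actionable_input rows, keeps the one-row-per-record cardinality the Actor preserves.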

Key Output Fields

Field | What it tells you | Boundary
normalized | Canonical URL, canonical email/domain, and U.S.-leaning address display/comparison values when available. | Deterministic cleanup only; no reachability, deliverability, ownership, postal, or geocoding proof.
changedFields | Which supported values changed and which reason codes explain the change. | Explains transformations; it is not a correctness score.
reviewFlags | Deterministic observations that can make a row worth human review. | Review routing only; not a confidence score or truth verdict.
dedupeKeys | URL, email, domain, and address comparison keys, plus optional same-run candidate labels. | Match preparation only; no duplicate removal, ranking, survivor choice, or CRM merge.
crossFieldSignals | Deterministic prompts from relationships between supplied fields, such as URL/email domain mismatch. | Review prompts only; not identity, ownership, fraud, or legal proof.
warnings | Non-blocking diagnostics such as unknown address shape or disabled field group context. | Keeps uncertainty visible without claiming the row is wrong.
invalidReasons | Field-level reasons why supplied nonblank values could not be normalized. | Diagnostics only; invalid output is not a failed run by itself.
processing | Controls and per-field input/result states used for the row. | Helps explain routing; not a pricing or support guarantee.

Deterministic Smoke Behavior

The committed smoke input at .actor/smoke_input.json uses only deterministic .example records. It covers ready, normalized, no-actionable, invalid, mixed valid/invalid, duplicate-candidate, cross-field, review, same-address, and warning-shaped rows without depending on third-party network state.

High-volume source-route behavior is covered by committed local fixtures under tests/fixtures/ and by the high_volume_ingestion integration tests. Those fixtures use synthetic source values and exercise file, Dataset, KVS, parser, preflight, spend-cap, source metadata, and keys_and_candidates behavior without live storage proof.

Fixture-Backed Output Examples

The repo includes a detailed ./docs/output-examples.md for the committed 12-record smoke fixture. That pack traces examples to the smoke input, aggregate contract test, and saved smoke/cost-matrix run summaries.

The fixture summary is:

Example surface | Count
Input records | 12
Dataset rows emitted | 12
ready rows | 2
normalized rows | 1
review_needed rows | 7
invalid rows | 1
no_actionable_input rows | 1
Rows with duplicate candidates | 4
Rows with cross-field signals | 3
Rows with warnings | 1
Rows with invalid reasons | 2

The source-route fixture summary is:

Example surface | Count
Physical CSV source rows seen | 4
Physical blank source rows skipped | 1
Accepted source records | 3
Accepted actionable source records | 2
Accepted no-actionable source records | 1
Ignored source fields counted | 9
Preview default dataset rows | 0
Execution default dataset rows | 3
Custom charge API calls | 0

Use the examples as support guidance for reading output. They show how duplicate candidates, domain mismatch signals, invalid diagnostics, no-actionable rows, and address warnings preserve review evidence without removing rows or claiming verification, enrichment, address authority, confidence scoring, automatic dedupe, or CRM merge.

Integration Recipes

Preview a buyer CSV before billing: Upload or save a CSV in an Apify key-value store, map source columns to recordId, url, email, and address, and run with previewOnly = true. Review OUTPUT.ingestion.rowCounts, OUTPUT.sourcePreview.mappingDiagnostics, and OUTPUT.billingPreview.estimatedChargeUsd. No default dataset rows are written in preview.

Execute the same CSV after preview: Run the same input with previewOnly omitted or false. The Actor reuses parser, row-cap, and spend-cap preflight, then writes one default dataset row per accepted source record. OUTPUT is written after dataset rows, so downstream jobs can read individual rows first and the run summary second.
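The preview-then-execute recipe can be scripted. This sketch rests on stated assumptions. run_actor is any callable that starts a run with the given input and returns the OUTPUT record's value. make_apify_runner wraps the Apify Python client (the apify-client package); its ApifyClient.actor().call() and key_value_store().get_record() methods should be checked against the client documentation before relying on them. The max_charge_usd budget is a client-side choice, separate from ACTOR_MAX_TOTAL_CHARGE_USD.

```python
from typing import Callable, Optional

RunActor = Callable[[dict], dict]  # run input -> OUTPUT record value


def make_apify_runner(token: str, actor_id: str = "critd/contact-cleanup") -> RunActor:
    """Build a runner on top of apify-client (imported only when a run starts)."""
    def run_actor(run_input: dict) -> dict:
        from apify_client import ApifyClient  # pip install apify-client

        client = ApifyClient(token)
        run = client.actor(actor_id).call(run_input=run_input)  # waits for the run
        if run is None:
            raise RuntimeError("Actor run did not return run details")
        store = client.key_value_store(run["defaultKeyValueStoreId"])
        record = store.get_record("OUTPUT")
        if record is None:  # e.g. a preflight failure writes no OUTPUT
            raise RuntimeError("run finished without an OUTPUT record")
        return record["value"]

    return run_actor


def preview_then_execute(run_actor: RunActor, base_input: dict,
                         max_charge_usd: Optional[float] = None) -> dict:
    """Run a preview, inspect billingPreview, then execute the same input."""
    preview = run_actor({**base_input, "previewOnly": True})
    billing = preview["billingPreview"]
    if billing["spendCapStatus"] == "would_exceed_cap":
        raise RuntimeError("execution would exceed ACTOR_MAX_TOTAL_CHARGE_USD")
    if max_charge_usd is not None and billing["estimatedChargeUsd"] > max_charge_usd:
        raise RuntimeError(f"estimated charge {billing['estimatedChargeUsd']} exceeds budget")
    return run_actor({**base_input, "previewOnly": False})
```

Passing run_actor in as a parameter keeps the decision logic testable with a fake runner and independent of how the run is actually started.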

Chain from an upstream Apify Dataset: Use source.type = "dataset" with a Dataset ID or unique name and a field map. This route reads selected mapped fields, pages rows with an offset, and treats Dataset metadata counts as advisory. It does not claim the upstream Actor produced verified contact data.

Pass a saved JSONL or JSON-array record through KVS: Use source.type = "keyValueStore" with storeId, recordKey, and format. This is useful when another system has already written a bounded text record to Apify storage. The Actor reads one selected record and does not crawl arbitrary URLs, authenticate to CRMs, or preserve unsupported columns.
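If another system produces the records, it can write a bounded JSONL body that contains only the supported fields. This is a minimal sketch; to_jsonl is hypothetical. Because the key-value-store example earlier passes no fieldMap, the sketch assumes the JSONL keys should already use the supported field names.

```python
import json

SUPPORTED_FIELDS = ("recordId", "url", "email", "address")


def to_jsonl(records) -> str:
    """Serialize records as JSONL, keeping only fields the Actor maps."""
    lines = []
    for record in records:
        kept = {field: record[field] for field in SUPPORTED_FIELDS if field in record}
        lines.append(json.dumps(kept, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)
```

Dropping unsupported columns before upload keeps CRM-only data out of Apify storage. How the body is then stored under a key such as contacts.jsonl depends on the workflow; with the Apify Python client, a key-value store client's set_record method is one option (check its content-type handling in the client documentation).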

Use keys_and_candidates for same-run review: Set dedupeKeyMode = "keys_and_candidates" when source rows should carry exact same-run candidate labels. The high-volume route preserves peer context through its source spool. Candidate labels remain review signals; they are not duplicate removal, ranking, survivor choice, or CRM merge.

How To Read Common Rows

Ready or normalized rows: Use the normalized values and changed-field ledger as cleanup evidence. Do not treat the row as proof that a website is reachable, an inbox exists, an address is deliverable, or a contact is current.

Invalid rows: Invalid rows are emitted intentionally when supplied nonblank values cannot be normalized. The row keeps the original input and explains the problem in invalidReasons so you can fix the source record or route it for manual review.

Partial rows: A row can contain usable output for one field group and invalid diagnostics for another. Keep reading the full row before discarding it; the useful field groups remain available.

No-actionable rows: Blank, null, missing, or disabled field groups can produce no_actionable_input. These rows preserve input cardinality. Because live billing is tied to default dataset rows, no-actionable rows count as processed rows when they are emitted. Remove blank source records before a run if you do not want them included in the row count.

Physical blank source rows are different. A fully blank CSV row or blank JSONL line is skipped before accepted-record counting and is not emitted as a dataset row. An accepted source row, such as one with a recordId but no enabled nonblank url, email, or address, is emitted as a no_actionable_input row.
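To keep no-actionable records out of the billed row count, a client can split them off before the run, as the no-actionable note suggests. This is a hypothetical sketch. It treats None, missing, and whitespace-only values as blank, which approximates but may not exactly match the Actor's nonblank rule.

```python
def split_actionable(records, field_groups=("url", "email", "address")):
    """Separate records with at least one enabled nonblank value from the rest."""
    actionable, no_actionable = [], []
    for record in records:
        has_value = any(str(record.get(group) or "").strip() for group in field_groups)
        (actionable if has_value else no_actionable).append(record)
    return actionable, no_actionable
```

Pass the same field groups the run will use: with fieldGroups = ["url"], a record that has only an email is no-actionable.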

Duplicate-candidate rows: When dedupeKeyMode is keys_and_candidates, rows can point at same-run peers with the same canonical URL, canonical email, or address comparison key. That is a review queue, not an automatic dedupe result.
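The Actor labels exact candidates only within one run. When rows from several runs need to be reviewed together, a client can group them on the emitted dedupeKeys. This is a hypothetical sketch that keys on (keyFamily, keyValue). It builds a review queue and does not rank, choose survivors, or merge. email_domain is the only key family shown in this README, and it carries organization context (matchScope organization, keyStrength context), so a shared domain is not evidence that two rows describe the same contact.

```python
from collections import defaultdict


def group_by_dedupe_key(rows, key_families=None):
    """Map (keyFamily, keyValue) to the recordIds sharing it, keeping groups of 2+."""
    groups = defaultdict(list)
    for row in rows:
        for key in row.get("dedupeKeys", []):
            if key_families is not None and key["keyFamily"] not in key_families:
                continue
            group = groups[(key["keyFamily"], key["keyValue"])]
            if row["recordId"] not in group:  # list each row once per key
                group.append(row["recordId"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```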

Cross-field signal rows: Signals such as URL/email domain mismatch or same-address context are deterministic prompts from supplied values. They should guide review, not be used as verified identity, ownership, or fraud findings.

Warning rows: Warnings keep uncertain context visible, such as an address shape that could not be confidently parsed into common components. A warning can coexist with usable output.

Example Use Cases

CRM intake review: Normalize supplied website, email, and address values while preserving invalid input and row-level review pressure for human triage.

Lead-list cleanup before matching: Create stable URL, mailbox, domain, and address comparison keys before joining records against another system.

Dedupe preparation without merge authority: Emit deterministic keys and same-run exact candidate labels while keeping every original row and avoiding automatic survivor selection.

High-volume Apify storage handoff: Preview a CSV, JSONL, JSON-array, Dataset, or KVS source route, inspect row counts and estimated row charges, then execute only after the mapping and spend-cap posture are acceptable.

Limitations

  • This Actor only processes supplied records. It does not scrape websites, crawl pages, find missing contacts, or fetch external enrichment from providers.
  • URL cleanup does not prove website reachability, safety, ownership, live freshness, redirect equivalence, or page content.
  • Email cleanup does not verify deliverability, inbox existence, inbox ownership, mailbox ownership, or sender compliance.
  • Address cleanup is U.S.-leaning heuristic normalization and comparison-key preparation. It does not certify postal deliverability, geocode addresses, add demographics, or prove that an address belongs to a contact.
  • Dedupe keys and same-run candidate labels are preparation signals. They do not cluster records, rank matches, choose survivors, remove duplicates, merge CRM records, or mutate downstream systems.
  • recordStatus is deterministic routing evidence, not a confidence score, CRM truth verdict, legal identity claim, pricing signal, or support guarantee.
  • High review_needed counts can reflect messy supplied input or strict review settings. They do not mean the run failed or that the Actor verified those rows as bad.
  • Agentic-payment eligibility helps agents discover and pay for the Actor; it does not prove support readiness, response time, output correctness, live high-volume storage behavior, or charged-event counters.
  • No-actionable rows preserve cardinality. They are useful for audit trails, and they count as processed rows because the Actor writes them to the default dataset.
  • Source routes parse CSV, JSONL, JSON-array, Dataset, and KVS records only into supported Contact Cleanup fields. They are not spreadsheet editing, CRM import, CRM connector, request queue, OAuth, external URL crawler, or child-Actor orchestration features.
  • Phone, company-name cleanup, arbitrary metadata passthrough, fuzzy scoring, live verification, enrichment, and automatic merge controls are outside the first-release contract.

Disclaimer

This Actor performs deterministic cleanup and match-preparation only. Use its output as structured evidence for review, matching, or downstream workflow decisions, not as proof that a contact is current, reachable, deliverable, enriched, owned, merged, or CRM-true.

Permissions

This Actor is designed to run with limited permissions. Inline runs write only to the default dataset and default key-value store. Source routes additionally need read access to the selected Apify Dataset or key-value-store record supplied in the input, and write only their own default dataset rows and OUTPUT summary.

The runtime does not require account-wide resources, sibling Actors, proxy groups, request queues, third-party network resources, CRM credentials, OAuth flows, or browser sessions.

Pricing

Pricing contract: Pay Per Event at $0.10 per 1,000 processed rows, effective June 3, 2026 at 18:20:34.542 UTC. A public Store readback collected during Campaign 10 Mission 6 Phase 5 shows that both the exact Contact row and the allowsAgenticUsers=true filtered row report PAY_PER_EVENT with apify-default-dataset-item at $0.0001 per processed row; the pricingModel=PAY_PER_EVENT filter includes Contact, and the pricingModel=FREE filter returns no Contact rows. A charged-event counter on a released build and live storage readback for high-volume source routes have not been proved yet. Until Mission 7 or another authorized release owner reruns the Store/API checks, the released-build run and storage readback, and the charged-event counter checks, treat billingPreview as a local estimate and guardrail.

Apify implements this local pricing contract with the synthetic apify-default-dataset-item event. Each cleanup row written to the default dataset is one billing unit at $0.0001; the Actor does not call the custom charge API for this route. Once current PPE pricing is proved on live surfaces, a 1,000-row execution is estimated at $0.10 before any account-level taxes, credits, or Apify billing adjustments.

Platform usage is included in the accepted event price. The structured OUTPUT summary is not billed separately. Preview-only source runs, malformed-source failures, row-cap failures, and execution spend-cap failures write zero default dataset rows. Invalid, review-needed, partial, and no-actionable accepted records are still emitted rows during execution and count toward the row total because preserving row cardinality is part of the product contract.

Release History

See ./CHANGELOG.md for version-by-version release notes and migration guidance.