CRM Contact Cleanup & Dedupe Prep
Pricing
Pay per usage
Clean supplied URL, email, and address fields for contact records, preserving one row per input with changed-field, review, dedupe-key, and cross-field signals. Does not scrape, find, verify, enrich, geocode, score confidence, choose survivors, or merge contacts.
Developer
Critical Distinction
3 days ago
Last modified
Contact Cleanup & Dedupe Prep for Supplied Records
Prepare contact records you already have for CRM review, lead-list quality checks, matching, or downstream cleanup. This Actor keeps one dataset row per input record, normalizes supplied URL, email, and U.S.-leaning address values, and returns the changed-field evidence, review flags, invalid reasons, warnings, and match keys next to the original row.
Use it when you need a deterministic cleanup pass before human review or downstream matching decisions:
- Normalize supplied website, email, and address fields without live data sourcing.
- Keep invalid, partial, and no-actionable rows visible instead of dropping them from the output.
- Prepare dedupe keys and same-run candidate labels while preserving every input row.
- Route messy records to review with diagnostics, not with a confidence score or final truth verdict.
It does not scrape or find contacts, verify email deliverability, certify postal addresses, geocode, enrich records, remove duplicates, choose survivors, or merge CRM records.
Details below cover what you get, input parameters, output format, fixture-backed output examples, how to read common rows, limitations, permissions, pricing, and release history.
What You Get
- Preserves one dataset row per supplied input record.
- Processes only the supported first-release fields: recordId, url, email, and address.
- Normalizes URLs into canonical http or https values with stable host/path/query behavior and tracking-fragment cleanup.
- Normalizes emails into stable lowercase and canonical mailbox/domain values, including deterministic Gmail alias handling.
- Normalizes U.S.-leaning address text into display and comparison forms through the shared address-normalization boundary.
- Emits required recordStatus values: ready, normalized, review_needed, invalid, and no_actionable_input.
- Emits changedFields, reviewFlags, dedupeKeys, crossFieldSignals, warnings, invalidReasons, and processing details so buyers can see why each row landed where it did.
- Writes a structured OUTPUT summary with selected controls, row counts, status distribution, diagnostic summaries, first-release limits, and explicit unsupported-capability booleans.
The output is meant to help a buyer decide what to review next. A clean row stays clean, a changed row explains what changed, a messy row keeps the original value next to diagnostics, and a possible same-run match is shown as a candidate label instead of being removed or merged.
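The URL cleanup described above can be pictured with a minimal Python sketch. It is an illustration only, not the Actor's implementation: the `normalize_url` helper and the `utm_` prefix rule are assumptions made for this example, and the Actor's real tracking-parameter list is not documented here. The reason codes are the ones shown in the dataset example in Output Format.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only utm_* parameters count as tracking parameters in this sketch.
TRACKING_PREFIXES = ("utm_",)


def normalize_url(raw: str) -> tuple[str, list[str]]:
    """Return (canonical_url, reason_codes) for one supplied URL string."""
    reasons = []
    value = raw.strip()
    if value != raw:
        reasons.append("trimmed_whitespace")
    if "://" not in value:
        # No scheme supplied, so the sketch assumes https.
        value = "https://" + value
        reasons.append("assumed_https")
    parts = urlsplit(value)
    if parts.fragment:
        reasons.append("removed_fragment")
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if not k.lower().startswith(TRACKING_PREFIXES)]
    if len(kept) != len(pairs):
        reasons.append("removed_tracking_parameters")
    ordered = sorted(kept)
    if ordered != kept:
        reasons.append("sorted_query_parameters")
    path = parts.path
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
        reasons.append("removed_trailing_slash")
    # The fragment is dropped by passing an empty string as the last component.
    canonical = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(ordered), "")
    )
    return canonical, reasons


url, reasons = normalize_url(" Example.com/about/?utm_source=newsletter&b=2&a=1#team ")
print(url)      # https://example.com/about?a=1&b=2
print(reasons)
```

Returning the reason codes next to the value mirrors the changedFields idea: the caller can see why a value changed without comparing strings.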
Operating Boundaries
This README describes the product behavior of a single supplied-record
cleanup run. The Actor writes one dataset row per input record and a
structured OUTPUT summary; it does not configure recurrence, send
alerts, replace workflows, call sibling Actors, change Store pricing,
publish or unpublish itself, or mutate legacy Actor registrations.
Cost and billing are described in the Pricing section below. This README does not claim a fixed per-record price, Pay Per Event readiness, Store launch completion, live monitoring, workflow ownership, or legacy Actor disposal authority.
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| records | array of objects | required | Supplied contact-like records. Each record may include recordId, url, email, and address. The first-release limit is 1,000 records per run. |
| fieldGroups | array of strings | ["url", "email", "address"] | Selects which supplied field families to process. Allowed values are url, email, and address. This is not a stage, child Actor, scraping, or live-enrichment control. |
| reviewStrictness | string | standard | Controls which deterministic review observations promote a row to review_needed. Allowed values are minimal, standard, and strict. It does not create confidence scoring, verification, survivor choice, or merge authority. |
| dedupeKeyMode | string | keys_only | Controls whether match-prep keys and same-run exact candidate labels are emitted. Allowed values are off, keys_only, and keys_and_candidates. It never removes rows, ranks records, chooses survivors, or merges contacts. |
reviewStrictness changes routing pressure only. It does not hide
changed fields, invalid reasons, warnings, dedupe keys, or cross-field
signals. dedupeKeyMode changes whether match-prep evidence is shown;
it does not change output cardinality or mutate any downstream system.
Example input:

    {
      "records": [
        {
          "recordId": "acme-hq",
          "url": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
          "email": "Sales+ops@Acme.com",
          "address": " 123 Main St., Suite 200, Austin, Texas 78701-1234 "
        },
        {
          "recordId": "partial-invalid",
          "url": "https://valid.example",
          "email": "not-an-email"
        }
      ],
      "fieldGroups": ["url", "email", "address"],
      "reviewStrictness": "standard",
      "dedupeKeyMode": "keys_only"
    }
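Before starting a run, it can help to check the input locally against the documented limits. The sketch below is a hypothetical preflight helper, not part of the Actor: `build_run_input` is a name invented for this example, and rejecting unsupported record keys is a conservative local choice, since this README does not say how the Actor treats extra keys.

```python
# Limits and allowed values copied from the parameter table above.
MAX_RECORDS = 1000
SUPPORTED_FIELDS = {"recordId", "url", "email", "address"}
FIELD_GROUPS = ("url", "email", "address")
STRICTNESS = ("minimal", "standard", "strict")
DEDUPE_MODES = ("off", "keys_only", "keys_and_candidates")


def build_run_input(records, field_groups=FIELD_GROUPS,
                    review_strictness="standard", dedupe_key_mode="keys_only"):
    """Hypothetical preflight helper: return a run-input dict or raise ValueError."""
    problems = []
    if not isinstance(records, list):
        problems.append("records must be a list of objects")
        records = []
    if len(records) > MAX_RECORDS:
        problems.append(f"{len(records)} records exceeds the {MAX_RECORDS}-record limit")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"records[{index}] is not an object")
            continue
        extra = sorted(set(record) - SUPPORTED_FIELDS)
        if extra:
            # Local policy in this sketch; the Actor may treat extra keys differently.
            problems.append(f"records[{index}] has unsupported fields: {extra}")
    bad_groups = [g for g in field_groups if g not in FIELD_GROUPS]
    if bad_groups:
        problems.append(f"unsupported fieldGroups: {bad_groups}")
    if review_strictness not in STRICTNESS:
        problems.append(f"unsupported reviewStrictness: {review_strictness!r}")
    if dedupe_key_mode not in DEDUPE_MODES:
        problems.append(f"unsupported dedupeKeyMode: {dedupe_key_mode!r}")
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "records": records,
        "fieldGroups": list(field_groups),
        "reviewStrictness": review_strictness,
        "dedupeKeyMode": dedupe_key_mode,
    }


run_input = build_run_input(
    [{"recordId": "partial-invalid", "url": "https://valid.example", "email": "not-an-email"}]
)
print(run_input["dedupeKeyMode"])  # keys_only
```

Collecting every problem before raising means a single failed preflight reports all issues in the batch, not just the first one.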
Output Format
This Actor writes two outputs:
- The default dataset contains one cleanup row per supplied record.
- The OUTPUT key-value record contains aggregate counts, selected controls, first-release limits, and explicit non-claim booleans.
Use the dataset when reviewing individual records. Use OUTPUT when
you need run-level counts, selected controls, diagnostic totals, and a
machine-readable reminder of unsupported capabilities.
Each dataset item includes:

    {
      "recordId": "acme-hq",
      "inputIndex": 0,
      "recordStatus": "review_needed",
      "input": {
        "recordId": "acme-hq",
        "url": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
        "email": "Sales+ops@Acme.com",
        "address": " 123 Main St., Suite 200, Austin, Texas 78701-1234 "
      },
      "normalized": {
        "url": {
          "canonicalUrl": "https://example.com/about?a=1&b=2",
          "scheme": "https",
          "host": "example.com",
          "path": "/about"
        },
        "email": {
          "normalizedEmail": "sales+ops@acme.com",
          "canonicalEmail": "sales+ops@acme.com",
          "domain": "acme.com"
        },
        "address": {
          "normalizedAddress": "123 main street ste 200 austin tx 78701",
          "comparisonAddress": "123 main street austin tx 78701",
          "addressType": "street",
          "state": "tx",
          "postalCode": "78701",
          "postalCodeExtension": "1234",
          "secondaryUnitDesignator": "ste",
          "secondaryUnitIdentifier": "200"
        }
      },
      "changedFields": [
        {
          "fieldGroup": "url",
          "sourceField": "input.url",
          "targetField": "normalized.url.canonicalUrl",
          "originalValue": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
          "normalizedValue": "https://example.com/about?a=1&b=2",
          "reasonCodes": [
            "trimmed_whitespace",
            "assumed_https",
            "removed_fragment",
            "removed_tracking_parameters",
            "sorted_query_parameters",
            "removed_trailing_slash"
          ]
        }
      ],
      "reviewFlags": [
        {
          "fieldGroup": "email",
          "sourceField": "derived.email.roleAccount",
          "sourceValue": true,
          "flagCode": "email_role_account",
          "severity": "medium",
          "strictnessThreshold": "standard",
          "message": "Email local part looks like a role or team mailbox."
        }
      ],
      "dedupeKeys": [
        {
          "fieldGroup": "email",
          "keyFamily": "email_domain",
          "keyValue": "acme.com",
          "matchScope": "organization",
          "keyStrength": "context",
          "sourceFields": ["normalized.email.domain"],
          "sourceValues": ["acme.com"],
          "candidate": null
        }
      ],
      "crossFieldSignals": [
        {
          "signalCode": "url_email_domain_mismatch",
          "fieldGroups": ["url", "email"],
          "severity": "medium",
          "sourceFields": ["normalized.url.host", "normalized.email.domain"],
          "sourceValues": ["example.com", "acme.com"],
          "reviewStrictnessThreshold": "standard",
          "statusImpact": "promotes_review_needed",
          "message": "Website host and email domain do not line up under the deterministic domain comparison rule; review before treating them as the same organization context."
        }
      ],
      "warnings": [],
      "invalidReasons": [],
      "processing": {
        "enabledFieldGroups": ["url", "email", "address"],
        "reviewStrictness": "standard",
        "dedupeKeyMode": "keys_only",
        "fieldStates": {
          "url": {"enabled": true, "inputState": "nonblank", "resultState": "usable"},
          "email": {"enabled": true, "inputState": "nonblank", "resultState": "usable"},
          "address": {"enabled": true, "inputState": "nonblank", "resultState": "usable"}
        }
      }
    }
The OUTPUT summary includes:

    {
      "schemaVersion": "contact-cleanup-output-v1",
      "selectedControls": {
        "fieldGroups": ["url", "email", "address"],
        "reviewStrictness": "standard",
        "dedupeKeyMode": "keys_only"
      },
      "inputCount": 2,
      "emittedCount": 2,
      "emittedEqualsInputCount": true,
      "statusCounts": {
        "ready": 0,
        "normalized": 0,
        "review_needed": 2,
        "invalid": 0,
        "no_actionable_input": 0
      },
      "rowOutcomeCounts": {
        "rowsWithUsableOutput": 2,
        "rowsWithChangedFields": 2,
        "rowsWithReviewFlags": 2,
        "rowsWithDedupeKeys": 2,
        "rowsWithDuplicateCandidates": 0,
        "rowsWithCrossFieldSignals": 1,
        "rowsWithWarnings": 0,
        "rowsWithInvalidReasons": 1
      },
      "firstReleaseLimits": {
        "recordsMaxItems": 1000,
        "supportedInputFields": ["recordId", "url", "email", "address"],
        "supportedFieldGroups": ["url", "email", "address"],
        "defaultFieldGroups": ["url", "email", "address"],
        "reviewStrictnessValues": ["minimal", "standard", "strict"],
        "defaultReviewStrictness": "standard",
        "dedupeKeyModeValues": ["off", "keys_only", "keys_and_candidates"],
        "defaultDedupeKeyMode": "keys_only",
        "cleanupProfileExposed": false,
        "defaultOutputCardinality": "one_row_per_input_record"
      },
      "nonClaimSummary": {
        "emailDeliverabilityVerified": false,
        "inboxExistenceVerified": false,
        "inboxOwnershipVerified": false,
        "missingContactsFound": false,
        "sourceScrapingPerformed": false,
        "externalEnrichmentPerformed": false,
        "websiteReachabilityChecked": false,
        "postalDeliverabilityCertified": false,
        "geocodingPerformed": false,
        "demographicEnrichmentPerformed": false,
        "confidenceScoringPerformed": false,
        "crmMergePerformed": false,
        "automaticSurvivorshipSelected": false,
        "liveFreshnessChecked": false
      },
      "runDurationSeconds": 0.0
    }
The real OUTPUT object also includes detailed field-group activity,
changed-field, review-flag, dedupe-key, cross-field-signal, warning,
and invalid-reason summaries.
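A downstream consumer may want to sanity-check the OUTPUT summary before trusting run-level counts. The following is a minimal sketch under stated assumptions: `check_output_summary` is a hypothetical helper, and the rule that the status counts add up to `emittedCount` is inferred from the one-status-per-row design, not from a documented guarantee.

```python
def check_output_summary(summary: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    input_count = summary["inputCount"]
    emitted = summary["emittedCount"]
    if emitted != input_count:
        problems.append(f"emittedCount {emitted} differs from inputCount {input_count}")
    if summary.get("emittedEqualsInputCount") != (emitted == input_count):
        problems.append("emittedEqualsInputCount disagrees with the two counts")
    # Assumption: every emitted row carries exactly one recordStatus value.
    status_total = sum(summary["statusCounts"].values())
    if status_total != emitted:
        problems.append(f"statusCounts add up to {status_total}, expected {emitted}")
    # Every non-claim boolean is expected to be false.
    claimed = [name for name, value in summary.get("nonClaimSummary", {}).items() if value]
    if claimed:
        problems.append(f"unexpected capability claims: {claimed}")
    return problems


example = {
    "inputCount": 2,
    "emittedCount": 2,
    "emittedEqualsInputCount": True,
    "statusCounts": {"ready": 0, "normalized": 0, "review_needed": 2,
                     "invalid": 0, "no_actionable_input": 0},
    "nonClaimSummary": {"crmMergePerformed": False, "geocodingPerformed": False},
}
print(check_output_summary(example))  # []
```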
Row Statuses
| Status | Meaning | How to use it |
|---|---|---|
| ready | Supported supplied values were already usable under the deterministic rules. | Treat as cleanup confirmation, not live verification. |
| normalized | At least one supplied value changed into a usable normalized value without review pressure. | Review changedFields if you need to explain the transformation. |
| review_needed | The row has usable output plus review pressure such as a partial invalid field, role/disposable email observation, duplicate candidate, or cross-field signal. | Review before matching, importing, or trusting the row in another system. |
| invalid | Enabled nonblank supplied values could not be normalized into usable output. | Use invalidReasons as diagnostics; the row is preserved on purpose. |
| no_actionable_input | The row had no enabled nonblank url, email, or address value to process. | Keep or remove it according to your own source-system rules. |
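The status table above maps naturally onto a small routing step on the consumer side. This sketch assumes the dataset rows have already been loaded as Python dicts; the queue labels and the `route_rows` helper are hypothetical names chosen for this example, not values the Actor emits.

```python
from collections import defaultdict

# Hypothetical next-step labels, one per documented recordStatus value.
NEXT_STEP = {
    "ready": "accept_as_cleanup_confirmation",
    "normalized": "accept_and_keep_change_ledger",
    "review_needed": "human_review",
    "invalid": "fix_source_record",
    "no_actionable_input": "apply_source_system_rule",
}


def route_rows(rows):
    """Group record identifiers by the next step suggested by recordStatus."""
    queues = defaultdict(list)
    for row in rows:
        step = NEXT_STEP.get(row.get("recordStatus"), "unknown_status")
        # Fall back to inputIndex when a record was supplied without recordId.
        queues[step].append(row.get("recordId", row.get("inputIndex")))
    return dict(queues)


rows = [
    {"recordId": "acme-hq", "inputIndex": 0, "recordStatus": "review_needed"},
    {"recordId": "partial-invalid", "inputIndex": 1, "recordStatus": "review_needed"},
]
print(route_rows(rows))  # {'human_review': ['acme-hq', 'partial-invalid']}
```

Every row lands in exactly one queue, so the routing step preserves the same one-row-per-input cardinality as the dataset.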
Key Output Fields
| Field | What it tells you | Boundary |
|---|---|---|
| normalized | Canonical URL, canonical email/domain, and U.S.-leaning address display/comparison values when available. | Deterministic cleanup only; no reachability, deliverability, ownership, postal, or geocoding proof. |
| changedFields | Which supported values changed and which reason codes explain the change. | Explains transformations; it is not a correctness score. |
| reviewFlags | Deterministic observations that can make a row worth human review. | Review routing only; not a confidence score or truth verdict. |
| dedupeKeys | URL, email, domain, and address comparison keys, plus optional same-run candidate labels. | Match preparation only; no duplicate removal, ranking, survivor choice, or CRM merge. |
| crossFieldSignals | Deterministic prompts from relationships between supplied fields, such as URL/email domain mismatch. | Review prompts only; not identity, ownership, fraud, or legal proof. |
| warnings | Non-blocking diagnostics such as unknown address shape or disabled field group context. | Keeps uncertainty visible without claiming the row is wrong. |
| invalidReasons | Field-level reasons why supplied nonblank values could not be normalized. | Diagnostics only; invalid output is not a failed run by itself. |
| processing | Controls and per-field input/result states used for the row. | Helps explain routing; not a pricing or support guarantee. |
Deterministic Smoke Behavior
The committed smoke input at .actor/smoke_input.json uses only
deterministic .example records. It covers ready, normalized,
no-actionable, invalid, mixed valid/invalid, duplicate-candidate,
cross-field, review, same-address, and warning-shaped rows without
depending on third-party network state.
Fixture-Backed Output Examples
The repo includes a detailed ./docs/output-examples.md for the committed 12-record smoke fixture. That pack traces examples to the smoke input, aggregate contract test, and saved smoke/cost-matrix run summaries.
The fixture summary is:
| Example surface | Count |
|---|---|
| Input records | 12 |
| Dataset rows emitted | 12 |
| ready rows | 2 |
| normalized rows | 1 |
| review_needed rows | 7 |
| invalid rows | 1 |
| no_actionable_input rows | 1 |
| Rows with duplicate candidates | 4 |
| Rows with cross-field signals | 3 |
| Rows with warnings | 1 |
| Rows with invalid reasons | 2 |
Use the examples as support guidance for reading output. They show how duplicate candidates, domain mismatch signals, invalid diagnostics, no-actionable rows, and address warnings preserve review evidence without removing rows or claiming verification, enrichment, address authority, confidence scoring, automatic dedupe, or CRM merge.
How To Read Common Rows
Ready or normalized rows
Use the normalized values and changed-field ledger as cleanup evidence. Do not treat the row as proof that a website is reachable, an inbox exists, an address is deliverable, or a contact is current.
Invalid rows
Invalid rows are emitted intentionally when supplied nonblank values
cannot be normalized. The row keeps the original input and explains the
problem in invalidReasons so you can fix the source record or route
it for manual review.
Partial rows
A row can contain usable output for one field group and invalid diagnostics for another. Keep reading the full row before discarding it; the useful field groups remain available.
No-actionable rows
Blank, null, missing, or disabled field groups can produce
no_actionable_input. These rows preserve input cardinality. With
pay-per-usage billing, the run can still consume platform resources even
when a row has no candidate processed-contact event.
Duplicate-candidate rows
When dedupeKeyMode is keys_and_candidates, rows can point at
same-run peers with the same canonical URL, canonical email, or address
comparison key. That is a review queue, not an automatic dedupe result.
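The same review queue can be approximated on the consumer side from the emitted dedupeKeys. The sketch below is an illustration only: `group_same_key_rows` is a hypothetical helper, and the key family name `canonical_email` is an assumed placeholder, because this README shows only the `email_domain` family. It groups record identifiers that share an exact key value and leaves every row in place.

```python
from collections import defaultdict


def group_same_key_rows(rows, key_families):
    """Map (keyFamily, keyValue) to record identifiers when two or more rows share the key."""
    groups = defaultdict(list)
    for row in rows:
        for key in row.get("dedupeKeys", []):
            if key.get("keyFamily") in key_families and key.get("keyValue"):
                groups[(key["keyFamily"], key["keyValue"])].append(row.get("recordId"))
    # Keep only keys seen on more than one row: these form the review queue.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


rows = [
    # "canonical_email" is an assumed family name used for illustration.
    {"recordId": "a", "dedupeKeys": [
        {"keyFamily": "canonical_email", "keyValue": "pat@acme.example"}]},
    {"recordId": "b", "dedupeKeys": [
        {"keyFamily": "canonical_email", "keyValue": "pat@acme.example"}]},
    {"recordId": "c", "dedupeKeys": [
        {"keyFamily": "canonical_email", "keyValue": "lee@acme.example"}]},
]
print(group_same_key_rows(rows, {"canonical_email"}))
# {('canonical_email', 'pat@acme.example'): ['a', 'b']}
```

Passing the key families explicitly keeps context-strength keys, such as a shared email domain, out of the queue unless the reviewer asks for them.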
Cross-field signal rows
Signals such as URL/email domain mismatch or same-address context are deterministic prompts from supplied values. They should guide review, not be used as verified identity, ownership, or fraud findings.
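To make the domain-mismatch idea concrete, here is one possible comparison rule as a Python sketch. The Actor's actual deterministic rule is not documented in this README, so the `domains_line_up` helper, the `www.` stripping, and the subdomain allowance are all assumptions made for illustration.

```python
def domains_line_up(url_host: str, email_domain: str) -> bool:
    """Assumed rule: equal after stripping 'www.', or one is a subdomain of the other."""
    host = url_host.lower().rstrip(".")
    domain = email_domain.lower().rstrip(".")
    if host.startswith("www."):
        host = host[4:]
    return (
        host == domain
        or host.endswith("." + domain)
        or domain.endswith("." + host)
    )


print(domains_line_up("example.com", "acme.com"))   # False
print(domains_line_up("www.acme.com", "acme.com"))  # True
```

A False result here corresponds to a prompt for review, in the same spirit as url_email_domain_mismatch; it says nothing about who owns either domain.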
Warning rows
Warnings keep uncertain context visible, such as an address shape that could not be confidently parsed into common components. A warning can coexist with usable output.
Example Use Cases
CRM intake review
Normalize supplied website, email, and address values while preserving invalid input and row-level review pressure for human triage.
Lead-list cleanup before matching
Create stable URL, mailbox, domain, and address comparison keys before joining records against another system.
Dedupe preparation without merge authority
Emit deterministic keys and same-run exact candidate labels while keeping every original row and avoiding automatic survivor selection.
Limitations
- This Actor only processes supplied records. It does not scrape websites, crawl pages, find missing contacts, or fetch external enrichment from providers.
- URL cleanup does not prove website reachability, safety, ownership, live freshness, redirect equivalence, or page content.
- Email cleanup does not verify deliverability, inbox existence, inbox or mailbox ownership, or sender compliance.
- Address cleanup is U.S.-leaning heuristic normalization and comparison-key preparation. It does not certify postal deliverability, geocode addresses, add demographics, or prove that an address belongs to a contact.
- Dedupe keys and same-run candidate labels are preparation signals. They do not cluster records, rank matches, choose survivors, remove duplicates, merge CRM records, or mutate downstream systems.
- recordStatus is deterministic routing evidence, not a confidence score, CRM truth verdict, legal identity claim, pricing signal, or support guarantee.
- High review_needed counts can reflect messy supplied input or strict review settings. They do not mean the run failed or that the Actor verified those rows as bad.
- No-actionable rows preserve cardinality. They are useful for audit trails, but they still make pay-per-usage cost harder to estimate from record count alone.
- Phone, company-name cleanup, CSV/CRM import, arbitrary metadata, fuzzy scoring, live verification, enrichment, and automatic merge controls are outside the first-release contract.
Disclaimer
This Actor performs deterministic cleanup and match-preparation only. Use its output as structured evidence for review, matching, or downstream workflow decisions, not as proof that a contact is current, reachable, deliverable, enriched, owned, merged, or CRM-true.
Permissions
This Actor is designed to run with limited permissions. It writes only to its default dataset and default key-value store. The current runtime does not require access to other Apify storages, account resources, sibling Actors, proxy groups, or third-party network resources.
Pricing
Recommended pricing model: Pay per usage. Under this model, users pay Apify platform resource costs for the run, and there is no custom developer charge from this Actor.
Pay-per-usage is less predictable before a run than a fixed per-record
quote. Start with a limited-scope trial, review the run cost and
OUTPUT counts, then scale only if the review density and platform
usage match your workflow.
This README does not configure Store pricing, claim a fixed per-record price, or promise custom Pay Per Event charging. Any event names that appear in internal design material are not buyer billing terms unless a live pricing surface says so.
Release History
See ./CHANGELOG.md for version-by-version release notes and migration guidance.