Pricing

Pay per usage

CRM Contact Cleanup & Dedupe Prep

Clean supplied URL, email, and address fields for contact records, preserving one row per input with changed-field, review, dedupe-key, and cross-field signals. Does not scrape, find, verify, enrich, geocode, score confidence, choose survivors, or merge contacts.

Developer: Critical Distinction

Last modified: 10 days ago

What You Get

Preserves one dataset row per supplied input record.
Processes only the supported first-release fields: `recordId`, `url`, `email`, and `address`.
Normalizes URLs into canonical http or https values with stable host/path/query behavior, tracking-parameter removal, and fragment cleanup.
Normalizes emails into stable lowercase and canonical mailbox/domain values, including deterministic Gmail alias handling.
Normalizes U.S.-leaning address text into display and comparison forms through the shared address-normalization boundary.
Emits required `recordStatus` values: `ready`, `normalized`, `review_needed`, `invalid`, and `no_actionable_input`.
Emits `changedFields`, `reviewFlags`, `dedupeKeys`, `crossFieldSignals`, `warnings`, `invalidReasons`, and `processing` details so buyers can see why each row landed where it did.
Writes a structured OUTPUT summary with selected controls, row counts, status distribution, diagnostic summaries, first-release limits, and explicit unsupported-capability booleans.

The output is meant to help a buyer decide what to review next. A clean row stays clean, a changed row explains what changed, a messy row keeps the original value next to diagnostics, and a possible same-run match is shown as a candidate label instead of being removed or merged.
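
The URL cleanup described above can be pictured with a short sketch. This is not the Actor's implementation: the function name `canonicalize_url` and the choice to treat only `utm_*` parameters as tracking parameters are assumptions made for illustration. It applies the steps named by the reason codes in the dataset example further down (trim whitespace, assume https, drop the fragment, drop tracking parameters, sort the query, drop the trailing slash).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only utm_* parameters count as tracking parameters in this sketch.
TRACKING_PREFIXES = ("utm_",)


def canonicalize_url(raw: str) -> str:
    """Illustrative URL cleanup; the Actor's real rules may differ."""
    text = raw.strip()
    if "://" not in text:
        text = "https://" + text  # assume https when no scheme is supplied
    parts = urlsplit(text)
    host = parts.netloc.lower()
    path = parts.path.rstrip("/")  # drop trailing slash
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    query = urlencode(sorted(pairs))  # stable parameter order
    # An empty fragment argument removes "#..." from the result.
    return urlunsplit((parts.scheme.lower(), host, path, query, ""))
```

Under these assumptions, the first example record's URL, ` Example.com/about/?utm_source=newsletter&b=2&a=1#team `, becomes `https://example.com/about?a=1&b=2`, which matches the `canonicalUrl` shown in the dataset example.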

Operating Boundaries

This README describes the product behavior of a single supplied-record cleanup run. The Actor writes one dataset row per input record and a structured OUTPUT summary; it does not configure recurrence, send alerts, replace workflows, call sibling Actors, change Store pricing, publish or unpublish itself, or mutate legacy Actor registrations.

Cost and billing are described in the Pricing section. This README does not define recurrence, live monitoring, workflow ownership, support response guarantees, or legacy Actor disposal authority.

Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `records` | array of objects | required | Supplied contact-like records. Each record may include `recordId`, `url`, `email`, and `address`. The first-release limit is 1,000 records per run. |
| `fieldGroups` | array of strings | `["url", "email", "address"]` | Selects which supplied field families to process. Allowed values are `url`, `email`, and `address`. This is not a stage, child Actor, scraping, or live-enrichment control. |
| `reviewStrictness` | string | `standard` | Controls which deterministic review observations promote a row to `review_needed`. Allowed values are `minimal`, `standard`, and `strict`. It does not create confidence scoring, verification, survivor choice, or merge authority. |
| `dedupeKeyMode` | string | `keys_only` | Controls whether match-prep keys and same-run exact candidate labels are emitted. Allowed values are `off`, `keys_only`, and `keys_and_candidates`. It never removes rows, ranks records, chooses survivors, or merges contacts. |

`reviewStrictness` changes routing pressure only. It does not hide changed fields, invalid reasons, warnings, dedupe keys, or cross-field signals. `dedupeKeyMode` changes whether match-prep evidence is shown; it does not change output cardinality or mutate any downstream system.
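
A caller may want to check the controls before starting a run. The sketch below is a hypothetical client-side helper (`build_run_input` is not part of the Actor) that applies the allowed values and the 1,000-record limit from the parameter table. It does not replace whatever validation the Actor itself applies.

```python
ALLOWED_FIELD_GROUPS = ("url", "email", "address")
ALLOWED_STRICTNESS = ("minimal", "standard", "strict")
ALLOWED_DEDUPE_MODES = ("off", "keys_only", "keys_and_candidates")
MAX_RECORDS_PER_RUN = 1000  # first-release limit from the parameter table


def build_run_input(records, field_groups=None,
                    review_strictness="standard", dedupe_key_mode="keys_only"):
    """Return a run-input dict, or raise ValueError listing every problem found."""
    if field_groups is None:
        field_groups = list(ALLOWED_FIELD_GROUPS)  # documented default
    problems = []
    if not isinstance(records, list):
        problems.append("records must be a list of objects")
    elif len(records) > MAX_RECORDS_PER_RUN:
        problems.append(
            f"records has {len(records)} items; limit is {MAX_RECORDS_PER_RUN}")
    unknown = [group for group in field_groups if group not in ALLOWED_FIELD_GROUPS]
    if unknown:
        problems.append(f"unsupported fieldGroups: {unknown}")
    if review_strictness not in ALLOWED_STRICTNESS:
        problems.append(f"unsupported reviewStrictness: {review_strictness!r}")
    if dedupe_key_mode not in ALLOWED_DEDUPE_MODES:
        problems.append(f"unsupported dedupeKeyMode: {dedupe_key_mode!r}")
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "records": records,
        "fieldGroups": list(field_groups),
        "reviewStrictness": review_strictness,
        "dedupeKeyMode": dedupe_key_mode,
    }
```

Collecting every problem before raising lets a caller fix the whole input in one pass instead of one error at a time.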

Example input:

{
    "records": [
        {
            "recordId": "acme-hq",
            "url": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
            "email": "Sales+ops@Acme.com",
            "address": " 123 Main St., Suite 200, Austin, Texas 78701-1234 "
        },
        {
            "recordId": "partial-invalid",
            "url": "https://valid.example",
            "email": "not-an-email"
        }
    ],
    "fieldGroups": ["url", "email", "address"],
    "reviewStrictness": "standard",
    "dedupeKeyMode": "keys_only"
}
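
Because the first-release limit is 1,000 records per run, a longer list has to be split across several runs. The helper below is a minimal sketch of that split; `chunk_records` is a hypothetical name, not something the Actor provides. Since candidate labels are described as same-run, records placed in different batches would presumably not be labeled as candidates for each other.

```python
def chunk_records(records, size=1000):
    """Yield consecutive slices of at most `size` records, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]
```

Each yielded batch can then be used as the `records` value of one run input.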

Output Format

This Actor writes two outputs:

The default dataset contains one cleanup row per supplied record.
The OUTPUT key-value record contains aggregate counts, selected controls, first-release limits, and explicit non-claim booleans.

Use the dataset when reviewing individual records. Use OUTPUT when you need run-level counts, selected controls, diagnostic totals, and a machine-readable reminder of unsupported capabilities.

Each dataset item includes:

{
    "recordId": "acme-hq",
    "inputIndex": 0,
    "recordStatus": "review_needed",
    "input": {
        "recordId": "acme-hq",
        "url": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
        "email": "Sales+ops@Acme.com",
        "address": " 123 Main St., Suite 200, Austin, Texas 78701-1234 "
    },
    "normalized": {
        "url": {
            "canonicalUrl": "https://example.com/about?a=1&b=2",
            "scheme": "https",
            "host": "example.com",
            "path": "/about"
        },
        "email": {
            "normalizedEmail": "sales+ops@acme.com",
            "canonicalEmail": "sales+ops@acme.com",
            "domain": "acme.com"
        },
        "address": {
            "normalizedAddress": "123 main street ste 200 austin tx 78701",
            "comparisonAddress": "123 main street austin tx 78701",
            "addressType": "street",
            "state": "tx",
            "postalCode": "78701",
            "postalCodeExtension": "1234",
            "secondaryUnitDesignator": "ste",
            "secondaryUnitIdentifier": "200"
        }
    },
    "changedFields": [
        {
            "fieldGroup": "url",
            "sourceField": "input.url",
            "targetField": "normalized.url.canonicalUrl",
            "originalValue": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ",
            "normalizedValue": "https://example.com/about?a=1&b=2",
            "reasonCodes": [
                "trimmed_whitespace",
                "assumed_https",
                "removed_fragment",
                "removed_tracking_parameters",
                "sorted_query_parameters",
                "removed_trailing_slash"
            ]
        }
    ],
    "reviewFlags": [
        {
            "fieldGroup": "email",
            "sourceField": "derived.email.roleAccount",
            "sourceValue": true,
            "flagCode": "email_role_account",
            "severity": "medium",
            "strictnessThreshold": "standard",
            "message": "Email local part looks like a role or team mailbox."
        }
    ],
    "dedupeKeys": [
        {
            "fieldGroup": "email",
            "keyFamily": "email_domain",
            "keyValue": "acme.com",
            "matchScope": "organization",
            "keyStrength": "context",
            "sourceFields": ["normalized.email.domain"],
            "sourceValues": ["acme.com"],
            "candidate": null
        }
    ],
    "crossFieldSignals": [
        {
            "signalCode": "url_email_domain_mismatch",
            "fieldGroups": ["url", "email"],
            "severity": "medium",
            "sourceFields": ["normalized.url.host", "normalized.email.domain"],
            "sourceValues": ["example.com", "acme.com"],
            "reviewStrictnessThreshold": "standard",
            "statusImpact": "promotes_review_needed",
            "message": "Website host and email domain do not line up under the deterministic domain comparison rule; review before treating them as the same organization context."
        }
    ],
    "warnings": [],
    "invalidReasons": [],
    "processing": {
        "enabledFieldGroups": ["url", "email", "address"],
        "reviewStrictness": "standard",
        "dedupeKeyMode": "keys_only",
        "fieldStates": {
            "url": {
                "enabled": true,
                "inputState": "nonblank",
                "resultState": "usable"
            },
            "email": {
                "enabled": true,
                "inputState": "nonblank",
                "resultState": "usable"
            },
            "address": {
                "enabled": true,
                "inputState": "nonblank",
                "resultState": "usable"
            }
        }
    }
}

The OUTPUT summary includes:

{
    "schemaVersion": "contact-cleanup-output-v1",
    "selectedControls": {
        "fieldGroups": ["url", "email", "address"],
        "reviewStrictness": "standard",
        "dedupeKeyMode": "keys_only"
    },
    "inputCount": 2,
    "emittedCount": 2,
    "emittedEqualsInputCount": true,
    "statusCounts": {
        "ready": 0,
        "normalized": 0,
        "review_needed": 2,
        "invalid": 0,
        "no_actionable_input": 0
    },
    "rowOutcomeCounts": {
        "rowsWithUsableOutput": 2,
        "rowsWithChangedFields": 2,
        "rowsWithReviewFlags": 2,
        "rowsWithDedupeKeys": 2,
        "rowsWithDuplicateCandidates": 0,
        "rowsWithCrossFieldSignals": 1,
        "rowsWithWarnings": 0,
        "rowsWithInvalidReasons": 1
    },
    "firstReleaseLimits": {
        "recordsMaxItems": 1000,
        "supportedInputFields": ["recordId", "url", "email", "address"],
        "supportedFieldGroups": ["url", "email", "address"],
        "defaultFieldGroups": ["url", "email", "address"],
        "reviewStrictnessValues": ["minimal", "standard", "strict"],
        "defaultReviewStrictness": "standard",
        "dedupeKeyModeValues": ["off", "keys_only", "keys_and_candidates"],
        "defaultDedupeKeyMode": "keys_only",
        "cleanupProfileExposed": false,
        "defaultOutputCardinality": "one_row_per_input_record"
    },
    "nonClaimSummary": {
        "emailDeliverabilityVerified": false,
        "inboxExistenceVerified": false,
        "inboxOwnershipVerified": false,
        "missingContactsFound": false,
        "sourceScrapingPerformed": false,
        "externalEnrichmentPerformed": false,
        "websiteReachabilityChecked": false,
        "postalDeliverabilityCertified": false,
        "geocodingPerformed": false,
        "demographicEnrichmentPerformed": false,
        "confidenceScoringPerformed": false,
        "crmMergePerformed": false,
        "automaticSurvivorshipSelected": false,
        "liveFreshnessChecked": false
    },
    "runDurationSeconds": 0.0
}

The real OUTPUT object also includes detailed field-group activity, changed-field, review-flag, dedupe-key, cross-field-signal, warning, and invalid-reason summaries.
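
The summary can be used for a quick sanity check after a run. The sketch below assumes the summary has already been loaded into a Python dict and that every emitted row has exactly one status, so the status counts add up to `emittedCount` (true for the example above, but an assumption in general). `check_summary` is a hypothetical helper name.

```python
def check_summary(summary):
    """Return a list of problems found in an OUTPUT summary dict (empty if none)."""
    problems = []
    if summary["emittedCount"] != summary["inputCount"]:
        problems.append("emittedCount differs from inputCount")
    # Assumption: each emitted row carries exactly one recordStatus.
    status_total = sum(summary["statusCounts"].values())
    if status_total != summary["emittedCount"]:
        problems.append(
            f"statusCounts total {status_total} != emittedCount {summary['emittedCount']}")
    # Every non-claim boolean is documented as false; flag any that is not.
    claimed = sorted(name for name, value in summary["nonClaimSummary"].items() if value)
    if claimed:
        problems.append(f"unexpected capability claims: {claimed}")
    return problems
```

An empty list means the run-level counts are consistent with the one-row-per-input contract; it says nothing about whether individual rows are correct.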

Row Statuses

| Status | Meaning | How to use it |
| --- | --- | --- |
| `ready` | Supported supplied values were already usable under the deterministic rules. | Treat as cleanup confirmation, not live verification. |
| `normalized` | At least one supplied value changed into a usable normalized value without review pressure. | Review `changedFields` if you need to explain the transformation. |
| `review_needed` | The row has usable output plus review pressure such as a partial invalid field, role/disposable email observation, duplicate candidate, or cross-field signal. | Review before matching, importing, or trusting the row in another system. |
| `invalid` | Enabled nonblank supplied values could not be normalized into usable output. | Use `invalidReasons` as diagnostics; the row is preserved on purpose. |
| `no_actionable_input` | The row had no enabled nonblank `url`, `email`, or `address` value to process. | Keep or remove it according to your own source-system rules. |
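
One way to act on these statuses is to sort dataset rows into work queues. The sketch below assumes the rows have been loaded as Python dicts; the queue names (`use`, `review`, `fix_source`, `skip_or_keep`) are hypothetical labels chosen for this example, not values the Actor emits.

```python
# Hypothetical queue names; the keys are the documented recordStatus values.
NEXT_STEP_BY_STATUS = {
    "ready": "use",
    "normalized": "use",
    "review_needed": "review",
    "invalid": "fix_source",
    "no_actionable_input": "skip_or_keep",
}


def route_rows(rows):
    """Group rows' inputIndex values into queues keyed by next step."""
    queues = {"use": [], "review": [], "fix_source": [], "skip_or_keep": []}
    for row in rows:
        # Unknown statuses go to review rather than being silently dropped.
        step = NEXT_STEP_BY_STATUS.get(row["recordStatus"], "review")
        queues[step].append(row["inputIndex"])
    return queues
```

Every row lands in exactly one queue, so the queue sizes still add up to the number of input records.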

Key Output Fields

| Field | What it tells you | Boundary |
| --- | --- | --- |
| `normalized` | Canonical URL, canonical email/domain, and U.S.-leaning address display/comparison values when available. | Deterministic cleanup only; no reachability, deliverability, ownership, postal, or geocoding proof. |
| `changedFields` | Which supported values changed and which reason codes explain the change. | Explains transformations; it is not a correctness score. |
| `reviewFlags` | Deterministic observations that can make a row worth human review. | Review routing only; not a confidence score or truth verdict. |
| `dedupeKeys` | URL, email, domain, and address comparison keys, plus optional same-run candidate labels. | Match preparation only; no duplicate removal, ranking, survivor choice, or CRM merge. |
| `crossFieldSignals` | Deterministic prompts from relationships between supplied fields, such as URL/email domain mismatch. | Review prompts only; not identity, ownership, fraud, or legal proof. |
| `warnings` | Non-blocking diagnostics such as unknown address shape or disabled field group context. | Keeps uncertainty visible without claiming the row is wrong. |
| `invalidReasons` | Field-level reasons why supplied nonblank values could not be normalized. | Diagnostics only; invalid output is not a failed run by itself. |
| `processing` | Controls and per-field input/result states used for the row. | Helps explain routing; not a pricing or support guarantee. |

Deterministic Smoke Behavior

The committed smoke input at .actor/smoke_input.json uses only deterministic .example records. It covers ready, normalized, no-actionable, invalid, mixed valid/invalid, duplicate-candidate, cross-field, review, same-address, and warning-shaped rows without depending on third-party network state.

Fixture-Backed Output Examples

The repo includes a detailed ./docs/output-examples.md for the committed 12-record smoke fixture. That pack traces examples to the smoke input, aggregate contract test, and saved smoke/cost-matrix run summaries.

The fixture summary is:

| Example surface | Count |
| --- | --- |
| Input records | 12 |
| Dataset rows emitted | 12 |
| `ready` rows | 2 |
| `normalized` rows | 1 |
| `review_needed` rows | 7 |
| `invalid` rows | 1 |
| `no_actionable_input` rows | 1 |
| Rows with duplicate candidates | 4 |
| Rows with cross-field signals | 3 |
| Rows with warnings | 1 |
| Rows with invalid reasons | 2 |

Use the examples as support guidance for reading output. They show how duplicate candidates, domain mismatch signals, invalid diagnostics, no-actionable rows, and address warnings preserve review evidence without removing rows or claiming verification, enrichment, address authority, confidence scoring, automatic dedupe, or CRM merge.

How To Read Common Rows

Ready or normalized rows: Use the normalized values and changed-field ledger as cleanup evidence. Do not treat the row as proof that a website is reachable, an inbox exists, an address is deliverable, or a contact is current.

Invalid rows: These are emitted intentionally when supplied nonblank values cannot be normalized. The row keeps the original input and explains the problem in `invalidReasons` so you can fix the source record or route it for manual review.

Partial rows: A row can contain usable output for one field group and invalid diagnostics for another. Read the full row before discarding it; the useful field groups remain available.

No-actionable rows: Blank, null, missing, or disabled field groups can produce `no_actionable_input`. These rows preserve input cardinality. Because live billing is tied to default dataset rows, no-actionable rows count as processed rows when they are emitted. Remove blank source records before a run if you do not want them included in the row count.

Duplicate-candidate rows: When `dedupeKeyMode` is `keys_and_candidates`, rows can point at same-run peers with the same canonical URL, canonical email, or address comparison key. That is a review queue, not an automatic dedupe result.

Cross-field signal rows: Signals such as URL/email domain mismatch or same-address context are deterministic prompts from supplied values. They should guide review, not be used as verified identity, ownership, or fraud findings.

Warning rows: Warnings keep uncertain context visible, such as an address shape that could not be confidently parsed into common components. A warning can coexist with usable output.
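
For duplicate-candidate review, a caller can also group rows by their emitted keys on the client side. The sketch below is an assumption about how that might be done, not the Actor's candidate logic: it uses only the `keyFamily`, `keyValue`, and `keyStrength` properties shown in the dataset example, and it skips keys whose strength is `context` (such as the shared email domain in that example) so that only stronger keys form groups. `build_review_queue` is a hypothetical name.

```python
from collections import defaultdict


def build_review_queue(rows, skip_strengths=("context",)):
    """Map (keyFamily, keyValue) to the inputIndex values of rows sharing that key.

    Only keys shared by two or more rows are returned. Nothing is removed,
    ranked, or merged; the result is a list of rows to look at together.
    """
    peers = defaultdict(list)
    for row in rows:
        for key in row.get("dedupeKeys", []):
            if key.get("keyStrength") in skip_strengths:
                continue  # context-level keys are too broad for this queue
            peers[(key["keyFamily"], key["keyValue"])].append(row["inputIndex"])
    return {key: indexes for key, indexes in peers.items() if len(indexes) > 1}
```

Passing `skip_strengths=()` keeps context-level keys as well, which can be useful when reviewing records by organization instead of by contact.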

Example Use Cases

CRM intake review: Normalize supplied website, email, and address values while preserving invalid input and row-level review pressure for human triage.

Lead-list cleanup before matching: Create stable URL, mailbox, domain, and address comparison keys before joining records against another system.

Dedupe preparation without merge authority: Emit deterministic keys and same-run exact candidate labels while keeping every original row and avoiding automatic survivor selection.

Limitations

This Actor only processes supplied records. It does not scrape websites, crawl pages, find missing contacts, or fetch external enrichment from providers.
URL cleanup does not prove website reachability, safety, ownership, live freshness, redirect equivalence, or page content.
Email cleanup does not verify deliverability, inbox existence, inbox ownership, or sender compliance.
Address cleanup is U.S.-leaning heuristic normalization and comparison-key preparation. It does not certify postal deliverability, geocode addresses, add demographics, or prove that an address belongs to a contact.
Dedupe keys and same-run candidate labels are preparation signals. They do not cluster records, rank matches, choose survivors, remove duplicates, merge CRM records, or mutate downstream systems.
recordStatus is deterministic routing evidence, not a confidence score, CRM truth verdict, legal identity claim, pricing signal, or support guarantee.
High review_needed counts can reflect messy supplied input or strict review settings. They do not mean the run failed or that the Actor verified those rows as bad.
No-actionable rows preserve cardinality. They are useful for audit trails, and they count as processed rows because the Actor writes them to the default dataset.
Phone, company-name cleanup, CSV/CRM import, arbitrary metadata, fuzzy scoring, live verification, enrichment, and automatic merge controls are outside the first-release contract.

Disclaimer

This Actor performs deterministic cleanup and match-preparation only. Use its output as structured evidence for review, matching, or downstream workflow decisions, not as proof that a contact is current, reachable, deliverable, enriched, owned, merged, or CRM-true.

Permissions

This Actor is designed to run with limited permissions. It writes only to its default dataset and default key-value store. The current runtime does not require access to other Apify storages, account resources, sibling Actors, proxy groups, or third-party network resources.

Pricing

Pricing change scheduled: Pay Per Event at $0.10 per 1,000 processed rows, effective June 3, 2026 at 18:20 UTC. Until that Apify notice window ends, the Store may continue to show the previous free pricing.

Apify will implement the scheduled price with the synthetic apify-default-dataset-item event. Each cleanup row written to the default dataset will be one billable event at $0.0001. The Actor writes one dataset row per supplied input record, so after the scheduled effective time a 1,000-record run will be priced at $0.10, before any account-level taxes, credits, or Apify billing adjustments.

Platform usage is included in the scheduled event price. The structured OUTPUT summary is not billed separately. Invalid, review-needed, partial, and no-actionable rows are still emitted rows and count toward the row total because preserving row cardinality is part of the product contract.
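
The arithmetic above can be checked with a few lines. This sketch assumes the scheduled price of $0.0001 per emitted row is in effect and ignores taxes, credits, and billing adjustments; `estimate_cost_usd` is a hypothetical helper, not an Apify API.

```python
from decimal import Decimal

PRICE_PER_ROW_USD = Decimal("0.0001")  # scheduled price per emitted dataset row


def estimate_cost_usd(input_record_count: int) -> Decimal:
    """Estimate the event charge for a run that emits one row per input record."""
    if input_record_count < 0:
        raise ValueError("input_record_count must not be negative")
    return PRICE_PER_ROW_USD * input_record_count
```

`Decimal` is used instead of floating point so that small per-row prices multiply exactly. Blank or invalid source records count toward `input_record_count` because they are still emitted as rows.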

Release History

See ./CHANGELOG.md for version-by-version release notes and migration guidance.
