CRM Contact Cleanup & Dedupe Prep
Pricing
$0.10 / 1,000 processed rows
Clean supplied URL, email, and address fields for contact records, preserving one row per input with changed-field, review, dedupe-key, and cross-field signals. Does not scrape, find, verify, enrich, geocode, score confidence, choose survivors, or merge contacts.
Developer: Critical Distinction
Maintained by Community
Prepare contact records you already have for CRM review, lead-list quality checks, matching, or downstream cleanup. This Actor accepts small inline record sets or high-volume source routes, keeps one dataset row per accepted contact record, normalizes supplied URL, email, and U.S.-leaning address values, and returns the changed-field evidence, review flags, invalid reasons, warnings, and match keys next to the original row.
Use it when you need a deterministic cleanup pass before human review or downstream matching decisions:
- Normalize supplied website, email, and address fields without live data sourcing.
- Keep invalid, partial, and no-actionable rows visible instead of dropping them from the output.
- Preview CSV, JSONL, JSON-array, Dataset, or key-value-store sources before execution, with row counts and billing estimates but zero default dataset rows.
- Prepare dedupe keys and same-run candidate labels while preserving every input row.
- Route messy records to review with diagnostics, not with a confidence score or final truth verdict.
It does not scrape or find contacts, verify email deliverability, certify postal addresses, geocode, enrich records, remove duplicates, choose survivors, or merge CRM records.
Details below cover what you get, input parameters, input routes and source examples, output format, fixture-backed output examples, integration recipes, how to read common rows, limitations, permissions, pricing, and release history.
What You Get
- Preserves one dataset row per supplied inline record or accepted source record.
- Processes only the supported first-release fields: `recordId`, `url`, `email`, and `address`.
- Keeps inline `records` capped at 1,000 rows while source routes can preflight and execute up to 10,000 accepted source records.
- Supports source input from file upload strings backed by Apify KVS records, Apify Dataset rows, and selected key-value-store records.
- Parses CSV with a header row, JSONL object lines, and JSON arrays of objects for file and key-value-store routes.
- Normalizes URLs into canonical `http` or `https` values with stable host/path/query behavior and tracking-fragment cleanup.
- Normalizes emails into stable lowercase and canonical mailbox/domain values, including deterministic Gmail alias handling.
- Normalizes U.S.-leaning address text into display and comparison forms through the shared `address-normalization` boundary.
- Emits required `recordStatus` values: `ready`, `normalized`, `review_needed`, `invalid`, and `no_actionable_input`.
- Emits `changedFields`, `reviewFlags`, `dedupeKeys`, `crossFieldSignals`, `warnings`, `invalidReasons`, and `processing` details so buyers can see why each row landed where it did.
- Writes a structured `OUTPUT` v2 summary with selected controls, ingestion diagnostics, billing preview, row counts, status distribution, diagnostic summaries, first-release limits, and explicit unsupported-capability booleans.
The output is meant to help a buyer decide what to review next. A clean row stays clean, a changed row explains what changed, a messy row keeps the original value next to diagnostics, and a possible same-run match is shown as a candidate label instead of being removed or merged.
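To make the URL cleanup concrete, here is a minimal Python sketch that reproduces the sample transformation shown in this README (whitespace trim, assumed `https`, fragment removal, tracking-parameter removal, query sorting, trailing-slash removal). It is not the Actor's implementation: the function name `canonicalize_url` and the rule that tracking parameters are those starting with `utm_` are assumptions made for illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize_url(raw: str) -> str:
    """Illustrative cleanup only; the Actor's real rules may differ."""
    text = raw.strip()                      # trimmed_whitespace
    if "://" not in text:
        text = "https://" + text            # assumed_https
    parts = urlsplit(text)
    # Assumption: "tracking parameters" means keys starting with utm_.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    query = urlencode(sorted(kept))         # sorted_query_parameters
    path = parts.path.rstrip("/")           # removed_trailing_slash
    # Passing an empty fragment drops "#team" (removed_fragment).
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


print(canonicalize_url(" Example.com/about/?utm_source=newsletter&b=2&a=1#team "))
# https://example.com/about?a=1&b=2
```

The comments name the reason codes that appear in the sample dataset item, so the sketch can be read side by side with a `changedFields` entry.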
Operating Boundaries
This README describes the product behavior of supplied-record cleanup
runs, whether records arrive inline or through a supported source
route. The Actor writes one dataset row per supplied inline record or
accepted source record and a structured OUTPUT summary; it does not
configure recurrence, send alerts, replace workflows, call sibling
Actors, change Store pricing, publish or unpublish itself, or mutate
legacy Actor registrations.
Cost and billing are described in the Pricing section. This README does not configure recurrence, live monitoring, workflow ownership, support response guarantees, or legacy Actor disposal authority.
Agentic And API Use
Contact Cleanup is suitable for agents and API consumers that need a
bounded cleanup step for records they already control. Public Store
readback on June 4, 2026 returned critd/contact-cleanup in the
allowsAgenticUsers=true filter with
isWhiteListedForAgenticPayments = true and a Pay Per Event row for
apify-default-dataset-item. Treat that as Store discovery and
payment-eligibility readback only; it is not support readiness, a
result-quality guarantee, live high-volume run proof, or a
charged-event counter.
For source-route work, agents should run previewOnly = true first and
read the OUTPUT summary before execution:
- `OUTPUT.ingestion` shows the route, row counts, and mapping diagnostics without raw source values.
- `OUTPUT.billingPreview` estimates default-dataset-row events and spend-cap posture without calling the custom charge API.
- `OUTPUT.sourcePreview` adds route-specific parser and preflight diagnostics for file, Dataset, and key-value-store sources.
Execution should use a buyer-owned spend cap when appropriate. The Actor still performs deterministic cleanup only; it does not verify, enrich, score, scrape, choose survivors, or merge contacts.
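The preview-then-execute flow can be sketched as a small gate over the preview `OUTPUT` summary. This is a hedged example, not part of the Actor: the function name `preview_allows_execution`, the `budget_usd` parameter, and the sample values are hypothetical, and only field names that appear in this README's `OUTPUT` example are read.

```python
def preview_allows_execution(output_summary: dict, budget_usd: float) -> tuple[bool, str]:
    """Decide whether a caller should proceed from preview to execution."""
    ingestion = output_summary.get("ingestion") or {}
    billing = output_summary.get("billingPreview") or {}

    if not ingestion.get("previewOnly"):
        return False, "summary is not from a preview run"
    if billing.get("spendCapStatus") == "would_exceed_cap":
        return False, "estimated charge would exceed the configured spend cap"

    estimated = billing.get("estimatedChargeUsd")
    if estimated is None:
        return False, "no charge estimate in billingPreview"
    if estimated > budget_usd:
        return False, f"estimated ${estimated:.4f} is over the ${budget_usd:.4f} budget"
    return True, f"estimated ${estimated:.4f} for {billing.get('estimatedEmittedRows')} rows"


# Illustrative preview summary (values are made up for this example).
sample_preview = {
    "ingestion": {"previewOnly": True},
    "billingPreview": {
        "estimatedEmittedRows": 3,
        "estimatedChargeUsd": 0.0003,
        "spendCapStatus": "not_configured",
    },
}

print(preview_allows_execution(sample_preview, budget_usd=0.01))
```

An agent would fetch the real `OUTPUT` record from the run's default key-value store and pass it in place of `sample_preview`.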
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `records` | array of objects | required when `source` is absent | Inline supplied contact-like records. Use this route for smaller jobs. Each record may include `recordId`, `url`, `email`, and `address`. The inline first-release limit remains 1,000 records per run. |
| `source` | object | required when `records` is absent | High-volume source route. Choose exactly one route: `file`, `dataset`, or `keyValueStore`. Source routes reject mixed inline records, route-field mismatches, unsupported source fields, and arbitrary CRM passthrough columns. |
| `previewOnly` | boolean | `false` | Source-route control. When `true`, the Actor reads and diagnoses the selected source, writes `OUTPUT` diagnostics, and writes zero default dataset rows. |
| `sourceMaxRows` | integer | `10000` | Maximum accepted source records for source routes. Sources over the cap fail before output; silent truncation is not supported. |
| `rowLimitBehavior` | string | `fail_over_limit` | First-release source row-limit behavior. Over-limit sources fail before default dataset rows or `OUTPUT`. |
| `fieldGroups` | array of strings | `["url", "email", "address"]` | Selects which supplied field families to process. Allowed values are `url`, `email`, and `address`. This is not a stage, child Actor, scraping, or live-enrichment control. |
| `reviewStrictness` | string | `standard` | Controls which deterministic review observations promote a row to `review_needed`. Allowed values are `minimal`, `standard`, and `strict`. It does not create confidence scoring, verification, survivor choice, or merge authority. |
| `dedupeKeyMode` | string | `keys_only` | Controls whether match-prep keys and same-run exact candidate labels are emitted. Allowed values are `off`, `keys_only`, and `keys_and_candidates`. It never removes rows, ranks records, chooses survivors, or merges contacts. |
reviewStrictness changes routing pressure only. It does not hide
changed fields, invalid reasons, warnings, dedupe keys, or cross-field
signals. dedupeKeyMode changes whether match-prep evidence is shown;
it does not change output cardinality or mutate any downstream system.
Example input:
{"records": [{"recordId": "acme-hq","url": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ","email": "Sales+ops@Acme.com","address": " 123 Main St., Suite 200, Austin, Texas 78701-1234 "},{"recordId": "partial-invalid","url": "https://valid.example","email": "not-an-email"}],"fieldGroups": ["url", "email", "address"],"reviewStrictness": "standard","dedupeKeyMode": "keys_only"}
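Before submitting an inline run, a caller may want to check the input against the limits stated above. The following is a minimal client-side sketch; `check_inline_input` is a hypothetical helper, not an Actor API, and how the Actor itself treats extra fields on inline records is not stated in this README, so the sketch only reports them.

```python
SUPPORTED_FIELDS = {"recordId", "url", "email", "address"}
INLINE_MAX_RECORDS = 1000  # inline first-release limit stated in this README


def check_inline_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks plausible."""
    problems = []
    # Exactly one of "records" or "source" should be present.
    if ("records" in run_input) == ("source" in run_input):
        problems.append("choose exactly one route: records or source")
    records = run_input.get("records", [])
    if len(records) > INLINE_MAX_RECORDS:
        problems.append(f"{len(records)} inline records exceeds {INLINE_MAX_RECORDS}")
    for index, record in enumerate(records):
        extra = sorted(set(record) - SUPPORTED_FIELDS)
        if extra:
            # Reported only; the Actor's handling of these is an open question here.
            problems.append(f"record {index} has unsupported fields: {extra}")
    return problems


example_input = {
    "records": [
        {"recordId": "acme-hq", "email": "Sales+ops@Acme.com"},
        {"recordId": "partial-invalid", "url": "https://valid.example", "email": "not-an-email"},
    ],
    "fieldGroups": ["url", "email", "address"],
}
print(check_inline_input(example_input))
```

The check is deliberately shallow: it validates shape and limits, and leaves value-level diagnostics to the Actor's own `invalidReasons`.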
Input Routes And Source Examples
Choose exactly one input route per run:
| Route | Use it for | Output behavior |
|---|---|---|
| Inline `records` | Up to 1,000 supplied records pasted into the input. | One default dataset row per input record. |
| `source.type = "file"` | A file-upload value that resolves to an Apify KVS record URL, `apify://key-value-stores/<store>/records/<key>`, or `kvs://<store>/<key>`. | Preview or execute CSV, JSONL, or JSON-array source bodies after parser and row-policy preflight. |
| `source.type = "dataset"` | A selected Apify Dataset ID or unique name that contains object rows. | Reads selected mapped fields with paging and offset support, then previews or executes accepted records. Dataset metadata counts are advisory. |
| `source.type = "keyValueStore"` | A selected KVS record containing CSV, JSONL, or a JSON array. | Reads one selected text record with byte limits and format guardrails, then previews or executes accepted records. |
Source rows only map to recordId, url, email, and address.
Ignored fields are counted but not preserved in dataset rows or
OUTPUT. Physical blank source rows are skipped before accepted-record
counting. Accepted records with no enabled nonblank supported values
still emit no_actionable_input rows during execution.
File Preview
Use preview first when testing a buyer file mapping. This writes a v2
OUTPUT summary with ingestion, billingPreview, and
sourcePreview, but zero default dataset rows.
{"source": {"type": "file","file": "kvs://source-store/contacts.csv","format": "csv","fieldMap": {"recordId": "id","url": "website","email": "email","address": "mailing_address"}},"previewOnly": true,"sourceMaxRows": 10,"rowLimitBehavior": "fail_over_limit","fieldGroups": ["url", "email", "address"],"reviewStrictness": "standard","dedupeKeyMode": "keys_and_candidates"}
Fixture-backed behavior for the committed CSV source fixture:
| Preview metric | Value |
|---|---|
| Physical source rows seen | 4 |
| Physical blank rows skipped | 1 |
| Accepted records | 3 |
| Accepted actionable records | 2 |
| No-actionable mapped records | 1 |
| Default dataset rows written | 0 |
File Execution
Run the same file route with previewOnly omitted or false after the
preview looks right. Execution runs the same preflight and spend-cap
checks, then emits one default dataset row per accepted source record
and writes OUTPUT after dataset rows.
{"source": {"type": "file","file": "kvs://source-store/contacts.csv","format": "csv","fieldMap": {"recordId": "id","url": "website","email": "email","address": "mailing_address"}},"sourceMaxRows": 10,"dedupeKeyMode": "keys_and_candidates"}
For the committed CSV fixture, execution emits rows for alpha-1,
empty-contact, and beta-2; the physical blank row is skipped, and
empty-contact is preserved as a no_actionable_input row.
Dataset Source
Use a Dataset route when another Actor or task has already produced object rows in Apify storage. The Actor reads only the mapped fields it needs and records the requested offset plus advisory metadata counts in source diagnostics.
{"source": {"type": "dataset","datasetId": "source-dataset","offset": 0,"fieldMap": {"recordId": "contact_id","url": "website","email": "email_address","address": "street"}},"previewOnly": true,"sourceMaxRows": 300}
Key-Value-Store Source
Use a key-value-store route when a workflow stores a CSV, JSONL, or JSON array under a known KVS key.
{"source": {"type": "keyValueStore","storeId": "source-store","recordKey": "contacts.jsonl","format": "jsonl"},"previewOnly": true,"sourceMaxRows": 10000}
format may be csv, jsonl, json_array, or auto for file and
KVS routes. Dataset routes already provide object rows and should leave
format empty.
Source Preflight And Spend Caps
Malformed source structure, row-cap failures, unsupported formats,
over-large KVS bodies, and execution spend-cap failures stop before
default dataset rows, OUTPUT, or custom charge calls. Preview remains
available even when the estimated charge would exceed
ACTOR_MAX_TOTAL_CHARGE_USD; in that case billingPreview reports
spendCapStatus = "would_exceed_cap" and
spendCapEnforced = false because preview writes zero default dataset
rows.
Execution enforces the spend cap before output. If three accepted source
records estimate three emitted default dataset rows at $0.0001 each
and ACTOR_MAX_TOTAL_CHARGE_USD = 0.0002, execution fails before
dataset rows or OUTPUT.
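The arithmetic behind that example can be checked with a short Python sketch. It assumes only the $0.0001 per-row price stated in this README; the helper names `estimate_charge` and `would_exceed_cap` are hypothetical, and `Decimal` is used to avoid floating-point rounding on small dollar amounts.

```python
from decimal import Decimal

EVENT_PRICE_USD = Decimal("0.0001")  # per emitted default dataset row, per this README


def estimate_charge(accepted_records: int) -> Decimal:
    """One emitted row per accepted record, priced per row."""
    return EVENT_PRICE_USD * accepted_records


def would_exceed_cap(accepted_records: int, max_total_charge_usd: str) -> bool:
    """Compare the estimate with a cap given as a decimal string."""
    return estimate_charge(accepted_records) > Decimal(max_total_charge_usd)


print(estimate_charge(3))              # 0.0003
print(would_exceed_cap(3, "0.0002"))   # True
print(estimate_charge(1000))           # 0.1000
```

Three rows estimate $0.0003, which is above the $0.0002 cap, so execution in that scenario fails before any dataset row is written.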
Output Format
This Actor writes two outputs:
- The default dataset contains one cleanup row per supplied inline record or accepted source record.
- The `OUTPUT` key-value record contains aggregate counts, selected controls, source ingestion diagnostics, billing preview, first-release limits, and explicit non-claim booleans.
Use the dataset when reviewing individual records. Use OUTPUT when
you need run-level counts, selected controls, diagnostic totals, and a
machine-readable reminder of unsupported capabilities.
Each dataset item includes:
{"recordId": "acme-hq","inputIndex": 0,"recordStatus": "review_needed","input": {"recordId": "acme-hq","url": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ","email": "Sales+ops@Acme.com","address": " 123 Main St., Suite 200, Austin, Texas 78701-1234 "},"normalized": {"url": {"canonicalUrl": "https://example.com/about?a=1&b=2","scheme": "https","host": "example.com","path": "/about"},"email": {"normalizedEmail": "sales+ops@acme.com","canonicalEmail": "sales+ops@acme.com","domain": "acme.com"},"address": {"normalizedAddress": "123 main street ste 200 austin tx 78701","comparisonAddress": "123 main street austin tx 78701","addressType": "street","state": "tx","postalCode": "78701","postalCodeExtension": "1234","secondaryUnitDesignator": "ste","secondaryUnitIdentifier": "200"}},"changedFields": [{"fieldGroup": "url","sourceField": "input.url","targetField": "normalized.url.canonicalUrl","originalValue": " Example.com/about/?utm_source=newsletter&b=2&a=1#team ","normalizedValue": "https://example.com/about?a=1&b=2","reasonCodes": ["trimmed_whitespace","assumed_https","removed_fragment","removed_tracking_parameters","sorted_query_parameters","removed_trailing_slash"]}],"reviewFlags": [{"fieldGroup": "email","sourceField": "derived.email.roleAccount","sourceValue": true,"flagCode": "email_role_account","severity": "medium","strictnessThreshold": "standard","message": "Email local part looks like a role or team mailbox."}],"dedupeKeys": [{"fieldGroup": "email","keyFamily": "email_domain","keyValue": "acme.com","matchScope": "organization","keyStrength": "context","sourceFields": ["normalized.email.domain"],"sourceValues": ["acme.com"],"candidate": null}],"crossFieldSignals": [{"signalCode": "url_email_domain_mismatch","fieldGroups": ["url", "email"],"severity": "medium","sourceFields": ["normalized.url.host", "normalized.email.domain"],"sourceValues": ["example.com", "acme.com"],"reviewStrictnessThreshold": "standard","statusImpact": 
"promotes_review_needed","message": "Website host and email domain do not line up under the deterministic domain comparison rule; review before treating them as the same organization context."}],"warnings": [],"invalidReasons": [],"processing": {"enabledFieldGroups": ["url", "email", "address"],"reviewStrictness": "standard","dedupeKeyMode": "keys_only","fieldStates": {"url": {"enabled": true,"inputState": "nonblank","resultState": "usable"},"email": {"enabled": true,"inputState": "nonblank","resultState": "usable"},"address": {"enabled": true,"inputState": "nonblank","resultState": "usable"}}}}
Source-route execution rows also include bounded sourceMetadata. The
metadata is row identity for audit and debugging, not CRM-column
passthrough:
{"sourceMetadata": {"sourceType": "keyValueStore","sourceFormat": "csv","acceptedRecordIndex": 0,"sourceRowIndex": 2,"mappedFields": ["recordId", "url", "email", "address"],"ignoredFieldCount": 2}}
The OUTPUT summary includes:
{"schemaVersion": "contact-cleanup-output-v2","selectedControls": {"fieldGroups": ["url", "email", "address"],"reviewStrictness": "standard","dedupeKeyMode": "keys_only"},"ingestion": {"inputRoute": "inline","previewOnly": false,"sourceType": null,"sourceFormat": null,"sourceMaxRows": null,"rowLimitBehavior": null,"defaultDatasetRowsWritten": null,"rowCounts": null,"mappingDiagnostics": null},"billingPreview": {"eventName": "apify-default-dataset-item","eventUnit": "default_dataset_item","eventPriceUsd": 0.0001,"syntheticDefaultDatasetItemEvent": true,"customChargeApiCalled": false,"previewOnly": false,"acceptedRecordCount": 2,"estimatedEmittedRows": 2,"estimatedEventCount": 2,"estimatedChargeUsd": 0.0002,"spendCapStatus": "not_configured","spendCapEnforced": false,"pricingEffectiveAtUtc": "2026-06-03T18:20:34.542Z","pricingProofState": "pre_effective_or_current_unproved","currentPpeProved": false,"currentPpeProofRequired": true},"inputCount": 2,"emittedCount": 2,"emittedEqualsInputCount": true,"statusCounts": {"ready": 0,"normalized": 0,"review_needed": 2,"invalid": 0,"no_actionable_input": 0},"rowOutcomeCounts": {"rowsWithUsableOutput": 2,"rowsWithChangedFields": 2,"rowsWithReviewFlags": 2,"rowsWithDedupeKeys": 2,"rowsWithDuplicateCandidates": 0,"rowsWithCrossFieldSignals": 1,"rowsWithWarnings": 0,"rowsWithInvalidReasons": 1},"firstReleaseLimits": {"recordsMaxItems": 1000,"supportedInputFields": ["recordId", "url", "email", "address"],"supportedFieldGroups": ["url", "email", "address"],"defaultFieldGroups": ["url", "email", "address"],"reviewStrictnessValues": ["minimal", "standard", "strict"],"defaultReviewStrictness": "standard","dedupeKeyModeValues": ["off", "keys_only", "keys_and_candidates"],"defaultDedupeKeyMode": "keys_only","cleanupProfileExposed": false,"defaultOutputCardinality": "one_row_per_input_record"},"nonClaimSummary": {"emailDeliverabilityVerified": false,"inboxExistenceVerified": false,"inboxOwnershipVerified": false,"missingContactsFound": 
false,"sourceScrapingPerformed": false,"externalEnrichmentPerformed": false,"websiteReachabilityChecked": false,"postalDeliverabilityCertified": false,"geocodingPerformed": false,"demographicEnrichmentPerformed": false,"confidenceScoringPerformed": false,"crmMergePerformed": false,"automaticSurvivorshipSelected": false,"liveFreshnessChecked": false},"runDurationSeconds": 0.0}
The real OUTPUT object also includes detailed field-group activity,
changed-field, review-flag, dedupe-key, cross-field-signal, warning,
and invalid-reason summaries. Source runs also include sourcePreview
with route-specific preview or execution diagnostics; top-level
ingestion and billingPreview are the stable machine-readable fields
to consume first.
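A downstream job can sanity-check the run summary before trusting the counts. The sketch below is an assumption-labelled example: `check_output_summary` is a hypothetical helper, and the sample dictionary is trimmed from the `OUTPUT` example above rather than taken from a live run.

```python
def check_output_summary(summary: dict) -> list[str]:
    """Return findings; an empty list means the summary is internally consistent."""
    findings = []

    status_total = sum(summary["statusCounts"].values())
    if status_total != summary["emittedCount"]:
        findings.append(f"statusCounts sum {status_total} != emittedCount {summary['emittedCount']}")

    counts_match = summary["emittedCount"] == summary["inputCount"]
    if summary["emittedEqualsInputCount"] != counts_match:
        findings.append("emittedEqualsInputCount disagrees with the counts")

    # Every non-claim boolean is expected to be False.
    claimed = sorted(name for name, value in summary["nonClaimSummary"].items() if value)
    if claimed:
        findings.append(f"unexpected capability claims: {claimed}")
    return findings


sample_summary = {
    "inputCount": 2,
    "emittedCount": 2,
    "emittedEqualsInputCount": True,
    "statusCounts": {"ready": 0, "normalized": 0, "review_needed": 2, "invalid": 0, "no_actionable_input": 0},
    "nonClaimSummary": {"emailDeliverabilityVerified": False, "crmMergePerformed": False},
}
print(check_output_summary(sample_summary))  # []
```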
Row Statuses
| Status | Meaning | How to use it |
|---|---|---|
| `ready` | Supported supplied values were already usable under the deterministic rules. | Treat as cleanup confirmation, not live verification. |
| `normalized` | At least one supplied value changed into a usable normalized value without review pressure. | Review `changedFields` if you need to explain the transformation. |
| `review_needed` | The row has usable output plus review pressure such as a partial invalid field, role/disposable email observation, duplicate candidate, or cross-field signal. | Review before matching, importing, or trusting the row in another system. |
| `invalid` | Enabled nonblank supplied values could not be normalized into usable output. | Use `invalidReasons` as diagnostics; the row is preserved on purpose. |
| `no_actionable_input` | The row had no enabled nonblank `url`, `email`, or `address` value to process. | Keep or remove it according to your own source-system rules. |
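The status table maps naturally onto a routing step in a downstream script. The following sketch buckets dataset rows by `recordStatus`; the function name `bucket_rows` and the sample rows are hypothetical, and the choice to send `review_needed` and `invalid` rows to a person is one possible policy, not something the Actor prescribes.

```python
from collections import defaultdict

# Assumption: these two statuses go to a human queue in this example policy.
HUMAN_REVIEW_STATUSES = {"review_needed", "invalid"}


def bucket_rows(rows: list[dict]) -> dict[str, list[str]]:
    """Group record identifiers by recordStatus without dropping any row."""
    buckets = defaultdict(list)
    for row in rows:
        identifier = row.get("recordId") or f"index-{row['inputIndex']}"
        buckets[row["recordStatus"]].append(identifier)
    return dict(buckets)


sample_rows = [
    {"recordId": "acme-hq", "inputIndex": 0, "recordStatus": "review_needed"},
    {"recordId": "clean-1", "inputIndex": 1, "recordStatus": "ready"},
    {"recordId": None, "inputIndex": 2, "recordStatus": "no_actionable_input"},
]

buckets = bucket_rows(sample_rows)
for_review = [rid for status in HUMAN_REVIEW_STATUSES for rid in buckets.get(status, [])]
print(buckets)
print(for_review)  # ['acme-hq']
```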
Key Output Fields
| Field | What it tells you | Boundary |
|---|---|---|
| `normalized` | Canonical URL, canonical email/domain, and U.S.-leaning address display/comparison values when available. | Deterministic cleanup only; no reachability, deliverability, ownership, postal, or geocoding proof. |
| `changedFields` | Which supported values changed and which reason codes explain the change. | Explains transformations; it is not a correctness score. |
| `reviewFlags` | Deterministic observations that can make a row worth human review. | Review routing only; not a confidence score or truth verdict. |
| `dedupeKeys` | URL, email, domain, and address comparison keys, plus optional same-run candidate labels. | Match preparation only; no duplicate removal, ranking, survivor choice, or CRM merge. |
| `crossFieldSignals` | Deterministic prompts from relationships between supplied fields, such as URL/email domain mismatch. | Review prompts only; not identity, ownership, fraud, or legal proof. |
| `warnings` | Non-blocking diagnostics such as unknown address shape or disabled field group context. | Keeps uncertainty visible without claiming the row is wrong. |
| `invalidReasons` | Field-level reasons why supplied nonblank values could not be normalized. | Diagnostics only; invalid output is not a failed run by itself. |
| `processing` | Controls and per-field input/result states used for the row. | Helps explain routing; not a pricing or support guarantee. |
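To see which transformations dominate a run, the `changedFields` entries can be tallied by reason code. This is a small illustrative sketch: `tally_reason_codes` is a hypothetical helper, and the sample row reuses the `changedFields` entry from the dataset item example above.

```python
from collections import Counter


def tally_reason_codes(rows: list[dict]) -> Counter:
    """Count (fieldGroup, reasonCode) pairs across all rows."""
    tally = Counter()
    for row in rows:
        for change in row.get("changedFields", []):
            for code in change.get("reasonCodes", []):
                tally[(change["fieldGroup"], code)] += 1
    return tally


sample_row = {
    "recordId": "acme-hq",
    "changedFields": [
        {
            "fieldGroup": "url",
            "reasonCodes": [
                "trimmed_whitespace",
                "assumed_https",
                "removed_fragment",
                "removed_tracking_parameters",
                "sorted_query_parameters",
                "removed_trailing_slash",
            ],
        }
    ],
}

tally = tally_reason_codes([sample_row, sample_row])
print(tally[("url", "assumed_https")])  # 2
```

A tally like this explains transformations at run level; as the table notes, it is not a correctness score.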
Deterministic Smoke Behavior
The committed smoke input at .actor/smoke_input.json uses only
deterministic .example records. It covers ready, normalized,
no-actionable, invalid, mixed valid/invalid, duplicate-candidate,
cross-field, review, same-address, and warning-shaped rows without
depending on third-party network state.
High-volume source-route behavior is covered by committed local
fixtures under tests/fixtures/ and by the high_volume_ingestion
integration tests. Those fixtures use synthetic source values and
exercise file, Dataset, KVS, parser, preflight, spend-cap, source
metadata, and keys_and_candidates behavior without live storage
proof.
Fixture-Backed Output Examples
The repo includes a detailed ./docs/output-examples.md for the committed 12-record smoke fixture. That pack traces examples to the smoke input, aggregate contract test, and saved smoke/cost-matrix run summaries.
The fixture summary is:
| Example surface | Count |
|---|---|
| Input records | 12 |
| Dataset rows emitted | 12 |
| `ready` rows | 2 |
| `normalized` rows | 1 |
| `review_needed` rows | 7 |
| `invalid` rows | 1 |
| `no_actionable_input` rows | 1 |
| Rows with duplicate candidates | 4 |
| Rows with cross-field signals | 3 |
| Rows with warnings | 1 |
| Rows with invalid reasons | 2 |
The source-route fixture summary is:
| Example surface | Count |
|---|---|
| Physical CSV source rows seen | 4 |
| Physical blank source rows skipped | 1 |
| Accepted source records | 3 |
| Accepted actionable source records | 2 |
| Accepted no-actionable source records | 1 |
| Ignored source fields counted | 9 |
| Preview default dataset rows | 0 |
| Execution default dataset rows | 3 |
| Custom charge API calls | 0 |
Use the examples as support guidance for reading output. They show how duplicate candidates, domain mismatch signals, invalid diagnostics, no-actionable rows, and address warnings preserve review evidence without removing rows or claiming verification, enrichment, address authority, confidence scoring, automatic dedupe, or CRM merge.
Integration Recipes
Preview a buyer CSV before billing
Upload or save a CSV in an Apify key-value store, map source columns to
recordId, url, email, and address, and run with
previewOnly = true. Review OUTPUT.ingestion.rowCounts,
OUTPUT.sourcePreview.mappingDiagnostics, and
OUTPUT.billingPreview.estimatedChargeUsd. No default dataset rows are
written in preview.
Execute the same CSV after preview
Run the same input with previewOnly omitted or false. The Actor
reuses parser, row-cap, and spend-cap preflight, then writes one default
dataset row per accepted source record. OUTPUT is written after
dataset rows, so downstream jobs can read individual rows first and the
run summary second.
Chain from an upstream Apify Dataset
Use source.type = "dataset" with a Dataset ID or unique name and a
field map. This route reads selected mapped fields, pages rows with an
offset, and treats Dataset metadata counts as advisory. It does not
claim the upstream Actor produced verified contact data.
Pass a saved JSONL or JSON-array record through KVS
Use source.type = "keyValueStore" with storeId, recordKey, and
format. This is useful when another system has already written a
bounded text record to Apify storage. The Actor reads one selected
record and does not crawl arbitrary URLs, authenticate to CRMs, or
preserve unsupported columns.
Use keys_and_candidates for same-run review
Set dedupeKeyMode = "keys_and_candidates" when source rows should
carry exact same-run candidate labels. The high-volume route preserves
peer context through its source spool. Candidate labels remain review
signals; they are not duplicate removal, ranking, survivor choice, or
CRM merge.
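A reviewer can also build a simple queue client-side from the emitted `dedupeKeys`. The sketch below groups rows that share a key value; it does not read the Actor's candidate labels, whose shape is not shown in this README, and the helper name `group_by_dedupe_key` and the sample rows are hypothetical. Only the `email_domain` key family from the dataset item example is used.

```python
from collections import defaultdict


def group_by_dedupe_key(rows: list[dict], key_families: set[str]) -> dict[tuple[str, str], list[str]]:
    """Return key -> record IDs for keys shared by more than one row."""
    groups = defaultdict(list)
    for row in rows:
        for key in row.get("dedupeKeys", []):
            if key["keyFamily"] in key_families:
                groups[(key["keyFamily"], key["keyValue"])].append(row["recordId"])
    # Keep only shared keys; every input row still exists in `rows`.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def domain_key(domain: str) -> dict:
    return {"keyFamily": "email_domain", "keyValue": domain}


sample_rows = [
    {"recordId": "acme-hq", "dedupeKeys": [domain_key("acme.com")]},
    {"recordId": "acme-sales", "dedupeKeys": [domain_key("acme.com")]},
    {"recordId": "other", "dedupeKeys": [domain_key("other.example")]},
]

print(group_by_dedupe_key(sample_rows, {"email_domain"}))
# {('email_domain', 'acme.com'): ['acme-hq', 'acme-sales']}
```

A shared `email_domain` key indicates shared organization context rather than a duplicate, so groups like this are a starting point for review, not a merge list.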
How To Read Common Rows
Ready or normalized rows
Use the normalized values and changed-field ledger as cleanup evidence. Do not treat the row as proof that a website is reachable, an inbox exists, an address is deliverable, or a contact is current.
Invalid rows
Invalid rows are emitted intentionally when supplied nonblank values
cannot be normalized. The row keeps the original input and explains the
problem in invalidReasons so you can fix the source record or route
it for manual review.
Partial rows
A row can contain usable output for one field group and invalid diagnostics for another. Keep reading the full row before discarding it; the useful field groups remain available.
No-actionable rows
Blank, null, missing, or disabled field groups can produce
no_actionable_input. These rows preserve input cardinality. Because
live billing is tied to default dataset rows, no-actionable rows count
as processed rows when they are emitted. Remove blank source records
before a run if you do not want them included in the row count.
Physical blank source rows are different. A fully blank CSV row or blank
JSONL line is skipped before accepted-record counting and is not emitted
as a dataset row. An accepted source row such as a row with only
recordId but no enabled nonblank url, email, or address is a
no_actionable_input emitted row.
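If no-actionable rows should not be counted, records can be filtered before the run. This is a minimal sketch under stated assumptions: `has_actionable_value` is a hypothetical helper, and treating whitespace-only strings as blank is an assumption about what "nonblank" means here.

```python
FIELD_GROUPS = ("url", "email", "address")


def has_actionable_value(record: dict, field_groups=FIELD_GROUPS) -> bool:
    """True when at least one enabled field holds a non-empty string."""
    for name in field_groups:
        value = record.get(name)
        # Assumption: whitespace-only values count as blank.
        if isinstance(value, str) and value.strip():
            return True
    return False


records = [
    {"recordId": "alpha-1", "email": "a@alpha.example"},
    {"recordId": "empty-contact"},
    {"recordId": "beta-2", "url": "beta.example", "address": "  "},
]

to_submit = [r for r in records if has_actionable_value(r)]
print([r["recordId"] for r in to_submit])  # ['alpha-1', 'beta-2']

# With only the address group enabled, beta-2 has nothing to process either.
print([r["recordId"] for r in records if has_actionable_value(r, ("address",))])  # []
```

Filtering changes input cardinality, so keep the original list if the audit trail needs to account for every source record.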
Duplicate-candidate rows
When dedupeKeyMode is keys_and_candidates, rows can point at
same-run peers with the same canonical URL, canonical email, or address
comparison key. That is a review queue, not an automatic dedupe result.
Cross-field signal rows
Signals such as URL/email domain mismatch or same-address context are deterministic prompts from supplied values. They should guide review, not be used as verified identity, ownership, or fraud findings.
Warning rows
Warnings keep uncertain context visible, such as an address shape that could not be confidently parsed into common components. A warning can coexist with usable output.
Example Use Cases
CRM intake review
Normalize supplied website, email, and address values while preserving invalid input and row-level review pressure for human triage.
Lead-list cleanup before matching
Create stable URL, mailbox, domain, and address comparison keys before joining records against another system.
Dedupe preparation without merge authority
Emit deterministic keys and same-run exact candidate labels while keeping every original row and avoiding automatic survivor selection.
High-volume Apify storage handoff
Preview a CSV, JSONL, JSON-array, Dataset, or KVS source route, inspect row counts and estimated row charges, then execute only after the mapping and spend-cap posture are acceptable.
Limitations
- This Actor only processes supplied records. It does not scrape websites, crawl pages, find missing contacts, or fetch external enrichment from providers.
- URL cleanup does not prove website reachability, safety, ownership, live freshness, redirect equivalence, or page content.
- Email cleanup does not verify deliverability, inbox existence, inbox ownership, mailbox ownership, or sender compliance.
- Address cleanup is U.S.-leaning heuristic normalization and comparison-key preparation. It does not certify postal deliverability, geocode addresses, add demographics, or prove that an address belongs to a contact.
- Dedupe keys and same-run candidate labels are preparation signals. They do not cluster records, rank matches, choose survivors, remove duplicates, merge CRM records, or mutate downstream systems.
- `recordStatus` is deterministic routing evidence, not a confidence score, CRM truth verdict, legal identity claim, pricing signal, or support guarantee.
- High `review_needed` counts can reflect messy supplied input or strict review settings. They do not mean the run failed or that the Actor verified those rows as bad.
- Agentic-payment eligibility helps agents discover and pay for the Actor; it does not prove support readiness, response time, output correctness, live high-volume storage behavior, or charged-event counters.
- No-actionable rows preserve cardinality. They are useful for audit trails, and they count as processed rows because the Actor writes them to the default dataset.
- Source routes parse CSV, JSONL, JSON-array, Dataset, and KVS records only into supported Contact Cleanup fields. They are not spreadsheet editing, CRM import, CRM connector, request queue, OAuth, external URL crawler, or child-Actor orchestration features.
- Phone, company-name cleanup, arbitrary metadata passthrough, fuzzy scoring, live verification, enrichment, and automatic merge controls are outside the first-release contract.
Disclaimer
This Actor performs deterministic cleanup and match-preparation only. Use its output as structured evidence for review, matching, or downstream workflow decisions, not as proof that a contact is current, reachable, deliverable, enriched, owned, merged, or CRM-true.
Permissions
This Actor is designed to run with limited permissions. Inline runs
write only to the default dataset and default key-value store. Source
routes additionally need read access to the selected Apify Dataset or
key-value-store record supplied in the input, and write only their own
default dataset rows and OUTPUT summary.
The runtime does not require account-wide resources, sibling Actors, proxy groups, request queues, third-party network resources, CRM credentials, OAuth flows, or browser sessions.
Pricing
Pricing contract: Pay Per Event at $0.10 per 1,000 processed
rows, effective June 3, 2026 at 18:20:34.542 UTC. Campaign 10
Mission 6 Phase 5 collected selected public Store readback: the exact
Contact row and the allowsAgenticUsers=true filtered row both report
PAY_PER_EVENT with apify-default-dataset-item at $0.0001 per
processed row; the pricingModel=PAY_PER_EVENT filter includes
Contact, and the pricingModel=FREE filter returns zero Contact rows.
No released-build charged-event counter or live high-volume source-route
storage readback has been proved yet. Treat billingPreview as a local
estimate and guardrail until Mission 7 or another authorized release
owner reruns Store/API, released-build run and storage readback, and
charged-event counter checks.
Apify implements this local pricing contract with the synthetic
apify-default-dataset-item event. Each cleanup row written to the
default dataset is the selected billing unit at $0.0001; the Actor
does not call the custom charge API for this route. A 1,000-row
execution estimates $0.10 before any account-level taxes, credits, or
Apify billing adjustments once current PPE is proved on live surfaces.
Platform usage is included in the accepted event price. The structured
OUTPUT summary is not billed separately. Preview-only source runs,
malformed-source failures, row-cap failures, and execution spend-cap
failures write zero default dataset rows. Invalid, review-needed,
partial, and no-actionable accepted records are still emitted rows
during execution and count toward the row total because preserving row
cardinality is part of the product contract.
Release History
See ./CHANGELOG.md for version-by-version release notes and migration guidance.
