Email Verifier & List Cleaner — Deliverability Scoring

Pricing

$4.00 / 1,000 dataset items scraped

Honest email deliverability scorer for scraped lead lists. Syntax + MX + disposable + role + catch-all detection with a transparent 0-100 score and reasons. Never labels an email 'deliverable' it cannot justify from reliable signals — catch-all and SMTP-blocked emails are honestly marked 'risky' ...

Developer

Harry Schoeller

Maintained by Community

Email Verifier & List Cleaner — Honest Deliverability Scoring

Email verification and bounce checking for scraped lead lists — with a transparent 0-100 score and the reasons behind it. Clean your list of bad, disposable, role, and catch-all addresses before you pay a SaaS verifier (ZeroBounce, NeverBounce, Hunter, Kickbox, Bouncer) for the survivors. Built for lead enrichment and list cleaning pipelines.

Keywords: email verification, email validation, bounce checker, email deliverability, list cleaning, lead enrichment, lead-list cleaner, disposable email detection, role account detection, catch-all detection, MX lookup.

The honesty thesis

The #1 complaint across lead-gen scrapers (LinkedIn / Apollo / Google Maps, and the Secretary-of-State and license actors in this collection) is that scraped emails bounce. The market is full of "verifiers" that return a confident valid that bounces anyway.

This actor is built on one rule:

It never says deliverable unless it can actually justify it from reliable signals. Everything else is honestly labeled risky or unknown — never a faked valid.

That single rule turns the #1 complaint into the #1 reason for a 5-star review: no surprise bounces.

What is reliable vs. unreliable from a datacenter

This is the spine of the design. Read it before you run anything — it explains exactly what we can and cannot promise.

| Check | Reliable from a datacenter? | Role in the score |
| --- | --- | --- |
| RFC 5322 syntax / normalization | ✅ Fully | Hard gate (fail → undeliverable) |
| MX record lookup (DNS) | ✅ Fully | Hard gate (no MX & no A fallback → undeliverable) |
| Domain exists (A/AAAA/NS resolves) | ✅ Fully | Hard gate |
| Disposable-domain detection (maintained list) | ✅ Fully | Downgrade to risky |
| Role-account detection (info@, sales@, admin@…) | ✅ Fully | Downgrade to risky |
| Gibberish / random-string heuristics | ✅ (heuristic, never a hard fail alone) | Score penalty + reason |
| Typo / known-provider misspelling (gmial.com) | ✅ (suggestion only) | Reason + suggestion field |
| Catch-all (accept-all) detection | ⚠️ Best-effort SMTP, often inconclusive | Caps confidence; sets isCatchAll, downgrades to risky |
| Live SMTP RCPT probing of the mailbox | Unreliable: port 25 egress is blocked/greylisted from shared datacenter ranges; catch-all domains accept everything; Google/Microsoft refuse or rate-limit | Can only downgrade or yield unknown; NEVER promote to deliverable |
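
The fully reliable string-level checks in the table need no network access at all. The following is a minimal Python sketch of that idea, not the actor's code (which appears to be Node-based). The three lists are tiny hypothetical stand-ins for the maintained lists, the regex is a deliberately simplified syntax check rather than a full RFC 5322 parser, and the assumption that `suggestion` holds the full corrected address is ours.

```python
import re

# Hypothetical, illustrative lists; the actor's real lists are not shown in this README.
ROLE_PREFIXES = {"info", "sales", "admin", "support", "contact"}
DISPOSABLE_DOMAINS = {"mailinator.com"}
PROVIDER_TYPOS = {"gmial.com": "gmail.com", "hotmial.com": "hotmail.com"}

# Simplified syntax check: exactly one "@", no whitespace, a dot in the domain.
SIMPLE_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s.]+$")


def string_checks(raw: str) -> dict:
    """Run the offline checks: normalization, syntax, role, disposable, typo."""
    email = raw.strip().lower()  # normalization: trim and lowercase
    if not SIMPLE_EMAIL_RE.match(email):
        return {"email": email, "validSyntax": False}
    local, domain = email.rsplit("@", 1)
    typo_fix = PROVIDER_TYPOS.get(domain)
    return {
        "email": email,
        "validSyntax": True,
        "domain": domain,
        "isRole": local in ROLE_PREFIXES,
        "isDisposable": domain in DISPOSABLE_DOMAINS,
        # Assumption: the suggestion is the full corrected address.
        "suggestion": f"{local}@{typo_fix}" if typo_fix else None,
    }
```

For example, `string_checks("Info@Acme.com ")` normalizes the address to `info@acme.com` and flags it as a role account, while `string_checks("bob@gmial.com")` carries a `suggestion` of `bob@gmail.com`.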

Why we don't do binary valid/invalid

Most cheap "verifiers" do an SMTP RCPT probe from an Apify datacenter IP and return binary valid / invalid. That fails in two predictable ways:

  1. Datacenter SMTP is unreliable. Port 25 egress from shared datacenter ranges is routinely blocked, greylisted, or rate-limited. Big providers (Google, Microsoft) refuse or throttle these probes. So a "valid" from that probe is often noise.
  2. Catch-all domains accept everything. A catch-all (accept-all) domain returns 250 OK for any address — ceo@, asdfqwer@, anything. A binary verifier marks them all valid; then they bounce.

We refuse to launder either of those into a confident valid. Instead:

  • SMTP is strictly one-directional. A hard 5xx rejection is trustworthy → we downgrade to undeliverable. A 250 acceptance is ignored for promotion — it can never make an email deliverable.
  • Catch-all caps the result at risky and sets isCatchAll: true, because per-mailbox deliverability there is genuinely unknowable.
  • Blocked / greylisted / timeout → unknown. An honest "we don't know" beats a false "valid."
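
The one-directional rule can be sketched as a small function that takes the status computed so far plus the SMTP reply code. This is an illustrative Python sketch under our own naming (`apply_smtp_result` and its parameters are hypothetical, not the actor's API); `None` stands for a probe that was blocked or timed out.

```python
from typing import Optional


def apply_smtp_result(status: str, smtp_code: Optional[int]) -> str:
    """Apply an SMTP probe outcome one-directionally: it may lower a status, never raise it.

    smtp_code is the RCPT TO reply code, or None if the probe was blocked or timed out.
    """
    if smtp_code is not None and 500 <= smtp_code <= 599:
        return "undeliverable"  # a hard 5xx rejection is trusted
    if smtp_code is not None and 200 <= smtp_code <= 299:
        return status  # acceptance is ignored: it never promotes
    # 4xx, greylisted, blocked, or timed out: inconclusive.
    return "unknown" if status == "deliverable" else status
```

Note that a `250` reply leaves a `risky` email `risky`: there is no code path that returns `deliverable` unless the input status already was.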

Status definitions (the honest contract)

  • deliverable — Syntax valid, MX present, NOT disposable, NOT a detected catch-all, and NOT (by policy) a role account. This is "all reliable signals are green." It is not a delivery guarantee, and we will never label something deliverable that we cannot justify from reliable signals. SMTP is never required to reach this status, and SMTP can never be the thing that grants it.
  • risky — Real-looking but with a known risk: role account, catch-all domain, disposable, gibberish-leaning local part, or a typo suggestion. Send if you want, but expect a lower hit rate / spam-trap risk.
  • undeliverable — A hard, provable failure only: bad syntax, no MX and no A-record fallback, the domain doesn't resolve, or a trustworthy SMTP 5xx rejection. We only condemn an email when the failure is provable.
  • unknown — We couldn't determine it: DNS timeout/SERVFAIL, or an SMTP probe that was blocked/greylisted/inconclusive.

How the score stays honest

Deterministic, capped pipeline — not a black box. Order matters: hard gates first, then deductions, then caps that enforce the honesty rule.

Start: score = 100, status = "deliverable"
HARD GATES (terminal):
    invalid syntax                   -> undeliverable, 0
    domain doesn't resolve, no MX    -> undeliverable, 5
    DNS errored/timed out            -> unknown, null
DEDUCTIONS:
    no MX but A record (fallback)    -> -30
    gibberish local part             -> -25
    possible provider typo           -> -15 (+ `suggestion`)
DOWNGRADE FLAGS (cap status, never upgrade):
    isDisposable                     -> cap "risky", score min 40
    isRole (if treatRoleAsRisky)     -> cap "risky", score min 60
    isCatchAll                       -> cap "risky", score min 65
    isFreeProvider                   -> informational only
SMTP (best_effort only, one-directional):
    5xx rejection                    -> undeliverable, 10 (trustworthy)
    250 acceptance                   -> NO promotion (untrustworthy)
    blocked/greylisted/4xx/timeout   -> unknown (if it was deliverable)
FINAL CAP: status can be "deliverable" ONLY if syntax ok AND mx present AND
    !isDisposable AND !isCatchAll AND (!isRole || !treatRoleAsRisky) AND no smtp rejection.

Every record carries its full reasons[], so you can audit exactly why a score landed where it did.
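
To make the ordering concrete, here is a hedged Python sketch of the pipeline listed above. It is one reading of that listing, not the actor's source: the `Signals` fields and most reason strings are hypothetical, "score min N" is interpreted as a ceiling (`min(score, N)`), and an A-record fallback is assumed to yield `risky` because the final cap requires MX. The actor may combine penalties differently, so scores from this sketch are not guaranteed to match its output.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Signals:
    """Inputs to the scorer; names are illustrative, not the actor's internals."""
    syntax_ok: bool = True
    dns_error: bool = False
    has_mx: bool = True
    has_a_record: bool = True
    gibberish: bool = False
    typo_suggestion: Optional[str] = None
    is_disposable: bool = False
    is_role: bool = False
    is_catch_all: bool = False
    smtp: Optional[str] = None  # None (not probed), "rejected", "accepted", "inconclusive"


def score_email(s: Signals, treat_role_as_risky: bool = True) -> Tuple[str, Optional[int], List[str]]:
    # Hard gates: terminal results. A DNS failure means MX/A are unknown,
    # so it is tested before the no-MX gate.
    if not s.syntax_ok:
        return "undeliverable", 0, ["invalid_syntax"]
    if s.dns_error:
        return "unknown", None, ["dns_error"]
    if not s.has_mx and not s.has_a_record:
        return "undeliverable", 5, ["domain_unresolvable"]

    score, status, reasons = 100, "deliverable", ["valid_syntax"]

    # Deductions: these change the score only, following the listing.
    if not s.has_mx:
        score -= 30
        status = "risky"  # assumption: the final cap requires MX for "deliverable"
        reasons.append("a_record_fallback")
    if s.gibberish:
        score -= 25
        reasons.append("gibberish_local_part")
    if s.typo_suggestion:
        score -= 15
        reasons.append("possible_typo")

    # Downgrade flags: cap both status and score; they never upgrade.
    caps = [
        (s.is_disposable, 40, "disposable_domain"),
        (s.is_role and treat_role_as_risky, 60, "role_account"),
        (s.is_catch_all, 65, "domain_is_catch_all"),
    ]
    for flagged, cap, reason in caps:
        if flagged:
            status, score = "risky", min(score, cap)
            reasons.append(reason)

    # SMTP is one-directional: "accepted" is deliberately ignored.
    if s.smtp == "rejected":
        return "undeliverable", 10, reasons + ["smtp_rejected"]
    if s.smtp == "inconclusive" and status == "deliverable":
        status = "unknown"
        reasons.append("smtp_inconclusive")
    return status, score, reasons
```

Because every branch either keeps or lowers the status, `deliverable` can only survive when no gate, flag, or SMTP rejection fired, which is what the final cap states.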

Input

Provide exactly one of emails or sourceDatasetId.

{
  // Option A: raw list
  "emails": ["john@acme.com", "info@acme.com"],
  // Option B: chain off another actor's dataset
  "sourceDatasetId": "<dataset id>",
  "emailField": "email",            // dot paths supported (contact.email)
  "passThroughFields": ["name", "company", "phone"],
  "smtpCheck": "off",               // "off" (recommended) | "best_effort"
  "detectCatchAll": true,
  "deduplicate": true,
  "treatRoleAsRisky": true,
  "concurrency": 50
}
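
Two input rules above lend themselves to a short illustration: the "exactly one of" constraint and dot-path lookup for `emailField`. The Python sketch below shows one plausible way to implement both; the helper names `validate_input` and `get_by_dot_path` are hypothetical, and the actor's actual validation behaviour and error messages may differ.

```python
from typing import Any, Dict, Optional


def validate_input(actor_input: Dict[str, Any]) -> str:
    """Return which source is in use, enforcing exactly one of emails / sourceDatasetId."""
    has_emails = bool(actor_input.get("emails"))
    has_dataset = bool(actor_input.get("sourceDatasetId"))
    if has_emails == has_dataset:  # both present or both missing
        raise ValueError("Provide exactly one of 'emails' or 'sourceDatasetId'.")
    return "emails" if has_emails else "sourceDatasetId"


def get_by_dot_path(item: Dict[str, Any], path: str) -> Optional[Any]:
    """Resolve a dot path such as 'contact.email' against a nested dataset item."""
    current: Any = item
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # missing segment: no email on this item
        current = current[key]
    return current
```

With these helpers, `get_by_dot_path({"contact": {"email": "x@y.com"}}, "contact.email")` returns `"x@y.com"`, and an input that sets both `emails` and `sourceDatasetId` raises `ValueError`.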

Output (per email)

{
  "email": "info@acme.com",
  "originalEmail": "Info@Acme.com ",
  "status": "risky",
  "score": 55,
  "reasons": ["valid_syntax", "mx_found:aspmx.l.google.com", "role_account:info", "domain_is_catch_all"],
  "mx": ["aspmx.l.google.com", "alt1.aspmx.l.google.com"],
  "isDisposable": false,
  "isRole": true,
  "isCatchAll": true,
  "isFreeProvider": false,
  "suggestion": null,
  "domain": "acme.com",
  "checkedAt": "2026-06-20T00:00:00.000Z"
  // ...any passThroughFields copied here
}

A run-level OUTPUT key holds counts per status plus the honesty disclaimer.
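
If you want the same per-status counts on your side, for instance after filtering the dataset, they are easy to recompute from the records. A minimal Python sketch, assuming only the `status` field shown in the output above; the exact shape of the actor's OUTPUT key is not documented here, so the returned dictionary layout is our own.

```python
from collections import Counter
from typing import Dict, Iterable

STATUSES = ("deliverable", "risky", "undeliverable", "unknown")


def summarize(records: Iterable[dict]) -> Dict[str, int]:
    """Count records per status, always reporting all four statuses."""
    counts = Counter(record["status"] for record in records)
    return {status: counts.get(status, 0) for status in STATUSES}
```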

Chaining example (lead enrichment)

Run a leads scraper from this collection — e.g. insurance-license-search or sos-entity-search — then feed its dataset straight in:

{
  "sourceDatasetId": "<the leads run's defaultDatasetId>",
  "emailField": "email",
  "passThroughFields": ["entityName", "phone", "city", "state"]
}

The cleaned, scored list comes back joined to each lead, ready to import into your CRM. You can also hand only the deliverable subset to an outreach tool.
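
Conceptually, the join copies the named pass-through fields from each source lead onto its verification record. The Python sketch below illustrates that step and the follow-up filter; both function names are hypothetical, and letting the verifier's own fields win on a name clash is an assumption, not documented behaviour.

```python
from typing import Dict, Iterable, List


def attach_pass_through(result: Dict, source_item: Dict, fields: Iterable[str]) -> Dict:
    """Copy selected fields from the source lead onto the verification record."""
    merged = dict(result)
    for name in fields:
        if name in source_item:
            # Assumption: verifier fields take precedence over same-named source fields.
            merged.setdefault(name, source_item[name])
    return merged


def deliverable_subset(rows: Iterable[Dict]) -> List[Dict]:
    """Keep only the records whose status is 'deliverable'."""
    return [row for row in rows if row["status"] == "deliverable"]
```

For a lead `{"entityName": "Acme LLC", "state": "TX", "email": "jane@acme.com"}` and `passThroughFields` of `["entityName", "state"]`, the merged record carries both lead fields alongside `status`, `score`, and `reasons`.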

Pricing (pay per event)

  • $0.004 per email verified (charged once per unique email after dedup).
  • $0.002 per best-effort SMTP probe (only when smtpCheck: "best_effort" and a probe is actually made).

Cost is decoupled from compute — you pay for results, not runtime. We never charge for inputs rejected at validation or for duplicates collapsed by dedup.
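
As a quick budgeting aid, the two event prices above can be turned into an estimate. This Python sketch assumes that deduplication treats addresses as equal after trimming and lowercasing, which is our guess at the normalization; it also ignores inputs rejected at validation, which are not charged, so it can only overestimate in that respect.

```python
from decimal import Decimal
from typing import Iterable

PRICE_PER_EMAIL = Decimal("0.004")       # per unique email verified
PRICE_PER_SMTP_PROBE = Decimal("0.002")  # per best-effort SMTP probe actually made


def estimate_cost(emails: Iterable[str], smtp_probes: int = 0) -> Decimal:
    """Estimate the run cost in USD from the list size and expected probe count."""
    unique = {email.strip().lower() for email in emails}  # assumed dedup rule
    return PRICE_PER_EMAIL * len(unique) + PRICE_PER_SMTP_PROBE * smtp_probes
```

At these prices, 1,000 unique emails with `smtpCheck: "off"` come to $4.00, which matches the headline rate.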

Notes

  • No browser, no proxy, no anti-bot, no login. Pure DNS + string heuristics + maintained lists, with an optional node:net SMTP socket.
  • The disposable-domain list is vendored in-repo so runs are offline-deterministic.
  • Competitor SaaS (ZeroBounce, NeverBounce, Hunter, Kickbox, Bouncer) use warmed dedicated IPs and accept-all intelligence an Apify run cannot replicate — we don't claim parity. Use this to pre-clean cheaply, then pay a SaaS for the survivors if you need deeper SMTP accuracy. (Competitor positioning is from general Apify Store knowledge; verify current ratings before relying on any comparison.)