Leadslogix Email Discovery

4-layer email discovery pipeline: Layer 0 (DNS/OSINT), Layer 1 (Site Crawl), Layer 2 (Multi-Engine Search), Layer 3 (Google Playwright). Plus 8-pattern email prediction engine.

Developer: Leadslogix LLC (maintained by Community)

Discover email addresses for any domain using a 4-layer pipeline that combines passive DNS intelligence, direct website crawling, multi-engine search, and optional Google Playwright extraction. Built for B2B sales teams, recruiters, and market researchers who need source-tagged contact emails without manual prospecting.

Why This Actor

Most email finder tools rely on a single data source — typically a purchased database or a single search engine. LeadsLogix Email Discovery runs up to four independent discovery layers per domain, cross-references results, and assigns quality tiers so you know which emails are crawl-verified and which are predicted. The pipeline is designed for accuracy over volume: every discovered email is tagged with its source, confidence score, and quality grade.

Key Features

  • 4-layer discovery pipeline -- passive DNS, site crawl, search engines, and Google Playwright (optional)
  • 8-pattern email prediction engine -- generates likely email addresses from person names using common corporate patterns (firstname.lastname, f.last, first, etc.)
  • Noise filtering -- automatically removes noreply, postmaster, webmaster, abuse, mailer-daemon, and addresses from known noise domains (sentry.io, wixpress.com, googleapis.com, etc.)
  • Quality tiering -- every email is graded A (crawl-verified), B (DNS/search-discovered), or C (predicted) so you can prioritize outreach
  • Domain deduplication -- strips duplicate emails across layers and normalizes domains from URLs
  • Flexible input -- accepts CSV upload, Excel upload, public file URL, or inline JSON array
  • Dual output -- results pushed to Apify Dataset (queryable via API) and exported as CSV to Key-Value Store
  • Proxy support -- integrates with Apify Proxy (residential or datacenter) for search and crawl operations
  • Human-like behavior -- 1-second delays between page requests, standard browser user agents, respectful crawl patterns
  • Free tier included -- process up to 20 domains per run at no charge beyond Apify platform compute

How the Discovery Layers Work

The pipeline processes each domain sequentially through up to four layers. Later layers only run if earlier layers did not find sufficient results. Each layer has a different speed, risk, and confidence profile.
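
The cascade described above can be sketched as follows. This is a minimal illustration, not the actor's actual code: `run_layers`, the `min_emails` threshold, and the stub layers are hypothetical names, and the README does not say what counts as "sufficient results", so the sketch assumes one email is enough.

```python
from typing import Callable

# A layer is any function that takes a domain and returns the emails it found.
Layer = Callable[[str], list[str]]


def run_layers(domain: str, layers: list[Layer], min_emails: int = 1) -> list[str]:
    """Run layers in order and stop once enough emails have been found."""
    found: list[str] = []
    for layer in layers:
        if len(found) >= min_emails:
            break  # earlier layers were sufficient; skip the slower ones
        for email in layer(domain):
            if email not in found:
                found.append(email)
    return found


# Stub layers standing in for DNS, crawl, and search (hypothetical).
stub_layers = [
    lambda domain: [],                    # Layer 0 finds nothing
    lambda domain: [f"info@{domain}"],    # Layer 1 finds one address
    lambda domain: [f"sales@{domain}"],   # Layer 2 is never reached
]
print(run_layers("acme.com", stub_layers))  # ['info@acme.com']
```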

Layer 0 -- DNS/OSINT (Passive)

No HTTP requests to the target domain. Queries public DNS records only.

| Check | What It Finds |
|---|---|
| DMARC rua/ruf records | Reporting email addresses published in _dmarc.{domain} TXT records |
| SPF include directives | Email addresses embedded in SPF TXT records |
| MX records | Mail exchange hosts (used for domain validation, not email extraction) |

Speed: Near-instant (DNS queries only). Confidence: 60-70. Quality tier: B.
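
To illustrate the DMARC check, here is a minimal sketch that pulls reporting addresses out of a DMARC TXT record that has already been fetched. The function name `extract_dmarc_mailboxes` is hypothetical and the DNS lookup itself is left out. The tag syntax (`rua=`, `ruf=`, comma-separated `mailto:` URIs with an optional `!size` suffix) follows RFC 7489.

```python
def extract_dmarc_mailboxes(txt_record: str) -> list[str]:
    """Return the mailto: addresses from the rua/ruf tags of a DMARC TXT record."""
    mailboxes: list[str] = []
    for tag in txt_record.split(";"):
        name, _, value = tag.strip().partition("=")
        if name.strip().lower() not in ("rua", "ruf"):
            continue
        for uri in value.split(","):
            uri = uri.strip()
            if uri.lower().startswith("mailto:"):
                # Drop the optional "!10m" report-size suffix.
                address = uri[len("mailto:"):].split("!", 1)[0].lower()
                if address and address not in mailboxes:
                    mailboxes.append(address)
    return mailboxes


record = (
    "v=DMARC1; p=none; "
    "rua=mailto:dmarc@acme.com,mailto:reports@acme.com!10m; "
    "ruf=mailto:dmarc@acme.com"
)
print(extract_dmarc_mailboxes(record))  # ['dmarc@acme.com', 'reports@acme.com']
```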

Layer 1 -- Site Crawl (Direct)

Crawls the target domain via httpx with a standard browser user agent. Visits these paths:

/ (homepage)
/contact
/contact-us
/about
/about-us
/team
/impressum

Both the root domain (https://domain.com) and the www subdomain (https://www.domain.com) are checked. Emails are extracted via regex and filtered to match the target domain only. A 1-second delay is inserted between each page request.

Speed: 8-15 seconds per domain (8 pages with delays). Confidence: 85. Quality tier: A.
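
The crawl step can be sketched in two parts: building the URLs to visit, and extracting same-domain emails from each page body. This is an approximation under stated assumptions: the regex and the helper names `candidate_urls` and `extract_domain_emails` are illustrative, and the HTTP fetch and the 1-second delay are omitted so the example runs without network access.

```python
import re

# Paths listed above; the host variants are the root domain and the www subdomain.
CRAWL_PATHS = ["/", "/contact", "/contact-us", "/about", "/about-us", "/team", "/impressum"]

# A deliberately simple email pattern (an assumption, not the actor's own regex).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def candidate_urls(domain: str) -> list[str]:
    """Every URL the crawl would request for one domain."""
    return [f"https://{host}{path}" for host in (domain, f"www.{domain}") for path in CRAWL_PATHS]


def extract_domain_emails(html: str, domain: str) -> list[str]:
    """Find emails in a page body and keep only those on the target domain."""
    found = {match.lower() for match in EMAIL_RE.findall(html)}
    return sorted(email for email in found if email.rsplit("@", 1)[1] == domain.lower())


page = '<a href="mailto:Sales@acme.com">Sales</a> partner@otherdomain.com or info@acme.com.'
print(extract_domain_emails(page, "acme.com"))  # ['info@acme.com', 'sales@acme.com']
```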

Layer 2 -- Search Engine Discovery

Searches DuckDuckGo for the domain's published email addresses using two queries:

"{domain}" email contact
site:{domain} "@{domain}"

Emails are extracted from search result titles and snippets. Only emails containing the target domain are kept. A 2-second delay is inserted between queries to avoid rate limiting.

Speed: 5-10 seconds per domain. Confidence: 75. Quality tier: B.

Layer 3 -- Google Playwright (Optional, Skipped by Default)

Uses a headless Playwright browser to search Google directly. This layer is disabled by default because it carries higher CAPTCHA risk and requires more compute resources. Enable it only when Layers 0-2 return insufficient results.

Speed: 15-30 seconds per domain. Confidence: 80. Quality tier: A. Risk: Google may serve CAPTCHAs. Requires Apify Proxy (residential recommended).

Email Prediction Engine

When the actor has contact names but no discovered emails for a domain, the 8-pattern prediction engine generates likely email addresses:

| Pattern | Example | Confidence |
|---|---|---|
| firstname.lastname@domain | john.doe@acme.com | 85 |
| flastname@domain | jdoe@acme.com | 80 |
| firstname@domain | john@acme.com | 70 |
| f.lastname@domain | j.doe@acme.com | 65 |
| firstnamelastname@domain | johndoe@acme.com | 55 |
| firstname_lastname@domain | john_doe@acme.com | 40 |
| lastname.firstname@domain | doe.john@acme.com | 30 |
| lastname@domain | doe@acme.com | 25 |

Predicted emails are assigned quality tier C. Pair this actor with the LeadsLogix Email Verifier to validate predicted addresses before outreach.
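
The eight patterns and their confidence scores can be reproduced with a short sketch. The `predict_emails` function and the template syntax are hypothetical. The sketch assumes simple single-word first and last names and does not handle accents, middle names, or hyphenation.

```python
# (template, confidence) pairs taken from the pattern table above.
PATTERNS = [
    ("{first}.{last}", 85),
    ("{f}{last}", 80),
    ("{first}", 70),
    ("{f}.{last}", 65),
    ("{first}{last}", 55),
    ("{first}_{last}", 40),
    ("{last}.{first}", 30),
    ("{last}", 25),
]


def predict_emails(first_name: str, last_name: str, domain: str) -> list[dict]:
    """Generate the eight candidate addresses for one contact, best guess first."""
    first, last = first_name.strip().lower(), last_name.strip().lower()
    parts = {"first": first, "last": last, "f": first[:1]}
    return [
        {
            "email": f"{template.format(**parts)}@{domain.lower()}",
            "confidence": confidence,
            "quality_tier": "C",  # predictions are always tier C per the text above
        }
        for template, confidence in PATTERNS
    ]


for row in predict_emails("John", "Doe", "acme.com"):
    print(row["email"], row["confidence"])
```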


Input

Provide domains using one of three methods: an inline array, a file upload, or a file URL. If more than one is provided, the actor uses the first one found in this priority order: inline domains, file upload, URL.

Input Schema

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| domains | array[string] | No | -- | Inline JSON array of domain strings. Example: ["acme.com", "globex.net"] |
| inputFile | string (file) | No | -- | Upload a CSV or Excel file with a domain, domains, website, or url column |
| inputUrl | string (URL) | No | -- | Public URL to a CSV or Excel file with domains |
| maxResults | integer | No | 20 | Maximum number of domains to process. Controls pricing tier enforcement |
| layers | string | No | "0,1,2" | Comma-separated layer IDs to run. Options: 0 (DNS), 1 (crawl), 2 (search), 3 (Google) |
| skipGoogle | boolean | No | true | Skip Google Playwright layer. Overrides layers to exclude layer 3 |
| includeEmailPrediction | boolean | No | true | Generate predicted emails for contacts with names but no discovered addresses |
| proxyConfiguration | object | No | -- | Apify Proxy settings for crawl and search operations |
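
The interaction between `layers` and `skipGoogle` can be sketched like this. The function `effective_layers` is hypothetical. It only encodes the rule stated in the table: `skipGoogle` removes layer 3 even when `layers` lists it. Rejecting unknown IDs is an added assumption.

```python
VALID_LAYERS = (0, 1, 2, 3)


def effective_layers(layers: str = "0,1,2", skip_google: bool = True) -> list[int]:
    """Resolve the layer IDs that would actually run for a given input."""
    selected = sorted({int(part) for part in layers.split(",") if part.strip()})
    unknown = [layer for layer in selected if layer not in VALID_LAYERS]
    if unknown:
        raise ValueError(f"unknown layer IDs: {unknown}")
    if skip_google:
        # skipGoogle overrides the layers string and drops layer 3.
        selected = [layer for layer in selected if layer != 3]
    return selected


print(effective_layers("0,1,2,3", skip_google=True))   # [0, 1, 2]
print(effective_layers("0,1,2,3", skip_google=False))  # [0, 1, 2, 3]
```

In practice this means Layer 3 runs only when `layers` includes 3 and `skipGoogle` is set to false.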

Input File Format

Your CSV or Excel file needs at least one column with domain data. The actor auto-detects these column names (case-insensitive):

  • domain
  • domains
  • website
  • url

Full URLs are accepted -- the actor extracts the domain automatically (https://www.acme.com/about becomes acme.com).

Example CSV:

domain
acme.com
globex.net
initech.io
umbrella-corp.com
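
The column auto-detection and URL-to-domain extraction can be approximated as follows. The helper names `normalize_domain` and `read_domains` are hypothetical. The sketch handles CSV text only (no Excel), strips a leading `www.`, and drops duplicates while preserving order.

```python
import csv
import io
from urllib.parse import urlparse

DOMAIN_COLUMNS = ("domain", "domains", "website", "url")


def normalize_domain(value: str) -> str:
    """Reduce a domain or full URL to a bare lowercase domain."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # lets urlparse treat "acme.com/about" as host + path
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def read_domains(csv_text: str) -> list[str]:
    """Find the domain column (case-insensitive) and return unique domains in order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    column = next(
        (name for name in (reader.fieldnames or []) if name.strip().lower() in DOMAIN_COLUMNS),
        None,
    )
    if column is None:
        raise ValueError("no domain, domains, website, or url column found")
    domains: list[str] = []
    for row in reader:
        domain = normalize_domain(row[column] or "")
        if domain and domain not in domains:
            domains.append(domain)
    return domains


sample = "Website\nhttps://www.acme.com/about\nglobex.net\nACME.com\n"
print(read_domains(sample))  # ['acme.com', 'globex.net']
```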

Output

Dataset Schema

Each discovered email is stored as one row in the Apify Dataset.

| Field | Type | Description |
|---|---|---|
| domain | string | The input domain that was searched |
| email | string | Discovered or predicted email address (lowercase) |
| source_layer | string | Discovery method: L0_DMARC, L0_SPF, L1_CRAWL, L2_SEARCH, NODEJS_PIPELINE, or PREDICTION |
| confidence | integer | Confidence score from 0 to 100 |
| quality_tier | string | A (crawl-verified, >=85), B (DNS/search, 60-84), or C (predicted, <60) |

Quality Tiers Explained

| Tier | Confidence Range | Source | Recommended Action |
|---|---|---|---|
| A | 85-100 | Direct website crawl, Google search | Safe for outreach. Email was found on a live web page. |
| B | 60-84 | DNS records, search engine snippets | Likely valid. Verify before high-volume sending. |
| C | Below 60 | Pattern prediction engine | Unverified guess. Must verify before any use. |

Additional Output

  • CSV file -- stored in Apify Key-Value Store as output.csv (UTF-8 with BOM for Excel compatibility)
  • Usage summary -- stored as usage (JSON) in Key-Value Store with input/output counts and pricing info

Example Output Row

{
  "domain": "acme.com",
  "email": "sales@acme.com",
  "source_layer": "L1_CRAWL",
  "confidence": 85,
  "quality_tier": "A"
}
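
When the same address is found by more than one layer, rows like the one above need to be merged. The sketch below shows one plausible way to do that by keeping the highest-confidence row per email. This rule is an assumption: the README states that duplicates are stripped across layers but not which row survives. The `dedupe_rows` name and the sample rows are illustrative.

```python
def dedupe_rows(rows: list[dict]) -> list[dict]:
    """Keep one row per email, preferring the highest confidence score."""
    best: dict[str, dict] = {}
    for row in rows:
        key = row["email"].lower()
        if key not in best or row["confidence"] > best[key]["confidence"]:
            best[key] = row
    # Highest confidence first, then alphabetical for a stable order.
    return sorted(best.values(), key=lambda row: (-row["confidence"], row["email"]))


rows = [
    {"domain": "acme.com", "email": "sales@acme.com", "source_layer": "L2_SEARCH", "confidence": 75, "quality_tier": "B"},
    {"domain": "acme.com", "email": "sales@acme.com", "source_layer": "L1_CRAWL", "confidence": 85, "quality_tier": "A"},
    {"domain": "acme.com", "email": "dmarc@acme.com", "source_layer": "L0_DMARC", "confidence": 60, "quality_tier": "B"},
]
for row in dedupe_rows(rows):
    print(row["email"], row["source_layer"], row["confidence"])
```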

Pricing

| Tier | Actor Fee | Results | Best For |
|---|---|---|---|
| Free | $0 | Up to 20 per run | Testing, evaluation |
| Pay-Per-Event | $2 per 1,000 results | Unlimited | Production workloads |

Important: Apify platform compute charges (CPU time, memory, bandwidth) are billed separately by Apify based on your usage. The prices above cover the actor software license only. See Apify pricing for platform costs.

Cost Estimation

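| Scenario | Results | Actor Fee | Est. Platform Cost | Total |
|---|---|---|---|---|
| Quick test | 20 | $0 (free) | ~$0.05 | ~$0.05 |
| Small batch | 100 | $0.16 | ~$0.15 | ~$0.31 |
| Medium batch | 500 | $0.96 | ~$0.50 | ~$1.46 |
| Large batch | 1,000 | $1.96 | ~$1.00 | ~$2.96 |
| Enterprise | 10,000 | $19.96 | ~$10.00 | ~$29.96 |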

Actor fee formula: (results - 20) x $0.002 for results > 20, $0 for <= 20. Platform cost estimates assume Layers 0-2 enabled (no Google Playwright). Enabling Layer 3 increases compute time by approximately 3-5x due to browser overhead.
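
The actor fee formula is easy to check in code. The sketch below reproduces the Actor Fee column of the table. The `actor_fee` name is illustrative, and platform compute costs are not modeled because they depend on your Apify plan and run time.

```python
FREE_RESULTS = 20          # results included at no actor fee
PRICE_PER_RESULT = 0.002   # $2 per 1,000 results


def actor_fee(results: int) -> float:
    """Actor fee in USD: (results - 20) x $0.002, never below zero."""
    billable = max(results - FREE_RESULTS, 0)
    return round(billable * PRICE_PER_RESULT, 2)


for results in (20, 100, 500, 1_000, 10_000):
    print(results, actor_fee(results))
# 20 0.0
# 100 0.16
# 500 0.96
# 1000 1.96
# 10000 19.96
```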


Usage Examples

Example 1: Quick Test with Inline Domains

{
  "domains": ["iana.org", "icann.org", "ietf.org"],
  "maxResults": 20,
  "layers": "0,1,2"
}

Example 2: CSV File Upload

{
  "inputFile": "prospects.csv",
  "maxResults": 500,
  "layers": "0,1,2",
  "includeEmailPrediction": true
}

Example 3: Remote CSV via URL

{
  "inputUrl": "https://docs.google.com/spreadsheets/d/.../export?format=csv",
  "maxResults": 1000,
  "layers": "0,1,2",
  "skipGoogle": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Example 4: Thorough Discovery with Google (Layer 3)

{
  "domains": ["hard-to-find-emails.com"],
  "maxResults": 20,
  "layers": "0,1,2,3",
  "skipGoogle": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Example 5: DNS-Only Discovery (Fastest, No Web Requests)

{
  "domains": ["acme.com", "globex.net", "initech.io"],
  "maxResults": 20,
  "layers": "0"
}

Retrieving Results via API

After the run completes, fetch results from the Apify Dataset:

# Get all results as JSON
curl "https://api.apify.com/v2/datasets/{DATASET_ID}/items?token={API_TOKEN}"
# Get results as CSV
curl "https://api.apify.com/v2/datasets/{DATASET_ID}/items?token={API_TOKEN}&format=csv"
# Download the CSV file from Key-Value Store
curl "https://api.apify.com/v2/key-value-stores/{STORE_ID}/records/output.csv?token={API_TOKEN}"
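
The same dataset request can be made from Python using only the standard library. This is a minimal sketch: `dataset_items_url` and `fetch_items` are hypothetical helper names, and the dataset ID and token are placeholders you supply from your own run.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, token: str, fmt: str = "json") -> str:
    """Build the dataset items URL used by the curl commands above."""
    query = urlencode({"token": token, "format": fmt})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


def fetch_items(dataset_id: str, token: str) -> list[dict]:
    """Download all rows of a dataset as a list of dicts (performs a network request)."""
    with urlopen(dataset_items_url(dataset_id, token)) as response:
        return json.load(response)


# Building the URL does not touch the network:
print(dataset_items_url("DATASET_ID", "API_TOKEN", fmt="csv"))
```

Call `fetch_items` with a real dataset ID and token to get rows shaped like the example output row, then filter on `quality_tier` before outreach.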

Performance

Expected throughput with Layers 0-2 enabled (Layer 3 disabled):

| Metric | Value |
|---|---|
| Domains per minute | 4-6 (with human-like delays) |
| Average emails per domain | 1-5 (varies by industry and region) |
| Layer 0 hit rate | 10-20% of domains have emails in DNS records |
| Layer 1 hit rate | 40-60% of domains have emails on contact/about pages |
| Layer 2 hit rate | 30-50% of domains have emails indexed in search engines |
| Combined discovery rate | 50-70% of domains yield at least one email |
| Memory usage | ~256 MB (Layers 0-2), ~512 MB with Layer 3 |
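
The throughput figure translates into a rough run-time estimate. The sketch below is simple arithmetic on the 4-6 domains per minute range. The `estimated_minutes` name is illustrative, and real run times will vary with site response times and which layers actually run.

```python
FASTEST_DOMAINS_PER_MINUTE = 6
SLOWEST_DOMAINS_PER_MINUTE = 4


def estimated_minutes(domains: int) -> tuple[float, float]:
    """Return (best case, worst case) run time in minutes for Layers 0-2."""
    return (domains / FASTEST_DOMAINS_PER_MINUTE, domains / SLOWEST_DOMAINS_PER_MINUTE)


best, worst = estimated_minutes(120)
print(f"120 domains: {best:.0f}-{worst:.0f} minutes")  # 120 domains: 20-30 minutes
```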

Regional variation: Domains in Western Europe and North America typically have higher email discovery rates (60-80%). Asian domains (.kr, .cn, .jp) often have lower rates (10-30%) due to form-based contact pages, missing MX records, and non-Latin character email systems.


Noise Filtering

The actor automatically removes emails matching these patterns:

Filtered Local Parts

noreply, no-reply, donotreply, mailer-daemon, postmaster, hostmaster, webmaster, abuse

Filtered Domains

example.com, test.com, localhost, sentry.io, wixpress.com, w3.org, schema.org, googleapis.com, gstatic.com

Domain Matching

Layer 1 (crawl) only keeps emails that match the target domain exactly. For example, when crawling acme.com, an email like partner@otherdomain.com found on the page is discarded. Layers 0 and 2 apply domain-aware filtering to ensure relevance.
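
The two filter lists can be combined into a single check. This is a sketch under assumptions: `is_noise` is a hypothetical name, local parts are compared by exact match, and subdomains of a filtered domain (for example `o1.ingest.sentry.io`) are treated as noise too. The README does not specify either matching rule.

```python
NOISE_LOCAL_PARTS = {
    "noreply", "no-reply", "donotreply", "mailer-daemon",
    "postmaster", "hostmaster", "webmaster", "abuse",
}
NOISE_DOMAINS = {
    "example.com", "test.com", "localhost", "sentry.io", "wixpress.com",
    "w3.org", "schema.org", "googleapis.com", "gstatic.com",
}


def is_noise(email: str) -> bool:
    """True if the address should be dropped by the noise filter."""
    local, _, domain = email.strip().lower().rpartition("@")
    if not local or not domain:
        return True  # not a usable address at all
    on_noise_domain = any(domain == d or domain.endswith("." + d) for d in NOISE_DOMAINS)
    return local in NOISE_LOCAL_PARTS or on_noise_domain


candidates = ["sales@acme.com", "noreply@acme.com", "abc123@o1.ingest.sentry.io"]
print([email for email in candidates if not is_noise(email)])  # ['sales@acme.com']
```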


Proxy Configuration

For best results with Layers 1-3, use Apify Proxy:

  • Datacenter proxy -- sufficient for Layer 0 (DNS) and Layer 1 (site crawl). Lowest cost.
  • Residential proxy -- recommended for Layer 2 (search engines) and required for Layer 3 (Google). Reduces CAPTCHA risk.

If no proxy is configured, the actor runs requests from the Apify platform IP directly. This works for small batches but may trigger rate limits on search engines for larger runs.


Integrations

Chain with Other LeadsLogix Actors

This actor is part of the LeadsLogix B2B intelligence suite on Apify Store:

  1. LeadsLogix Website Discovery -- find official websites for a list of company names
  2. LeadsLogix Company Scraper -- crawl websites for company info and decision-maker contacts
  3. LeadsLogix Email Discovery (this actor) -- discover emails for known domains
  4. LeadsLogix Email Verifier -- verify discovered emails with a 6-check pipeline
  5. LeadsLogix Pipeline -- run all stages in a single actor (end-to-end)

Recommended workflow: Website Discovery -> Company Scraper -> Email Discovery -> Email Verifier

Export Formats

  • Apify Dataset -- query via REST API, export as JSON, CSV, XML, or Excel
  • CSV -- download output.csv from Key-Value Store (UTF-8 with BOM)
  • Webhook -- configure Apify webhooks to POST results to your CRM or pipeline on run completion

Frequently Asked Questions

How is this different from Hunter.io, Snov.io, or Apollo? Those services query proprietary databases of previously discovered emails. This actor discovers emails in real time by crawling websites, reading DNS records, and searching the public web. You get fresher results and do not pay for stale data. The tradeoff is longer run time per domain.

Will this actor find personal Gmail or Outlook addresses? No. The domain-matching filter in Layer 1 only keeps emails that match the target domain. If you crawl acme.com, only *@acme.com emails are returned. Gmail, Yahoo, and other free email provider addresses are filtered out.

What happens if I exceed the free tier limit? The actor processes up to the number of domains specified in maxResults. The default is 20 (free tier). To process more, increase maxResults for pay-per-event billing ($2 per 1,000 results beyond the free tier).

Can I run this on my own Apify account without paying the actor fee? The free tier (20 domains per run) has no actor fee -- you only pay Apify platform compute charges. For larger runs, pay-per-event pricing applies at $2 per 1,000 results beyond the free tier.

Does Layer 3 (Google Playwright) always work? Not reliably. Google actively blocks automated access and may serve CAPTCHAs. Using residential proxies improves success rates, but Layer 3 should be treated as a last resort. Layers 0-2 cover the majority of discoverable emails without Google.

How do I verify the emails this actor finds? Pair this actor with the LeadsLogix Email Verifier, which runs syntax, DNS, SMTP, catch-all, disposable, and DKIM/SPF/DMARC checks. Feed this actor's output CSV directly into the verifier.

Can I process Excel files? Yes. Both .csv and .xlsx/.xls files are supported via file upload or URL. The actor auto-detects the file format and looks for a domain column.

What if my file has URLs instead of domains? The actor extracts domains from full URLs automatically. A column containing https://www.acme.com/about will be parsed as acme.com.

Is there a rate limit? The actor inserts human-like delays (1-2 seconds between requests) to avoid triggering target website rate limits. For search engines, a 2-second delay is used between queries. There is no hard rate limit on the actor itself beyond the maxResults setting.

Does the actor respect robots.txt? Layer 1 (site crawl) requests pages via httpx with a standard browser user agent and follows HTTP redirects. It does not explicitly parse robots.txt, but it requests only a fixed set of common paths (/contact, /about, /team, /impressum) that are intended for public access. No recursive spidering is performed.


Limitations

  • Email prediction (Tier C) is unverified. Predicted emails are pattern-based guesses. Always verify before sending.
  • Asian and non-English domains have lower discovery rates (10-30%) due to form-based contact pages and non-Latin email systems.
  • Layer 3 (Google) is unreliable due to CAPTCHA enforcement. Do not depend on it for production workloads.
  • The actor discovers published emails only. It does not access private databases, social media DMs, or login-protected pages.
  • No SMTP verification is performed. Use the LeadsLogix Email Verifier actor to confirm deliverability.
  • Search engine rate limits may reduce Layer 2 effectiveness during very large runs (1,000+ domains). Space large batches 1-2 hours apart or use residential proxies.

Changelog

v1.0.0 (2026-05-08)

  • Initial release on Apify Store
  • 4-layer discovery pipeline: DNS/OSINT, site crawl, DuckDuckGo search, Google Playwright
  • 8-pattern email prediction engine
  • Noise email and noise domain filtering
  • Quality tiering (A/B/C) with confidence scoring (0-100)
  • CSV and Dataset dual output
  • Free tier (20 domains per run), pay-per-event ($2/1,000 results beyond free tier)
  • Apify Proxy integration (datacenter and residential)
  • Auto-detection of domain column from CSV/Excel input
  • URL-to-domain extraction for input files with full URLs

Support


LeadsLogix Email Discovery is a B2B email finder and domain email search tool built for sales intelligence, lead generation, email prospecting, and market research. It discovers business email addresses from company domains without relying on third-party databases.