Leadslogix Email Discovery
Pricing
Pay per usage
4-layer email discovery pipeline: Layer 0 (DNS/OSINT), Layer 1 (Site Crawl), Layer 2 (Multi-Engine Search), Layer 3 (Google Playwright). Plus 8-pattern email prediction engine.
Developer
Leadslogix LLC
Maintained by Community
Discover email addresses for any domain using a 4-layer pipeline that combines passive DNS intelligence, direct website crawling, multi-engine search, and optional Google Playwright extraction. Built for B2B sales teams, recruiters, and market researchers who need contact emails labeled with source and confidence, without manual prospecting.
Why This Actor
Most email finder tools rely on a single data source — typically a purchased database or a single search engine. LeadsLogix Email Discovery runs up to four independent discovery layers per domain, cross-references results, and assigns quality tiers so you know which emails are crawl-verified and which are predicted. The pipeline is designed for accuracy over volume: every discovered email is tagged with its source, confidence score, and quality grade.
Key Features
- 4-layer discovery pipeline -- passive DNS, site crawl, search engines, and Google Playwright (optional)
- 8-pattern email prediction engine -- generates likely email addresses from person names using common corporate patterns (firstname.lastname, f.last, first, etc.)
- Noise filtering -- automatically removes noreply, postmaster, webmaster, abuse, mailer-daemon, and addresses from known noise domains (sentry.io, wixpress.com, googleapis.com, etc.)
- Quality tiering -- every email is graded A (crawl-verified), B (DNS/search-discovered), or C (predicted) so you can prioritize outreach
- Domain deduplication -- strips duplicate emails across layers and normalizes domains from URLs
- Flexible input -- accepts CSV upload, Excel upload, public file URL, or inline JSON array
- Dual output -- results pushed to Apify Dataset (queryable via API) and exported as CSV to Key-Value Store
- Proxy support -- integrates with Apify Proxy (residential or datacenter) for search and crawl operations
- Human-like behavior -- 1-second delays between page requests, standard browser user agents, respectful crawl patterns
- Free tier included -- process up to 20 domains per run at no charge beyond Apify platform compute
How the Discovery Layers Work
The pipeline processes each domain sequentially through up to four layers. Later layers only run if earlier layers did not find sufficient results. Each layer has a different speed, risk, and confidence profile.
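The early-stop behavior described above can be sketched as follows. This is an illustration only: `run_pipeline`, the `Layer` type, and the `min_emails` threshold are hypothetical names, and the actor's actual rule for "sufficient results" is not documented here.

```python
from typing import Callable, Iterable

# A layer is any callable that takes a domain and returns the emails it found.
Layer = Callable[[str], Iterable[str]]


def run_pipeline(domain: str, layers: list[Layer], min_emails: int = 1) -> list[str]:
    """Run layers in order and stop once enough unique emails are found.

    min_emails is an assumed threshold standing in for "sufficient results".
    """
    found: list[str] = []
    seen: set[str] = set()
    for layer in layers:
        for email in layer(domain):
            key = email.strip().lower()
            if key and key not in seen:
                seen.add(key)
                found.append(key)
        if len(found) >= min_emails:
            break  # later, slower layers are skipped
    return found
```

Ordering the cheap passive layers first means the slower or riskier layers only cost time on domains that need them.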
Layer 0 -- DNS/OSINT (Passive)
No HTTP requests to the target domain. Queries public DNS records only.
| Check | What It Finds |
|---|---|
| DMARC rua/ruf records | Reporting email addresses published in _dmarc.{domain} TXT records |
| SPF include directives | Email addresses embedded in SPF TXT records |
| MX records | Mail exchange hosts (used for domain validation, not email extraction) |
Speed: Near-instant (DNS queries only). Confidence: 60-70. Quality tier: B.
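As a rough illustration of the DMARC check, the sketch below pulls reporting addresses out of a TXT record string that has already been fetched. The function name `extract_dmarc_emails` is hypothetical, and the DNS lookup itself is left out because it would need a resolver library.

```python
def extract_dmarc_emails(txt_record: str) -> list[str]:
    """Return mailto: addresses from the rua/ruf tags of a DMARC TXT record."""
    emails: list[str] = []
    for tag in txt_record.split(";"):
        name, _, value = tag.strip().partition("=")
        if name.strip().lower() not in ("rua", "ruf"):
            continue
        for uri in value.split(","):
            uri = uri.strip()
            if uri.lower().startswith("mailto:"):
                # DMARC URIs may carry an optional size limit such as "!10m".
                address = uri[len("mailto:"):].split("!")[0].strip().lower()
                if address and address not in emails:
                    emails.append(address)
    return emails
```

Reporting addresses often point at a third-party DMARC service rather than the target domain, which is one reason a domain-aware filter still matters for this layer.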
Layer 1 -- Site Crawl (Direct)
Crawls the target domain via httpx with a standard browser user agent. Visits these paths:
- / (homepage)
- /contact
- /contact-us
- /about
- /about-us
- /team
- /impressum
Both the root domain (https://domain.com) and the www subdomain (https://www.domain.com) are checked. Emails are extracted via regex and filtered to match the target domain only. A 1-second delay is inserted between each page request.
Speed: 8-15 seconds per domain (pages are fetched one at a time, with delays). Confidence: 85. Quality tier: A.
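A minimal sketch of the URL list and the extract-then-filter step, assuming the page HTML has already been downloaded. The regex and the names `candidate_urls` and `extract_domain_emails` are illustrative and are not taken from the actor's source.

```python
import re

CRAWL_PATHS = ["/", "/contact", "/contact-us", "/about", "/about-us", "/team", "/impressum"]

# Deliberately simple pattern; real-world email syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def candidate_urls(domain: str) -> list[str]:
    """Build the root and www variants of every crawl path."""
    hosts = (domain, "www." + domain)
    return [f"https://{host}{path}" for host in hosts for path in CRAWL_PATHS]


def extract_domain_emails(html: str, domain: str) -> list[str]:
    """Find email-like strings and keep only exact matches for the target domain."""
    domain = domain.lower()
    kept: list[str] = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        if email.rsplit("@", 1)[-1] == domain and email not in kept:
            kept.append(email)
    return kept
```

With exact matching, an address on a subdomain such as info@mail.acme.com is dropped when the target is acme.com.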
Layer 2 -- Search Engine Discovery
Searches DuckDuckGo for the domain's published email addresses using two queries:
- "{domain}" email contact
- site:{domain} "@{domain}"
Emails are extracted from search result titles and snippets. Only emails containing the target domain are kept. A 2-second delay is inserted between queries to avoid rate limiting.
Speed: 5-10 seconds per domain. Confidence: 75. Quality tier: B.
Layer 3 -- Google Playwright (Optional, Skipped by Default)
Uses a headless Playwright browser to search Google directly. This layer is disabled by default because it carries higher CAPTCHA risk and requires more compute resources. Enable it only when Layers 0-2 return insufficient results.
Speed: 15-30 seconds per domain. Confidence: 80. Quality tier: A. Risk: Google may serve CAPTCHAs. Requires Apify Proxy (residential recommended).
Email Prediction Engine
When the actor has contact names but no discovered emails for a domain, the 8-pattern prediction engine generates likely email addresses:
| Pattern | Example | Confidence |
|---|---|---|
| firstname.lastname@domain | john.doe@acme.com | 85 |
| flastname@domain | jdoe@acme.com | 80 |
| firstname@domain | john@acme.com | 70 |
| f.lastname@domain | j.doe@acme.com | 65 |
| firstnamelastname@domain | johndoe@acme.com | 55 |
| firstname_lastname@domain | john_doe@acme.com | 40 |
| lastname.firstname@domain | doe.john@acme.com | 30 |
| lastname@domain | doe@acme.com | 25 |
Predicted emails are assigned quality tier C. Pair this actor with the LeadsLogix Email Verifier to validate predicted addresses before outreach.
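The eight patterns in the table can be expressed as templates. The sketch below is an assumption about how such an engine might look; `predict_emails` and the template syntax are hypothetical, and it does not handle multi-word names, accents, or punctuation.

```python
# (template, confidence) pairs in the same order as the table above.
PATTERNS = [
    ("{first}.{last}", 85),
    ("{f}{last}", 80),
    ("{first}", 70),
    ("{f}.{last}", 65),
    ("{first}{last}", 55),
    ("{first}_{last}", 40),
    ("{last}.{first}", 30),
    ("{last}", 25),
]


def predict_emails(first_name: str, last_name: str, domain: str) -> list[tuple[str, int]]:
    """Return (email, confidence) guesses for one person at one domain."""
    first = first_name.strip().lower()
    last = last_name.strip().lower()
    if not first or not last:
        return []
    parts = {"first": first, "last": last, "f": first[0]}
    return [
        (template.format(**parts) + "@" + domain.lower(), confidence)
        for template, confidence in PATTERNS
    ]
```

For example, `predict_emails("John", "Doe", "acme.com")` starts with `("john.doe@acme.com", 85)` and ends with `("doe@acme.com", 25)`.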
Input
Provide domains using one of three methods. If multiple are provided, the actor uses the first one found in this priority order: inline domains, file upload, URL.
Input Schema
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| domains | array[string] | No | -- | Inline JSON array of domain strings. Example: ["acme.com", "globex.net"] |
| inputFile | string (file) | No | -- | Upload a CSV or Excel file with a domain, domains, website, or url column |
| inputUrl | string (URL) | No | -- | Public URL to a CSV or Excel file with domains |
| maxResults | integer | No | 20 | Maximum number of domains to process. Controls pricing tier enforcement |
| layers | string | No | "0,1,2" | Comma-separated layer IDs to run. Options: 0 (DNS), 1 (crawl), 2 (search), 3 (Google) |
| skipGoogle | boolean | No | true | Skip Google Playwright layer. Overrides layers to exclude layer 3 |
| includeEmailPrediction | boolean | No | true | Generate predicted emails for contacts with names but no discovered addresses |
| proxyConfiguration | object | No | -- | Apify Proxy settings for crawl and search operations |
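To make the interaction between these fields concrete, here is a small sketch of the input priority order and of how skipGoogle overrides layers. The helper names `pick_input_source` and `resolve_layers` are hypothetical; only the field names and defaults come from the schema above.

```python
from typing import Optional


def pick_input_source(actor_input: dict) -> Optional[str]:
    """Return the first populated input field, in the documented priority order."""
    for key in ("domains", "inputFile", "inputUrl"):
        if actor_input.get(key):
            return key
    return None


def resolve_layers(layers: str = "0,1,2", skip_google: bool = True) -> list[int]:
    """Parse the comma-separated layers string and apply the skipGoogle override."""
    ids = sorted({int(part) for part in layers.split(",") if part.strip()})
    unknown = [i for i in ids if i not in (0, 1, 2, 3)]
    if unknown:
        raise ValueError(f"unknown layer ids: {unknown}")
    if skip_google:
        ids = [i for i in ids if i != 3]  # skipGoogle wins over layers
    return ids
```

Because skipGoogle defaults to true, running Layer 3 takes both `"layers": "0,1,2,3"` and `"skipGoogle": false`.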
Input File Format
Your CSV or Excel file needs at least one column with domain data. The actor auto-detects these column names (case-insensitive):
- domain
- domains
- website
- url
Full URLs are accepted -- the actor extracts the domain automatically (https://www.acme.com/about becomes acme.com).
Example CSV:
    domain
    acme.com
    globex.net
    initech.io
    umbrella-corp.com
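The column auto-detection and URL-to-domain extraction could look roughly like this for CSV input. It is a sketch under assumptions: `normalize_domain` and `read_domains` are illustrative names, and Excel parsing is not covered.

```python
import csv
import io
from urllib.parse import urlparse

DOMAIN_COLUMNS = ("domain", "domains", "website", "url")


def normalize_domain(value: str) -> str:
    """Reduce a URL or bare host to a lowercase domain without a leading www."""
    value = value.strip().lower()
    if not value:
        return ""
    # urlparse only fills in the host when a scheme or a leading // is present.
    parsed = urlparse(value if "://" in value else "//" + value)
    host = parsed.hostname or ""
    return host[4:] if host.startswith("www.") else host


def read_domains(csv_text: str) -> list[str]:
    """Find the domain column case-insensitively and return unique domains."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    column = next((f for f in fields if f.strip().lower() in DOMAIN_COLUMNS), None)
    if column is None:
        raise ValueError("no domain, domains, website, or url column found")
    domains: list[str] = []
    for row in reader:
        domain = normalize_domain(row[column] or "")
        if domain and domain not in domains:
            domains.append(domain)
    return domains
```

Normalizing before deduplicating means `https://www.acme.com/about` and `ACME.com` collapse into a single `acme.com` entry.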
Output
Dataset Schema
Each discovered email is stored as one row in the Apify Dataset.
| Field | Type | Description |
|---|---|---|
| domain | string | The input domain that was searched |
| email | string | Discovered or predicted email address (lowercase) |
| source_layer | string | Discovery method: L0_DMARC, L0_SPF, L1_CRAWL, L2_SEARCH, NODEJS_PIPELINE, or PREDICTION |
| confidence | integer | Confidence score from 0 to 100 |
| quality_tier | string | A (crawl-verified, >=85), B (DNS/search, 60-84), or C (predicted, <60) |
Quality Tiers Explained
| Tier | Confidence Range | Source | Recommended Action |
|---|---|---|---|
| A | 85-100 | Direct website crawl, Google search | Safe for outreach. Email was found on a live web page. |
| B | 60-84 | DNS records, search engine snippets | Likely valid. Verify before high-volume sending. |
| C | Below 60 | Pattern prediction engine | Unverified guess. Must verify before any use. |
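The confidence ranges in the table map to tiers as sketched below. This follows the numeric ranges only; the actor may also assign tiers by source (for example, every predicted email is tier C), so treat `quality_tier` as a hypothetical helper and not as the actor's exact rule.

```python
def quality_tier(confidence: int) -> str:
    """Map a 0-100 confidence score to the A/B/C ranges from the table."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence >= 85:
        return "A"
    if confidence >= 60:
        return "B"
    return "C"


def safe_for_outreach(rows: list[dict]) -> list[dict]:
    """Keep only dataset rows the actor has already graded as tier A."""
    return [row for row in rows if row.get("quality_tier") == "A"]
```

Filtering on the `quality_tier` field that the actor writes, instead of recomputing it from `confidence`, avoids disagreeing with any source-based rules.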
Additional Output
- CSV file -- stored in Apify Key-Value Store as output.csv (UTF-8 with BOM for Excel compatibility)
- Usage summary -- stored as usage (JSON) in Key-Value Store with input/output counts and pricing info
Example Output Row
    {
      "domain": "acme.com",
      "email": "sales@acme.com",
      "source_layer": "L1_CRAWL",
      "confidence": 85,
      "quality_tier": "A"
    }
Pricing
| Tier | Actor Fee | Results | Best For |
|---|---|---|---|
| Free | $0 | Up to 20 per run | Testing, evaluation |
| Pay-Per-Event | $2 per 1,000 results | Unlimited | Production workloads |
Important: Apify platform compute charges (CPU time, memory, bandwidth) are billed separately by Apify based on your usage. The prices above cover the actor software license only. See Apify pricing for platform costs.
Cost Estimation
| Scenario | Results | Actor Fee | Est. Platform Cost | Total |
|---|---|---|---|---|
| Quick test | 20 | $0 (free) | ~$0.05 | ~$0.05 |
| Small batch | 100 | $0.16 | ~$0.15 | ~$0.31 |
| Medium batch | 500 | $0.96 | ~$0.50 | ~$1.46 |
| Large batch | 1,000 | $1.96 | ~$1.00 | ~$2.96 |
| Enterprise | 10,000 | $19.96 | ~$10.00 | ~$29.96 |
Actor fee formula: (results - 20) x $0.002 for results > 20, $0 for <= 20. Platform cost estimates assume Layers 0-2 enabled (no Google Playwright). Enabling Layer 3 increases compute time by approximately 3-5x due to browser overhead.
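The fee formula can be checked with a few lines of code. The function name `actor_fee` is illustrative; it covers the actor fee only and leaves out the separate platform compute cost.

```python
FREE_RESULTS = 20
PRICE_PER_RESULT_MILLS = 2  # $0.002 per result, expressed in tenths of a cent


def actor_fee(results: int) -> float:
    """Return the actor fee in dollars: (results - 20) x $0.002 above the free tier."""
    if results < 0:
        raise ValueError("results must not be negative")
    billable = max(0, results - FREE_RESULTS)
    # Integer arithmetic first, so the only rounding happens in the final division.
    return billable * PRICE_PER_RESULT_MILLS / 1000
```

This reproduces the table: 100 results cost $0.16, 1,000 cost $1.96, and 10,000 cost $19.96.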
Usage Examples
Example 1: Quick Test with Inline Domains
    {
      "domains": ["iana.org", "icann.org", "ietf.org"],
      "maxResults": 20,
      "layers": "0,1,2"
    }
Example 2: CSV File Upload
    {
      "inputFile": "prospects.csv",
      "maxResults": 500,
      "layers": "0,1,2",
      "includeEmailPrediction": true
    }
Example 3: Remote CSV via URL
    {
      "inputUrl": "https://docs.google.com/spreadsheets/d/.../export?format=csv",
      "maxResults": 1000,
      "layers": "0,1,2",
      "skipGoogle": true,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      }
    }
Example 4: Thorough Discovery with Google (Layer 3)
    {
      "domains": ["hard-to-find-emails.com"],
      "maxResults": 20,
      "layers": "0,1,2,3",
      "skipGoogle": false,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      }
    }
Example 5: DNS-Only Discovery (Fastest, No Web Requests)
    {
      "domains": ["acme.com", "globex.net", "initech.io"],
      "maxResults": 20,
      "layers": "0"
    }
Retrieving Results via API
After the run completes, fetch results from the Apify Dataset:
    # Get all results as JSON
    curl "https://api.apify.com/v2/datasets/{DATASET_ID}/items?token={API_TOKEN}"

    # Get results as CSV
    curl "https://api.apify.com/v2/datasets/{DATASET_ID}/items?token={API_TOKEN}&format=csv"

    # Download the CSV file from Key-Value Store
    curl "https://api.apify.com/v2/key-value-stores/{STORE_ID}/records/output.csv?token={API_TOKEN}"
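The same dataset request can be made from Python using only the standard library. This is a sketch: `dataset_items_url` and `fetch_dataset_items` are hypothetical helper names, and the endpoint and query parameters are the ones shown in the curl commands.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, token: str, fmt: Optional[str] = None) -> str:
    """Build the dataset items URL, optionally requesting a format such as csv."""
    params = {"token": token}
    if fmt:
        params["format"] = fmt
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def fetch_dataset_items(dataset_id: str, token: str) -> list:
    """Download all dataset rows as parsed JSON (performs a network request)."""
    with urlopen(dataset_items_url(dataset_id, token)) as response:
        return json.load(response)
```

Each returned row has the fields listed under Dataset Schema, so results can be filtered on `quality_tier` or `source_layer` right after loading.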
Performance
Expected throughput with Layers 0-2 enabled (Layer 3 disabled):
| Metric | Value |
|---|---|
| Domains per minute | 4-6 (with human-like delays) |
| Average emails per domain | 1-5 (varies by industry and region) |
| Layer 0 hit rate | 10-20% of domains have emails in DNS records |
| Layer 1 hit rate | 40-60% of domains have emails on contact/about pages |
| Layer 2 hit rate | 30-50% of domains have emails indexed in search engines |
| Combined discovery rate | 50-70% of domains yield at least one email |
| Memory usage | ~256 MB (Layers 0-2), ~512 MB with Layer 3 |
Regional variation: Domains in Western Europe and North America typically have higher email discovery rates (60-80%). Asian domains (.kr, .cn, .jp) often have lower rates (10-30%) due to form-based contact pages, missing MX records, and non-Latin character email systems.
Noise Filtering
The actor automatically removes emails matching these patterns:
Filtered Local Parts
noreply, no-reply, donotreply, mailer-daemon, postmaster, hostmaster, webmaster, abuse
Filtered Domains
example.com, test.com, localhost, sentry.io, wixpress.com, w3.org, schema.org, googleapis.com, gstatic.com
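A filter over the two lists above might look like the sketch below. Two details are assumptions and are not documented behavior: local parts are compared exactly, and subdomains of a noise domain (such as an ingest host under sentry.io) are treated as noise too. The name `is_noise` is hypothetical.

```python
NOISE_LOCAL_PARTS = {
    "noreply", "no-reply", "donotreply", "mailer-daemon",
    "postmaster", "hostmaster", "webmaster", "abuse",
}
NOISE_DOMAINS = {
    "example.com", "test.com", "localhost", "sentry.io", "wixpress.com",
    "w3.org", "schema.org", "googleapis.com", "gstatic.com",
}


def is_noise(email: str) -> bool:
    """Return True for malformed addresses, noise local parts, and noise domains."""
    local, _, domain = email.strip().lower().rpartition("@")
    if not local or not domain:
        return True  # not a usable address at all
    if local in NOISE_LOCAL_PARTS:
        return True  # assumption: exact local-part match
    # Assumption: subdomains of a noise domain count as noise as well.
    return any(domain == d or domain.endswith("." + d) for d in NOISE_DOMAINS)
```

Applying the filter is then a one-liner: `[e for e in emails if not is_noise(e)]`.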
Domain Matching
Layer 1 (crawl) only keeps emails that match the target domain exactly. For example, when crawling acme.com, an email like partner@otherdomain.com found on the page is discarded. Layers 0 and 2 apply domain-aware filtering to ensure relevance.
Proxy Configuration
For best results with Layers 1-3, use Apify Proxy:
- Datacenter proxy -- sufficient for Layer 1 (site crawl). Lowest cost. Layer 0 (DNS) makes no HTTP requests to the target domain.
- Residential proxy -- recommended for Layer 2 (search engines) and required for Layer 3 (Google). Reduces CAPTCHA risk.
If no proxy is configured, the actor runs requests from the Apify platform IP directly. This works for small batches but may trigger rate limits on search engines for larger runs.
Integrations
Chain with Other LeadsLogix Actors
This actor is part of the LeadsLogix B2B intelligence suite on Apify Store:
- LeadsLogix Website Discovery -- find official websites for a list of company names
- LeadsLogix Company Scraper -- crawl websites for company info and decision-maker contacts
- LeadsLogix Email Discovery (this actor) -- discover emails for known domains
- LeadsLogix Email Verifier -- verify discovered emails with a 6-check pipeline
- LeadsLogix Pipeline -- run all stages in a single actor (end-to-end)
Recommended workflow: Website Discovery -> Company Scraper -> Email Discovery -> Email Verifier
Export Formats
- Apify Dataset -- query via REST API, export as JSON, CSV, XML, or Excel
- CSV -- download output.csv from Key-Value Store (UTF-8 with BOM)
- Webhook -- configure Apify webhooks to POST results to your CRM or pipeline on run completion
Frequently Asked Questions
How is this different from Hunter.io, Snov.io, or Apollo?
Those services query proprietary databases of previously discovered emails. This actor discovers emails in real time by crawling websites, reading DNS records, and searching the public web. Results are fresher, and you do not pay for stale data. The tradeoff is a longer run time per domain.
Will this actor find personal Gmail or Outlook addresses?
No. The domain-matching filter in Layer 1 only keeps emails that match the target domain. If you crawl acme.com, only *@acme.com emails are returned. Gmail, Yahoo, and other free email provider addresses are filtered out.
What happens if I exceed the free tier limit?
The actor processes up to the number of domains specified in maxResults. The default is 20 (free tier). To process more, increase maxResults for pay-per-event billing ($2 per 1,000 results beyond the free tier).
Can I run this on my own Apify account without paying the actor fee?
The free tier (20 domains per run) has no actor fee -- you only pay Apify platform compute charges. For larger runs, pay-per-event pricing applies at $2 per 1,000 results beyond the free tier.
Does Layer 3 (Google Playwright) always work?
Not reliably. Google actively blocks automated access and may serve CAPTCHAs. Using residential proxies improves success rates, but Layer 3 should be treated as a last resort. Layers 0-2 cover the majority of discoverable emails without Google.
How do I verify the emails this actor finds?
Pair this actor with the LeadsLogix Email Verifier, which runs syntax, DNS, SMTP, catch-all, disposable, and DKIM/SPF/DMARC checks. Feed this actor's output CSV directly into the verifier.
Can I process Excel files?
Yes. Both .csv and .xlsx/.xls files are supported via file upload or URL. The actor auto-detects the file format and looks for a domain column.
What if my file has URLs instead of domains?
The actor extracts domains from full URLs automatically. A column containing https://www.acme.com/about will be parsed as acme.com.
Is there a rate limit?
The actor inserts human-like delays (1-2 seconds between requests) to avoid triggering target website rate limits. For search engines, a 2-second delay is used between queries. There is no hard rate limit on the actor itself beyond the maxResults setting.
Does the actor respect robots.txt?
Not explicitly. Layer 1 (site crawl) requests pages via httpx with a standard browser user agent and follows HTTP redirects, but it does not parse robots.txt. It visits only a fixed set of common paths (/contact, /about, /team, /impressum, and similar), which are pages intended for public access. No recursive spidering is performed.
Limitations
- Email prediction (Tier C) is unverified. Predicted emails are pattern-based guesses. Always verify before sending.
- Asian and non-English domains have lower discovery rates (10-30%) due to form-based contact pages and non-Latin email systems.
- Layer 3 (Google) is unreliable due to CAPTCHA enforcement. Do not depend on it for production workloads.
- The actor discovers published emails only. It does not access private databases, social media DMs, or login-protected pages.
- No SMTP verification is performed. Use the LeadsLogix Email Verifier actor to confirm deliverability.
- Search engine rate limits may reduce Layer 2 effectiveness during very large runs (1,000+ domains). Space large batches 1-2 hours apart or use residential proxies.
Changelog
v1.0.0 (2026-05-08)
- Initial release on Apify Store
- 4-layer discovery pipeline: DNS/OSINT, site crawl, DuckDuckGo search, Google Playwright
- 8-pattern email prediction engine
- Noise email and noise domain filtering
- Quality tiering (A/B/C) with confidence scoring (0-100)
- CSV and Dataset dual output
- Free tier (20 domains per run), pay-per-event ($2/1,000 results beyond free tier)
- Apify Proxy integration (datacenter and residential)
- Auto-detection of domain column from CSV/Excel input
- URL-to-domain extraction for input files with full URLs
Support
- Issues and feature requests: Open an issue on the Apify Store actor page
- Email: hello@leadslogix.com
- Documentation: LeadsLogix on GitHub
LeadsLogix Email Discovery is a B2B email finder and domain email search tool built for sales intelligence, lead generation, email prospecting, and market research. It discovers business email addresses from company domains without relying on third-party databases.