Email Extractor & Lookup API
Pricing
from $2.20 / 1,000 pages scraped
Bulk email extractor and lookup API that scrapes emails from any website. Extract contact details from a single URL or thousands of domains. Automatically finds contact pages, verifies every email's domain, and returns structured data for lead enrichment.
Developer
Thodor
Maintained by Community
Actor stats
Bookmarked: 1
Total users: 2
Monthly active users: 1
Last modified: 16 hours ago
A no-code email scraper tool that pulls business email addresses from any website. Give it a list of domains or individual pages, and it returns a clean dataset of every email it can find. It works as a bulk email extractor for prospecting, list-building, or enriching an existing CRM.
Use it as a website email scraper to extract emails from website content at scale. The Actor visits the most relevant pages on each domain (contact, about, team, legal), uncovers emails hidden by common anti-scrape tricks, verifies that each address's domain can actually receive mail, and pushes one row per unique email to the dataset.
What does Email Extractor & Lookup API do?
Email Extractor & Lookup API is an email address extractor built on the Apify platform. For every URL you give it, it:
- Visits the page while appearing as a real Chrome, Safari, or Firefox browser, so it bypasses anti-bot filters that block ordinary scrapers.
- Finds the internal links most likely to hold contact info and ranks them: contact → about → homepage → legal → everything else. This is the same prioritisation used by professional B2B email scraper pipelines.
- Acts as a polite email crawler: walks the top N pages on the same site (subdomains included), three at a time so it doesn't hammer the source.
- Pulls every email address from the page, including ones hidden behind:
  - Cloudflare email protection
  - HTML encoding tricks
  - "name [at] domain [dot] com" obfuscation in plain text
- Filters out junk (`noreply@`, `webmaster@`, placeholders like `john.doe@`, false matches from image filenames).
- Checks each email's domain to confirm it can actually receive mail.
- Outputs one row per unique email, with how often it was seen and exactly which pages it came from.
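The page-ranking step in the list above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the keyword buckets and the `page_priority` and `rank_links` helpers are assumptions chosen only to mirror the contact → about → homepage → legal → everything else order.

```python
from urllib.parse import urlparse

# Hypothetical keyword buckets; the number is the visiting priority (lower = earlier).
CONTACT, ABOUT, HOMEPAGE, LEGAL, OTHER = range(5)
KEYWORD_BUCKETS = [
    (CONTACT, ("contact", "kontakt")),
    (ABOUT, ("about", "team", "company")),
    (LEGAL, ("legal", "imprint", "impressum", "privacy", "terms")),
]

def page_priority(url: str) -> int:
    """Map a URL to a priority bucket based on keywords in its path."""
    path = urlparse(url).path.strip("/").lower()
    if path == "":
        return HOMEPAGE
    for priority, keywords in KEYWORD_BUCKETS:
        if any(keyword in path for keyword in keywords):
            return priority
    return OTHER

def rank_links(urls, max_pages=20):
    """Order candidate links by priority and keep at most max_pages of them."""
    # sorted() is stable, so links in the same bucket keep their discovery order.
    return sorted(urls, key=page_priority)[:max_pages]

links = [
    "https://example.com/blog/post-1",
    "https://example.com/legal",
    "https://example.com/",
    "https://example.com/about-us",
    "https://example.com/contact",
]
for link in rank_links(links):
    print(link)
```

A real crawler would probably also look at link text and handle more languages; the substring match here is deliberately simple.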
How it works
- Paste one or more website URLs into Start URLs. URLs can be bare domains (`apify.com`) or specific pages (`apify.com/contact`).
- If you already know where the good contact info lives, paste those pages too. URLs on the same domain are grouped automatically and the Actor visits your URLs first before exploring the rest of the site.
- (Optional) Set Max pages per domain (default `20`). This caps the total pages fetched per domain, counting both your supplied URLs and pages the Actor discovers. Raise it for a deeper scan of large sites (think team pages, regional offices, multi-language sections). Lower it to keep costs tight when you only care about the obvious contact pages.
- Click Start. Results stream into the dataset as the Actor works, and a `page_checked` event fires once for each page successfully retrieved (HTTP 200 with HTML content). Failed fetches, 404s, 5xxs, blocked pages, and non-HTML responses do not count toward billing.
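To make the billing rule concrete, here is a small sketch of which fetches would count as a billable page. The `is_billable` and `estimate_cost` helpers are hypothetical, treating only `text/html` as "HTML content" is an assumption, and the default price is the "from $2.20 / 1,000" figure listed for this Actor, so your actual rate may differ.

```python
def is_billable(status_code: int, content_type: str) -> bool:
    """True only for a successful HTML response (HTTP 200 with HTML content)."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return status_code == 200 and media_type == "text/html"

def estimate_cost(fetches, price_per_1000=2.20):
    """Return (billable_pages, estimated_cost_usd) for (status, content_type) pairs."""
    billable = sum(1 for status, ctype in fetches if is_billable(status, ctype))
    return billable, billable * price_per_1000 / 1000

fetches = [
    (200, "text/html; charset=utf-8"),  # counted
    (404, "text/html"),                 # not counted: 404
    (200, "application/pdf"),           # not counted: non-HTML
    (503, "text/html"),                 # not counted: 5xx
    (200, "text/html"),                 # counted
]
pages, cost = estimate_cost(fetches)
print(f"{pages} billable pages, about ${cost:.4f}")
```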
Output example
{
  "email": "hello@apify.com",
  "domain": "apify.com",
  "valid_email_domain": true,
  "occurrences": 6,
  "urls": [
    "https://apify.com/contact",
    "https://docs.apify.com/legal"
  ]
}
| Field | Meaning |
|---|---|
| `email` | The address itself, lowercased. |
| `domain` | The part after the @. |
| `valid_email_domain` | `true` if the domain is set up to receive mail. |
| `occurrences` | How many times this address appeared across the crawled pages. The higher the number, the more likely it's a real, publicised contact. |
| `urls` | Every page where this email was found. |
You can download the dataset in JSON, CSV, Excel, HTML, or XML.
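Once you have the JSON export, post-processing is straightforward. The sketch below filters rows by `valid_email_domain` and sorts them by `occurrences`. The `rank_contacts` helper is hypothetical, and only the first sample row comes from the output example above; the other two rows are made up for illustration.

```python
import json

# Rows in the documented output shape. Only the first row is the documented
# example; the other two are invented sample data.
SAMPLE = json.loads("""
[
  {"email": "hello@apify.com", "domain": "apify.com", "valid_email_domain": true,
   "occurrences": 6, "urls": ["https://apify.com/contact", "https://docs.apify.com/legal"]},
  {"email": "press@example.com", "domain": "example.com", "valid_email_domain": true,
   "occurrences": 1, "urls": ["https://example.com/news"]},
  {"email": "info@nomail.invalid", "domain": "nomail.invalid", "valid_email_domain": false,
   "occurrences": 9, "urls": ["https://nomail.invalid/"]}
]
""")

def rank_contacts(rows, min_occurrences=1):
    """Keep rows whose domain can receive mail, most frequently seen first."""
    keep = [
        row for row in rows
        if row.get("valid_email_domain") and row.get("occurrences", 0) >= min_occurrences
    ]
    return sorted(keep, key=lambda row: row["occurrences"], reverse=True)

for row in rank_contacts(SAMPLE):
    print(row["email"], row["occurrences"], row["urls"][0])
```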
Hunter.io alternative
Looking for a Hunter.io alternative that lets you control exactly which sites get scanned and which pages get crawled? This Actor gives you the raw extraction layer Hunter wraps behind a credit-based API, at Apify's pay-per-platform-usage rate, with no monthly seat fees.
| Feature | Hunter.io | Email Extractor & Lookup API |
|---|---|---|
| Pricing model | Per-credit subscription | Pay-per-platform-usage on Apify |
| Domain-level email validation | Yes | Yes (always on) |
| Works on sites that block scrapers | Limited | Yes |
| Bring your own URL list | Limited | Unlimited, runs in parallel |
| Per-page source tracking (`urls`) | No | Yes |
| Occurrence counting | No | Yes |
| Data export | API + CSV | JSON, CSV, Excel, XML, RSS, HTML |
| Mailbox-level probing | Yes | Out of scope (avoids policy-grey-area probes) |
Hunter is strong when you want a human-readable confidence score on a single domain and don't care about which page an email came from. This Actor is the right choice when you want the raw list with provenance, namely exactly which URL on a domain published which address, and to drive it from an automated pipeline.
Snov.io alternative
The case for using this Actor as a Snov.io alternative is similar. Snov bundles email finding with cold-mail sending; this Actor is purely the extraction step, designed to feed whichever sender or CRM you already use.
| Feature | Snov.io | Email Extractor & Lookup API |
|---|---|---|
| Email extraction from URLs | Yes | Yes |
| Built-in cold-outreach sender | Yes | No (extraction-only by design) |
| Works on sites that block scrapers | Limited | Yes |
| Self-host / open-source | No | Code is yours, runs on the Apify platform you control |
| Schedule + cron | Yes (paid plan) | Yes (Apify Schedules, included) |
| Webhook on finished run | Yes | Yes |
If you already use a sender (Instantly, lemlist, Smartlead, your own mailer) and just need a clean, auditable source of addresses, this Actor slots in without paying for sending capability you don't need.
Clay integration
Use this Actor as a Clay email finder or Clay email enrichment step:
- In your Clay table, add an HTTP API column.
- Set the request to:
  - URL: `https://api.apify.com/v2/acts/thodor~apify-email-scraper-tool/run-sync-get-dataset-items?token=<APIFY_TOKEN>`
  - Method: `POST`
  - Body: `{ "start_urls": [{ "url": "{{Domain}}" }] }`
- Map the response into Clay columns: `email`, `valid_email_domain`, `occurrences`, `urls`.
- (Optional) Add a Clay filter for `valid_email_domain = true` to push only deliverable-domain emails into your downstream sequence.
The run-sync-get-dataset-items endpoint returns the dataset inline once the run finishes, so each Clay row gets its emails in a single request. For batched workflows (one company per row, many emails returned), use Clay's "Multiple Rows from Array" expander on the response.
API, n8n and Make usage
Apify API (the apify email scraper way)
Run a job:
curl -X POST "https://api.apify.com/v2/acts/thodor~apify-email-scraper-tool/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "start_urls": [{ "url": "https://example.com" }] }'
Fetch the results:
curl "https://api.apify.com/v2/acts/thodor~apify-email-scraper-tool/runs/last/dataset/items?token=$APIFY_TOKEN&format=json"
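If you would rather call the API from Python, the same request can be sketched with the standard library alone. The endpoint path and the `start_urls` input shape are the ones used elsewhere on this page; the `build_run_request` and `fetch_emails` helper names and the 300-second timeout are assumptions, and this version uses the synchronous `run-sync-get-dataset-items` endpoint so the results come back in one call.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "thodor~apify-email-scraper-tool"

def build_run_request(token, urls):
    """Build (but do not send) a POST request that runs the Actor synchronously."""
    query = urllib.parse.urlencode({"token": token})
    endpoint = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps({"start_urls": [{"url": url} for url in urls]}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fetch_emails(token, urls, timeout=300):
    """Send the request and return the dataset items as a list of dicts."""
    request = build_run_request(token, urls)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)

# Usage (needs a real token and network access):
# rows = fetch_emails(os.environ["APIFY_TOKEN"], ["https://example.com"])
```

Long crawls may outlast a synchronous HTTP request, in which case the two-step pattern above (start a run, then fetch the dataset) is the safer choice.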
n8n (the n8n email scraper node)
With n8n's official Apify node:
- Add an Apify node, select Run an Actor.
- Choose `apify-email-scraper-tool` from your Actors.
- Pass `start_urls` from a previous node (e.g. an HTTP node returning a list of companies).
- Use the Get Dataset Items node afterwards to fan out emails one-per-line into the rest of your workflow.
Make.com
Use the Apify > Run an Actor module, then Apify > Get Dataset Items. Same pattern as n8n.
Tips for getting the most out of this email extraction tool
- If you already know where contact details live, list multiple URLs for the same domain (`example.com`, `example.com/team`, `example.com/legal`). They will be grouped, visited first, and used to seed discovery of the rest of the site.
- Use the `occurrences` field to rank. An email that appears on 5+ pages is almost always a real, intended contact, while a single-occurrence email might be a one-off in a press release.
- The Actor stays within the same site, so `apify.com` will follow into `docs.apify.com`, `status.apify.com`, etc., but won't wander to unrelated sites linked from the navigation.
- Increase Max pages per domain when you need a deeper scan: large team rosters, multi-region office pages, or sites where contact info is buried several clicks deep. The default `20` works well for most small-to-mid sites; bump to `40`–`100` for thorough enterprise crawls.
- For per-page progress signals or pay-per-page billing, configure the `page_checked` event in the Actor's pricing settings on the Apify Console. The Actor emits one event only for pages it successfully loaded (HTTP 200 with HTML). Network errors, 4xx/5xx responses, and non-HTML content are free.
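The grouping and same-site behaviour described in these tips can be approximated as below. This is a simplified sketch, not the Actor's implementation: `site_key`, `same_site`, and `group_by_site` are hypothetical helpers, and taking the last two hostname labels is a naive stand-in for a proper registrable-domain lookup (it would mis-group suffixes such as `co.uk`).

```python
from collections import defaultdict
from urllib.parse import urlparse

def hostname(url: str) -> str:
    """Lowercased hostname; bare inputs like 'apify.com/contact' get a scheme first."""
    if "://" not in url:
        url = "https://" + url
    return (urlparse(url).hostname or "").lower()

def site_key(url: str) -> str:
    """Naive site identifier: the last two labels of the hostname."""
    return ".".join(hostname(url).split(".")[-2:])

def same_site(url_a: str, url_b: str) -> bool:
    """True when both URLs belong to the same site, subdomains included."""
    return site_key(url_a) == site_key(url_b)

def group_by_site(urls):
    """Group start URLs by site, preserving the order in which they were given."""
    groups = defaultdict(list)
    for url in urls:
        groups[site_key(url)].append(url)
    return dict(groups)

print(same_site("apify.com", "https://docs.apify.com/legal"))
print(group_by_site(["example.com", "example.com/team", "apify.com/contact"]))
```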
Works on more than just classic contact pages
The Actor wasn't purpose-built for any single platform, but the extraction layer is forgiving enough that it pulls contact info out of plenty of "harder" targets too:
- Social profiles with bio emails (TikTok creator pages, Instagram-style profiles): the bio email is usually rendered server-side in the page's SSR JSON or meta tags, and the Actor reads it the same way it reads a normal contact page.
- Cloudflare-protected email links (the `[email protected]` placeholder you see everywhere): decoded transparently.
- Pages that obfuscate emails on purpose (HTML entities, JSON Unicode escapes like `info\u0040example.com`, "info [at] domain [dot] com"): all decoded and recovered.
- Schema.org / JSON-LD blocks with `"email":"..."` in the structured-data script tags.
- WordPress, Webflow, Wix, Squarespace, custom CMSes: no special handling, just works.
If you have a specific target where the Actor isn't finding emails it should, or you need it to behave differently (a different page-ranking strategy, support for an obfuscation scheme it doesn't know about, integration with a tool not listed above), open an issue on this Actor's Issues tab on the Apify Console. Same goes for any other feature you'd find useful. We add support case-by-case.
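For readers curious how the obfuscation schemes listed above can be undone, here is a minimal sketch. It is not the Actor's code: the helper names and regular expressions are assumptions. The Cloudflare part relies on the commonly observed encoding in which the hex string's first byte is an XOR key for the remaining bytes.

```python
import html
import re

def decode_cfemail(hex_string: str) -> str:
    """Decode a Cloudflare-style hex string: first byte is the XOR key."""
    data = bytes.fromhex(hex_string)
    key = data[0]
    return bytes(byte ^ key for byte in data[1:]).decode("utf-8")

# Hypothetical patterns for "[at]" / "(at)" and "[dot]" / "(dot)" in plain text.
AT_RE = re.compile(r"\s*[\[(]\s*at\s*[\])]\s*", re.IGNORECASE)
DOT_RE = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.IGNORECASE)
UNICODE_ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text: str):
    """Normalise common obfuscations, then return the unique lowercased emails."""
    text = html.unescape(text)  # HTML entities such as &#64; become "@"
    text = UNICODE_ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)
    text = AT_RE.sub("@", text)
    text = DOT_RE.sub(".", text)
    return sorted({match.lower() for match in EMAIL_RE.findall(text)})

print(extract_emails("Write to info [at] example [dot] com or sales&#64;example.com"))
```

Production code would need stricter patterns to avoid false positives, for example prose that happens to contain "(at)".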
How emails are validated (plain English)
Validation runs automatically on every result, in two layers:
- Junk filtering. Obvious-junk addresses are dropped before they reach the dataset: `noreply@…`, `webmaster@…`, `example@…`, placeholders like `john.doe@`, broken syntax, and addresses that are really filenames (`logo@…png`).
- Domain mail-server check (`valid_email_domain`). The Actor asks the email's domain whether it accepts mail at all. Every legitimate domain publishes this as a small, public DNS record. If the domain has a working mail server, `valid_email_domain` is `true`. If it has none, mail would bounce, so `valid_email_domain` is `false`.
What valid_email_domain: true does and does not mean:
- ✅ The email is syntactically correct.
- ✅ The domain is set up to receive mail.
- ❌ It does not prove that the specific mailbox (`hello@…`, `jane@…`) exists or that someone reads it. Verifying that requires probing the mail server directly, which most providers block as abuse and which is against Apify's acceptable-use rules.
This is the same level of confidence consumer email-validation tools give without sending an actual test message.
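The two validation layers can be sketched as follows. This is an illustration under stated assumptions, not the Actor's implementation: the junk lists, the syntax pattern, and the `is_junk` and `validate` helpers are hypothetical, and the DNS lookup is passed in as a `has_mail_server` callable because Python's standard library has no mail-record resolver. The example uses a stub in its place.

```python
import re

# Hypothetical rules approximating the junk filter described above.
SYNTAX_RE = re.compile(r"^[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+$")
JUNK_LOCAL_PARTS = {"noreply", "webmaster", "example", "john.doe"}
FILE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")

def is_junk(email: str) -> bool:
    """Layer 1: drop broken syntax, placeholder names, and filename look-alikes."""
    email = email.lower()
    if not SYNTAX_RE.match(email):
        return True
    local_part = email.split("@", 1)[0]
    return local_part in JUNK_LOCAL_PARTS or email.endswith(FILE_EXTENSIONS)

def validate(emails, has_mail_server):
    """Layer 2: attach valid_email_domain using the supplied domain checker."""
    rows = []
    for email in emails:
        if is_junk(email):
            continue
        email = email.lower()
        domain = email.split("@", 1)[1]
        rows.append({
            "email": email,
            "domain": domain,
            "valid_email_domain": bool(has_mail_server(domain)),
        })
    return rows

# Stub checker for illustration; a real one would query the domain's DNS records.
def stub_checker(domain):
    return domain == "apify.com"

candidates = ["Hello@apify.com", "noreply@apify.com", "logo@2x.png",
              "jane@nomail.invalid", "broken@@x"]
for row in validate(candidates, stub_checker):
    print(row)
```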
FAQ
Is this a free email address extractor? The code is open and runs on Apify's pay-as-you-go platform, so there's no monthly subscription. You pay only for the platform usage of each run (typically cents for a few dozen domains). Apify's free tier covers casual usage.
Can I use this to get emails from website lists in bulk? Yes, that's the primary use case. Pass an array of start_urls (up to hundreds at a time) and the Actor processes each one independently. Combine with Apify Schedules to run nightly against a growing list.
Is this a good B2B email scraper for lead generation? Yes. Most B2B websites publish at least one role-based contact email (info@, sales@, support@) plus often individual team emails on /about or /team pages. Use the occurrences field to favour widely-publicised addresses, and the urls field to see whether the email came from a generic page or a relevant section.
How is this different from a traditional email harvester? Old-school email harvesters crawl indiscriminately and dump every string that looks like an address. This Actor is targeted: it ranks pages by likelihood of containing real contact info, drops obvious junk (noreply@, placeholder names, image-filename matches), deduplicates per address, counts occurrences, verifies each domain's mail server, and tells you exactly which URL each email came from. The output is a lead-ready list, not a raw dump.
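As a rough illustration of the deduplication and occurrence counting mentioned in this answer, the sketch below folds raw (email, page URL) matches into one row per address. The `aggregate` helper is hypothetical, and counting every match (rather than every page) as one occurrence is an assumption about how `occurrences` is computed.

```python
def aggregate(findings):
    """Fold (email, page_url) matches into one row per unique lowercased email."""
    rows = {}
    for email, url in findings:
        key = email.lower()
        row = rows.setdefault(key, {
            "email": key,
            "domain": key.split("@", 1)[1],
            "occurrences": 0,
            "urls": [],
        })
        row["occurrences"] += 1          # assumption: every match counts once
        if url not in row["urls"]:       # keep each source page only once
            row["urls"].append(url)
    return list(rows.values())

findings = [
    ("hello@apify.com", "https://apify.com/contact"),
    ("Hello@apify.com", "https://apify.com/contact"),
    ("hello@apify.com", "https://docs.apify.com/legal"),
    ("jane@apify.com", "https://apify.com/about"),
]
for row in aggregate(findings):
    print(row["email"], row["occurrences"], row["urls"])
```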
Can I use it as a one-off email grabber for a single site? Yes. Paste one URL into Start URLs, hit Start, and the dataset will have everything the Actor could find for that domain, usually in under a minute.
Why are some obvious emails missed? Some sites only show contact info after a click ("Show email" button) or after a delay. This Actor reads the page directly, so anything that only appears after user interaction won't be picked up. For those sites, a browser-based scraper is required.
Why didn't you include mailbox-level verification? Probing individual mailboxes is fragile (Gmail and Outlook routinely block it), slow, and sits in a policy grey area. Many providers treat it as abuse. The domain-level check this Actor performs gives you the most useful part of that signal at a fraction of the risk: "this domain can receive mail at all." If you need stronger verification, run the output through a dedicated verification service (Hunter, NeverBounce, ZeroBounce).
Is scraping public emails legal? Scraping publicly published contact information is generally permitted, but you remain responsible for following each target site's Terms of Service, robots.txt directives, and applicable privacy law (GDPR, CCPA). Do not use the output for spam. Apify's Terms forbid it.
Where do I report a bug or request a feature? Use the Issues tab on this Actor's Apify page.