Website Email Extractor

Pricing

$5.00 / 1,000 enriched records

Crawls websites and extracts contact emails from contact, about, team, and agent pages.


Developer

Mukesh Kumar

Maintained by Community


Crawls websites and extracts contact emails from their contact, about, team, agents, and staff pages. Built for the middle stage of a lead-generation pipeline:

[ source scraper (Maps / Realtor / Zillow) ] → [ THIS ACTOR ] → [ email verifier ]

Handles common obfuscation patterns ([at], (dot), HTML entities, Cloudflare data-cfemail), pulls emails from mailto: links and JSON-LD structured data, and tags every email with a confidence score so downstream verification can prioritize.
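The deobfuscation patterns above can be sketched roughly like this (illustrative Python, not the actor's actual source; the Cloudflare decoding follows the publicly documented data-cfemail scheme, where the first hex byte is an XOR key):

```python
import html
import re

def deobfuscate(text: str) -> str:
    """Normalize common email obfuscation patterns to plain addresses."""
    text = html.unescape(text)  # HTML entities: &#64; -> @, &#46; -> .
    text = re.sub(r"\s*[\[\(]\s*at\s*[\]\)]\s*", "@", text, flags=re.I)   # [at], (at)
    text = re.sub(r"\s*[\[\(]\s*dot\s*[\]\)]\s*", ".", text, flags=re.I)  # [dot], (dot)
    return text

def decode_cfemail(hex_str: str) -> str:
    """Decode a Cloudflare data-cfemail attribute: byte 0 is the XOR key."""
    key = int(hex_str[:2], 16)
    return "".join(
        chr(int(hex_str[i:i + 2], 16) ^ key) for i in range(2, len(hex_str), 2)
    )
```

For example, `deobfuscate("jane [at] example [dot] com")` yields `jane@example.com`.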

Pricing

Pay per enriched record — $0.03 each.

You're charged $0.03 only when:

  • You're using enrichment mode (you provided inputJson)
  • The actor successfully extracts at least one email for that record

You're not charged for:

  • Records you uploaded that already had emails (skipped — no work done)
  • Records where the crawl found zero emails (no value delivered)
  • Records with missing or invalid URLs
  • Plain crawl mode runs (startUrls without inputJson) — free
  • Failed runs

So if you upload 100 leads and the actor finds emails for 73 of them, you pay $2.19. Compute, proxy, and storage are all included — no separate infrastructure bill.
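The billing rule reduces to simple arithmetic over the records the crawl actually enriched (a hypothetical helper, not part of the actor):

```python
def run_charge(newly_enriched: int, price_per_record: float = 0.03) -> float:
    """Charge $0.03 per record the crawl enriched; all other records are free."""
    return round(newly_enriched * price_per_record, 2)
```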

When to use this

  • You have a list of company / agent / brokerage homepages and need their public contact emails
  • You want each email tagged with a confidence signal (mailto > jsonld > cfemail > text) so downstream verification can prioritize
  • You want platform-domain noise (example.com, wixpress.com, mlsgrid.com) and role accounts (noreply@) auto-filtered

When not to use this

  • For URL discovery (use a Maps scraper or directory scraper first; this actor only extracts from URLs you already have)
  • For email verification / SMTP validation (use an email verifier downstream)
  • For LinkedIn / Facebook / Instagram (against their ToS)

Two ways to run it

A. Enrichment mode — paste any JSON, get the same JSON back with emails filled in

If you already have a list of leads (an array of objects from a Maps scraper, Realtor export, or your own CRM dump), paste the whole thing into inputJson. The actor:

  1. Auto-detects the URL field (websiteUrl, website, url, homepage, site) and the email field (emails, email, contactEmail)
  2. Skips records that already have emails
  3. Crawls the URL for each remaining record
  4. Writes extracted emails back into the same record
  5. Saves the enriched JSON to the Key-Value Store as OUTPUT — downloadable as a single file

If your records have a name field (firstName, lastName, fullName, name), emails matching the name are ranked first (e.g. for "Roman Lopez", roman@firm.com is ranked above info@firm.com).

The output is your input JSON, byte-for-byte identical except the email field is populated. No extra metadata, no per-email dataset records — just your records, enriched.

Example input

```json
{
  "inputJson": "[{\"fullName\":\"Roman Lopez\",\"websiteUrl\":\"https://romanlopez.com\",\"emails\":[]}]",
  "useBrowser": true
}
```

Example output (Key-Value Store → OUTPUT)

```json
[
  {
    "fullName": "Roman Lopez",
    "websiteUrl": "https://romanlopez.com",
    "emails": ["roman@romanlopez.com"]
  }
]
```

B. Plain crawl mode — give it URLs, get a flat list of emails

For one-off email extraction. Output is one Dataset record per (domain, email) pair, with full extraction metadata. This mode is free (no per-record charge applies).

Example input

```json
{
  "startUrls": [
    { "url": "https://romanlopez.com" },
    { "url": "https://www.heydayhomes.com" }
  ],
  "maxDepth": 2
}
```

Example output (Dataset record)

```json
{
  "domain": "romanlopez.com",
  "email": "roman@romanlopez.com",
  "confidence": "mailto",
  "filtered": false,
  "filterReason": null,
  "sourceUrl": "https://romanlopez.com/contact",
  "depth": 1,
  "foundAt": "2026-05-05T12:34:56.789Z"
}
```

Input fields

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| inputJson | string (JSON) | | Paste a JSON array or single object. Triggers enrichment mode. |
| urlField | string | auto-detect | Override which key holds the URL in your records |
| emailField | string | auto-detect | Override which key receives the extracted emails |
| startUrls | array | | URLs to crawl (used only when inputJson is empty) |
| maxDepth | int | 2 | Depth 0 = start URL only. Depth 1+ pages are filtered by targetPaths |
| maxPagesPerDomain | int | 30 | Hard cap per hostname |
| maxRequestsPerCrawl | int | 5000 | Global safety cap |
| targetPaths | string[] | ["contact","about","team","agents","staff","meet"] | URL substrings that gate the deep crawl |
| useBrowser | bool | true | Playwright (handles JS-rendered sites) or Cheerio (3–20× faster, server-rendered only) |
| includeFiltered | bool | false | Emit role accounts / platform domains with a filterReason instead of dropping them |
| maxEmailsPerRecord | int | 5 | Cap on emails written to each record in enrichment mode |
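The interplay of maxDepth and targetPaths can be sketched as a simple enqueue gate (illustrative, using the default values from the table above):

```python
from urllib.parse import urlparse

TARGET_PATHS = ["contact", "about", "team", "agents", "staff", "meet"]

def should_enqueue(url: str, depth: int, max_depth: int = 2) -> bool:
    """Depth 0 is always crawled; deeper pages must match a target path substring."""
    if depth > max_depth:
        return False
    if depth == 0:
        return True
    path = urlparse(url).path.lower()
    return any(t in path for t in TARGET_PATHS)
```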

Output reference

Enrichment mode → Key-Value Store key OUTPUT

The same JSON you pasted in inputJson, byte-identical except the detected email field has been populated. Same shape (array stays array, single object stays single object), same field order, same key names. One file, one click to download.

Plain crawl mode → Dataset

One record per unique (domain, email) pair:

| Field | Values |
| --- | --- |
| domain | Hostname of the source page (with www. stripped) |
| email | The extracted email, lowercased |
| confidence | mailto (explicit mailto: link, highest signal) · jsonld (extracted from application/ld+json Person/Org markup) · cfemail (decoded from Cloudflare data-cfemail obfuscation) · text (regex match on visible text, lowest signal) |
| filtered | true if flagged as noise — only present when includeFiltered: true |
| filterReason | role_account · platform_domain · url_false_positive · file_extension_tld · invalid_format (null when filtered: false) |
| sourceUrl | Page URL where the email was found |
| depth | 0 for the start URL, 1+ for followed links |
| foundAt | ISO-8601 timestamp |
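The confidence tags order naturally for downstream verification. A sketch of how a consumer might prioritize dataset records before spending verifier budget (field names come from the reference above; the helper is mine):

```python
CONFIDENCE_RANK = {"mailto": 0, "jsonld": 1, "cfemail": 2, "text": 3}

def by_priority(records: list[dict]) -> list[dict]:
    """Highest-signal emails first; unknown tags sink to the bottom."""
    return sorted(records, key=lambda r: CONFIDENCE_RANK.get(r["confidence"], 99))
```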

Compliance

  • Output is intended for users with a lawful basis to contact the extracted addresses (B2B outreach, opt-in lists, etc.)
  • The actor honors robots.txt (respectRobotsTxtFile: true) — sites that disallow scraping are skipped, not crawled
  • Emails are not persisted server-side beyond the run lifetime — the OUTPUT and Dataset are scoped to your run
  • Do not point this at sites whose ToS forbids automated access, or at LinkedIn / Facebook / Instagram (separate ToS, separate concern)

Limitations

  • Sites behind Cloudflare bot-detection (actual JS challenges, not just IP filtering) may return 403 even via residential proxy. Bot-fingerprint bypassing is out of scope for this actor.
  • SPA contact forms with no static email content will return zero emails (the email is loaded dynamically after user interaction). You're not charged for these.
  • Pages disallowed by robots.txt are skipped by design.