Website Email Extractor
Pricing
$5.00 / 1,000 enriched records
Crawls websites and extracts contact emails from contact, about, team, and agent pages.
Developer
Mukesh Kumar
Crawls websites and extracts contact emails from their contact, about, team, agents, and staff pages. Built for the middle stage of a lead-generation pipeline:
[ source scraper (Maps / Realtor / Zillow) ] → [ THIS ACTOR ] → [ email verifier ]
Handles common obfuscation patterns ([at], (dot), HTML entities, Cloudflare data-cfemail), pulls emails from mailto: links and JSON-LD structured data, and tags every email with a confidence score so downstream verification can prioritize.
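To make the obfuscation handling concrete, here is a minimal Python sketch of two of the patterns named above. It is not the actor's code: the regular expressions and the function names (`deobfuscate_text`, `decode_cfemail`, `encode_cfemail`) are assumptions for illustration. The Cloudflare part relies on the widely known `data-cfemail` scheme, where the first byte of the hex string is an XOR key for the remaining bytes.

```python
import html
import re

# Hypothetical patterns for "[at]" / "(dot)" obfuscation. The actor's real
# rules are not documented, so treat these as illustrative only.
AT_PATTERN = re.compile(r"\s*[\[\(]\s*at\s*[\]\)]\s*", re.IGNORECASE)
DOT_PATTERN = re.compile(r"\s*[\[\(]\s*dot\s*[\]\)]\s*", re.IGNORECASE)


def deobfuscate_text(text: str) -> str:
    """Decode HTML entities, then rewrite [at]/(dot) tokens as @ and ."""
    text = html.unescape(text)
    text = AT_PATTERN.sub("@", text)
    return DOT_PATTERN.sub(".", text)


def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare data-cfemail hex string.

    The first byte is an XOR key; each following byte is one byte of the
    address XORed with that key.
    """
    data = bytes.fromhex(encoded)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


def encode_cfemail(email: str, key: int = 0x42) -> str:
    """Inverse of decode_cfemail, included only to build sample values."""
    return bytes([key] + [b ^ key for b in email.encode("utf-8")]).hex()


if __name__ == "__main__":
    print(deobfuscate_text("roman [at] firm (dot) com"))
    print(decode_cfemail(encode_cfemail("roman@firm.com")))
```

Both calls in the `__main__` block recover `roman@firm.com`. A production extractor would need stricter patterns, since `(at)` also occurs in ordinary prose.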
Pricing
Pay per enriched record — $0.03 each.
You're charged $0.03 only when:
- You're using enrichment mode (you provided `inputJson`)
- The actor successfully extracts at least one email for that record
You're not charged for:
- Records you uploaded that already had emails (skipped — no work done)
- Records where the crawl found zero emails (no value delivered)
- Records with missing or invalid URLs
- Plain crawl mode runs (`startUrls` without `inputJson`), which are free
- Failed runs
So if you upload 100 leads and the actor finds emails for 73 of them, you pay $2.19. Compute, proxy, and storage are all included — no separate infrastructure bill.
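The billing rule can be checked with a short Python sketch. It assumes you keep both the JSON you uploaded and the enriched `OUTPUT`, in the same order; `billable_records` and `charge_usd` are hypothetical helper names, not part of the actor. The arithmetic uses integer cents so the result is not affected by floating-point rounding.

```python
PRICE_CENTS = 3  # $0.03 per enriched record, as stated in this section


def billable_records(before: list[dict], after: list[dict],
                     email_field: str = "emails") -> int:
    """Count records that had no email before the run and have one after."""
    count = 0
    for old, new in zip(before, after):
        if not old.get(email_field) and new.get(email_field):
            count += 1
    return count


def charge_usd(billable: int) -> str:
    """Format the charge from integer cents."""
    cents = billable * PRICE_CENTS
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    uploaded = [{"emails": []} for _ in range(100)]
    # Pretend the crawl found an email for the first 73 records.
    enriched = [{"emails": ["a@firm.com"] if i < 73 else []} for i in range(100)]
    print(charge_usd(billable_records(uploaded, enriched)))  # prints $2.19
```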
When to use this
- You have a list of company / agent / brokerage homepages and need their public contact emails
- You want each email tagged with a confidence signal (`mailto` > `jsonld` > `cfemail` > `text`) so downstream verification can prioritize
- You want platform-domain noise (`example.com`, `wixpress.com`, `mlsgrid.com`) and role accounts (`noreply@`) auto-filtered
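As an illustration of how a downstream step might use the confidence order and the noise filter, here is a small Python sketch. The domain and role-account sets contain only the examples named above; the actor's full lists are not published, and `filter_reason` and `prioritize` are hypothetical names.

```python
CONFIDENCE_ORDER = ["mailto", "jsonld", "cfemail", "text"]  # strongest first

# Illustrative subsets only, taken from the examples in this README.
PLATFORM_DOMAINS = {"example.com", "wixpress.com", "mlsgrid.com"}
ROLE_LOCAL_PARTS = {"noreply"}


def filter_reason(email: str):
    """Return a filterReason-style label, or None if the email is kept."""
    local, _, domain = email.lower().partition("@")
    if domain in PLATFORM_DOMAINS:
        return "platform_domain"
    if local in ROLE_LOCAL_PARTS:
        return "role_account"
    return None


def prioritize(records: list[dict]) -> list[dict]:
    """Drop noise, then order the rest from strongest to weakest signal."""
    kept = [r for r in records if filter_reason(r["email"]) is None]
    return sorted(kept, key=lambda r: CONFIDENCE_ORDER.index(r["confidence"]))
```

Given one `text` hit for `info@firm.com`, one `mailto` hit for `roman@firm.com`, and a `noreply@firm.com` hit, `prioritize` would return the `roman@` record first, then `info@`, and drop the role account.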
When not to use this
- For URL discovery (use a Maps scraper or directory scraper first; this actor only extracts from URLs you already have)
- For email verification / SMTP validation (use an email verifier downstream)
- For LinkedIn / Facebook / Instagram (against their ToS)
Two ways to run it
A. Enrichment mode — paste any JSON, get the same JSON back with emails filled in
If you already have a list of leads (an array of objects from a Maps scraper, Realtor export, or your own CRM dump), paste the whole thing into inputJson. The actor:
- Auto-detects the URL field (`websiteUrl`, `website`, `url`, `homepage`, `site`) and the email field (`emails`, `email`, `contactEmail`)
- Skips records that already have emails
- Crawls the URL for each remaining record
- Writes extracted emails back into the same record
- Saves the enriched JSON to the Key-Value Store as `OUTPUT`, downloadable as a single file
If your records have a name field (firstName, lastName, fullName, name), emails matching the name are ranked first (e.g. for "Roman Lopez", roman@firm.com is ranked above info@firm.com).
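One way such name-based ranking could work is sketched below in Python. The matching rule (a name part appearing as a substring of the email's local part) is an assumption; the actor's actual matching logic is not documented here, and `name_tokens` and `rank_by_name` are hypothetical names.

```python
NAME_KEYS = ("firstName", "lastName", "fullName", "name")


def name_tokens(record: dict) -> set[str]:
    """Collect lowercase name parts from the name fields listed above."""
    tokens = set()
    for key in NAME_KEYS:
        value = record.get(key)
        if isinstance(value, str):
            # Skip single-letter parts such as middle initials.
            tokens.update(p.lower() for p in value.split() if len(p) > 1)
    return tokens


def rank_by_name(record: dict, emails: list[str]) -> list[str]:
    """Move emails whose local part contains a name token to the front."""
    tokens = name_tokens(record)

    def matches(email: str) -> bool:
        local = email.split("@", 1)[0].lower()
        return any(token in local for token in tokens)

    # sorted() is stable, so the original order is kept within each group.
    return sorted(emails, key=lambda e: not matches(e))
```

For `{"fullName": "Roman Lopez"}` and the list `["info@firm.com", "roman@firm.com"]`, this returns `roman@firm.com` first.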
The output is your input JSON, byte-for-byte identical except the email field is populated. No extra metadata, no per-email dataset records — just your records, enriched.
Example input
{"inputJson": "[{\"fullName\":\"Roman Lopez\",\"websiteUrl\":\"https://romanlopez.com\",\"emails\":[]}]","useBrowser": true}
Example output (Key-Value Store → OUTPUT)
[{"fullName": "Roman Lopez","websiteUrl": "https://romanlopez.com","emails": ["roman@romanlopez.com"]}]
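The enrichment steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actor's implementation: `crawl` stands in for the real crawler and is passed in as a plain function, records lacking a detectable URL or email key are left untouched, and a non-list email field receives only the first email found.

```python
import json

URL_KEYS = ["websiteUrl", "website", "url", "homepage", "site"]
EMAIL_KEYS = ["emails", "email", "contactEmail"]


def detect_key(record: dict, candidates: list[str]):
    """Return the first candidate key present in the record, if any."""
    return next((key for key in candidates if key in record), None)


def enrich(input_json: str, crawl, max_emails: int = 5):
    """Fill empty email fields in place and return the same JSON shape."""
    parsed = json.loads(input_json)
    records = parsed if isinstance(parsed, list) else [parsed]
    for record in records:
        url_key = detect_key(record, URL_KEYS)
        email_key = detect_key(record, EMAIL_KEYS)
        if url_key is None or email_key is None:
            continue  # assumption: nothing to do without both fields
        if record[email_key]:
            continue  # record already has emails, so it is skipped
        found = crawl(record[url_key])[:max_emails]
        if not found:
            continue
        # Assumption: a list field gets all emails, a string field the first.
        if isinstance(record[email_key], list):
            record[email_key] = found
        else:
            record[email_key] = found[0]
    return parsed
```

Calling `enrich` on the `inputJson` string from the example input, with a `crawl` stub that returns `["roman@romanlopez.com"]`, yields the structure shown in the example output. Key order is preserved because the records are updated in place.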
B. Plain crawl mode — give it URLs, get a flat list of emails
For one-off email extraction. Output is one Dataset record per (domain, email) pair, with full extraction metadata. This mode is free (no per-record charge applies).
Example input
{"startUrls": [{ "url": "https://romanlopez.com" },{ "url": "https://www.heydayhomes.com" }],"maxDepth": 2}
Example output (Dataset record)
{"domain": "romanlopez.com","email": "roman@romanlopez.com","confidence": "mailto","filtered": false,"filterReason": null,"sourceUrl": "https://romanlopez.com/contact","depth": 1,"foundAt": "2026-05-05T12:34:56.789Z"}
Input fields
| Field | Type | Default | Notes |
|---|---|---|---|
| `inputJson` | string (JSON) | none | Paste a JSON array or single object. Triggers enrichment mode. |
| `urlField` | string | auto-detect | Override which key holds the URL in your records |
| `emailField` | string | auto-detect | Override which key receives the extracted emails |
| `startUrls` | array | none | URLs to crawl (used only when `inputJson` is empty) |
| `maxDepth` | int | 2 | Depth 0 = start URL only. Pages at depth 1+ are filtered by `targetPaths` |
| `maxPagesPerDomain` | int | 30 | Hard cap per hostname |
| `maxRequestsPerCrawl` | int | 5000 | Global safety cap |
| `targetPaths` | string[] | `["contact","about","team","agents","staff","meet"]` | URL substrings that gate deep crawl |
| `useBrowser` | bool | true | `true` uses Playwright (handles JS-rendered sites); `false` uses Cheerio (3–20× faster, server-rendered only) |
| `includeFiltered` | bool | false | Emit role accounts / platform domains with a `filterReason` instead of dropping them |
| `maxEmailsPerRecord` | int | 5 | Cap on emails written to each record in enrichment mode |
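To show how `maxDepth` and `targetPaths` interact, here is a hedged Python sketch of a link gate. It assumes the substrings are matched against the URL path, case-insensitively; the table only says "URL substrings", so the actor may match against the full URL instead. `should_follow` is a hypothetical name.

```python
from urllib.parse import urlparse

DEFAULT_TARGET_PATHS = ("contact", "about", "team", "agents", "staff", "meet")


def should_follow(link: str, depth: int, max_depth: int = 2,
                  target_paths=DEFAULT_TARGET_PATHS) -> bool:
    """Decide whether a link would be crawled at the given depth.

    depth is the depth the page would have if crawled (start URL = 0).
    """
    if depth == 0:
        return True  # the start URL is always crawled
    if depth > max_depth:
        return False
    path = urlparse(link).path.lower()
    return any(fragment in path for fragment in target_paths)
```

With the defaults, `https://romanlopez.com/contact` at depth 1 passes the gate, `https://romanlopez.com/blog/post` at depth 1 does not, and any page at depth 3 is rejected. Plain substring matching also lets a path like `/meetings` through because it contains `meet`.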
Output reference
Enrichment mode → Key-Value Store key OUTPUT
The same JSON you pasted in inputJson, byte-identical except the detected email field has been populated. Same shape (array stays array, single object stays single object), same field order, same key names. One file, one click to download.
Plain crawl mode → Dataset
One record per unique (domain, email) pair:
| Field | Values |
|---|---|
| `domain` | Hostname of the source page (with `www.` stripped) |
| `email` | The extracted email, lowercased |
| `confidence` | `mailto` (explicit `mailto:` link, highest signal) · `jsonld` (extracted from `application/ld+json` Person/Org markup) · `cfemail` (decoded from Cloudflare `data-cfemail` obfuscation) · `text` (regex match on visible text, lowest signal) |
| `filtered` | `true` if the email was flagged as noise; flagged records appear only when `includeFiltered: true` |
| `filterReason` | `role_account` · `platform_domain` · `url_false_positive` · `file_extension_tld` · `invalid_format` (`null` when `filtered: false`) |
| `sourceUrl` | Page URL where the email was found |
| `depth` | 0 for the start URL, 1+ for followed links |
| `foundAt` | ISO-8601 timestamp |
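The "one record per unique (domain, email) pair" rule can be pictured with the Python sketch below. It assumes the first hit seen for a pair is the one kept; the README does not say which hit wins when the same email appears on several pages. `normalize_domain` and `dedupe_hits` are hypothetical names.

```python
from urllib.parse import urlparse


def normalize_domain(source_url: str) -> str:
    """Lowercase the hostname and strip a leading 'www.'."""
    host = (urlparse(source_url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def dedupe_hits(hits: list[dict]) -> list[dict]:
    """Collapse raw hits into one record per (domain, email) pair."""
    seen: dict[tuple[str, str], dict] = {}
    for hit in hits:
        key = (normalize_domain(hit["sourceUrl"]), hit["email"].lower())
        # Assumption: keep the first hit for each pair.
        if key not in seen:
            seen[key] = {**hit, "domain": key[0], "email": key[1]}
    return list(seen.values())
```

Two hits for `Roman@RomanLopez.com` on `https://www.romanlopez.com/contact` and `roman@romanlopez.com` on `https://romanlopez.com/about` collapse into a single record with `domain` set to `romanlopez.com`.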
Compliance
- Output is intended for users with a lawful basis to contact the extracted addresses (B2B outreach, opt-in lists, etc.)
- The actor honors `robots.txt` (`respectRobotsTxtFile: true`): sites that disallow scraping are skipped, not crawled
- Emails are not persisted server-side beyond the run lifetime; the OUTPUT and Dataset are scoped to your run
- Do not point this at sites whose ToS forbids automated access, or at LinkedIn / Facebook / Instagram (separate ToS, separate concern)
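If you want to pre-check a URL against a site's `robots.txt` before adding it to a run, Python's standard library can do this offline. The sketch below is independent of the actor, which applies its own check via `respectRobotsTxtFile`; `allowed_by_robots` is a hypothetical helper, and you would need to fetch the `robots.txt` body yourself.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(rules, "https://example.com/contact"))
    print(allowed_by_robots(rules, "https://example.com/private/team"))
```

With these rules the first URL is allowed and the second is not.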
Limitations
- Sites behind Cloudflare bot-detection (actual JS challenges, not just IP filtering) may return 403 even via residential proxy. Bot-fingerprint bypassing is out of scope for this actor.
- SPA contact forms with no static email content will return zero emails (the email is loaded dynamically after user interaction). You're not charged for these.
- Pages disallowed by `robots.txt` are skipped by design.