Website Mail Extractor

Pricing

from $3.00 / 1,000 email enrichments

Website Email Scraper is a powerful, lightweight, and stealthy web crawler designed to find and extract public email addresses from any website. Simply provide a list of starting URLs, and the scraper will follow internal links, prioritize key pages, and return a clean list of deduplicated emails.

Developer

mikolabs

Maintained by Community

Website Email Scraper

✅ Key Features

  • SPA-Ready (React, Vue, Angular): Handles modern Single Page Applications (SPAs) by scanning the site's own JavaScript assets for hidden contact emails.
  • Sitemap XML Support: Automatically discovers and parses the website's sitemap.xml for lightning-fast page indexing.
  • Stealth Mode & Anti-Blocking: Built-in concurrency control, randomized user-agents, and Apify Proxy support reduce the chance of your crawl being detected or blocked.
  • Smart Priority Routing: Targets high-value contact pages (like /contact, /about, /team) first to save crawl time.
  • Blog & Media Filters: Automatically skips irrelevant blog directories and large media assets to maximize efficiency and control costs.
  • Deduplication: Automatically deduplicates email addresses across the entire run to prevent duplicate charges and messy datasets.
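
To make the sitemap feature above more concrete, here is a minimal sketch of how page URLs can be read from a sitemap.xml document with Python's standard library. The function name `parse_sitemap` and the sample document are illustrative assumptions; the Actor's actual implementation is not published in this README.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_bytes: bytes) -> list[str]:
    """Return every <loc> URL found in a sitemap or sitemap index (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    # Both <urlset> and <sitemapindex> documents store their URLs in <loc> elements.
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


# Illustrative sample, not taken from a real site.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/contact</loc></url>
</urlset>"""

print(parse_sitemap(SAMPLE))
```

A crawler can feed the returned URLs straight into its queue instead of discovering them link by link, which is presumably where the indexing speed-up comes from.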

🏆 Benefits

  • Effortless Lead Generation: Turn a list of company websites into prospect contact details in seconds.
  • High Deliverability: Every email includes the exact source URL where it was found, allowing easy QA and highly personalized outreach.
  • Unbeatable Cost Control: Because it uses direct HTTP crawling instead of heavy browser instances, compute unit consumption is minimal, costing under $0.05 per 1,000 crawled pages.

💳 Pricing

This Actor uses the Pay-per-event pricing model. You are billed only for successful results:

  • $3.00 USD per 1,000 unique emails ($0.003 USD per email).
  • Deduplication is run-wide; you are never charged twice for the same email address in a single run.
  • If a run extracts 0 emails, you pay nothing.
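
As a quick worked example of the pricing rules above, the sketch below estimates the charge for a run from a list of extracted addresses. The helper name `estimate_cost` is hypothetical, and treating addresses as case-insensitive when counting unique emails is an assumption, since the README only says emails are normalized.

```python
from decimal import Decimal

# $3.00 per 1,000 unique emails, i.e. $0.003 per email, as listed above.
PRICE_PER_EMAIL = Decimal("0.003")


def estimate_cost(emails_found: list[str]) -> Decimal:
    """Estimate the run charge in USD: one billable event per unique email."""
    # Assumption: uniqueness is checked after trimming and lowercasing.
    unique = {email.strip().lower() for email in emails_found}
    return PRICE_PER_EMAIL * len(unique)


# Two spellings of the same address count once, so two emails are billed.
print(estimate_cost(["sales@acme.test", "Sales@acme.test", "jobs@acme.test"]))
```

Compute unit and proxy usage are not covered by this estimate.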

🚀 Quick start

  1. Go to the Input tab in the Apify Console.
  2. Enter one or more website URLs in the Seed URLs field.
  3. Configure the optional boundaries (like Max emails to scrape or Max pages per seed).
  4. Select your Proxy configuration (Apify US/Residential proxy is recommended).
  5. Click Start to run the scraper, then download your data in JSON, CSV, XML, Excel, or HTML format once the run completes.

⚙️ How it works

Under the hood, the scraper performs the following steps:

  1. Sitemap Discovery: First, it attempts to fetch the website's XML sitemap to immediately identify all key pages.
  2. Page Crawling: Starting from your seed URLs, it crawls internal pages up to your configured crawl depth.
  3. Keyword Prioritizing: It scores page paths using contact-related keywords so that URLs containing /contact, /about, or /team are visited first.
  4. Email Extraction: It searches for plain-text email addresses, mailto: links, and addresses obfuscated with HTML entities, and it scans the site's own JS bundles to locate emails in dynamic components.
  5. Deduplication & Output: Found emails are normalized, filtered to remove CDN domains, spam domains, and fake placeholders, deduplicated, and appended to the dataset with their corresponding seed and discovery URLs.
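
Steps 3 to 5 above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Actor's code: the regular expression, the binary priority score, and the names `priority`, `extract_emails`, and `crawl_results` are all hypothetical, and the pages are passed in as strings rather than fetched.

```python
import html
import re
from typing import Iterator
from urllib.parse import urlparse

PRIORITY_KEYWORDS = ("contact", "about", "team")  # keywords named in step 3
# Assumed pattern: a deliberately simple email regex, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def priority(url: str) -> int:
    """Lower value means visited earlier; keyword paths come first."""
    path = urlparse(url).path.lower()
    return 0 if any(keyword in path for keyword in PRIORITY_KEYWORDS) else 1


def extract_emails(page_html: str) -> set[str]:
    """Find emails in raw HTML after decoding entities such as &#64; for @."""
    decoded = html.unescape(page_html)
    # mailto: links are covered as well, because the address sits in the href text.
    return {match.lower() for match in EMAIL_RE.findall(decoded)}


def crawl_results(seed_url: str, pages: dict[str, str]) -> Iterator[dict]:
    """Yield one record per email not seen earlier in the run."""
    seen: set[str] = set()
    for url in sorted(pages, key=priority):  # high-priority pages first
        for email in sorted(extract_emails(pages[url])):
            if email not in seen:
                seen.add(email)
                yield {"email": email, "pageUrl": url, "seedUrl": seed_url}


# Illustrative pages: URL -> HTML body.
PAGES = {
    "https://example.com/blog/post": "<p>Write to sales@example.org</p>",
    "https://example.com/contact": '<a href="mailto:sales@example.org">Mail</a> or info&#64;example.org',
}
for record in crawl_results("https://example.com", PAGES):
    print(record)
```

Because /contact is visited before the blog post, both addresses are attributed to the contact page and the repeat on the blog page is dropped.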

📊 Output

All scraped emails are saved in a structured dataset, available for download in JSON, CSV, XML, Excel, or HTML.

Example JSON output item:

{
  "email": "contact@example.com",
  "pageUrl": "https://example.com/about-us",
  "seedUrl": "https://example.com"
}

Output Fields:

| Field Name | Type | Description |
|------------|--------|-------------|
| email | String | The normalized and cleaned email address. |
| pageUrl | String | The exact webpage URL where this email address was discovered. |
| seedUrl | String | The initial seed URL entered in the input from which the crawl started. |
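
One way to work with these fields after downloading the dataset as JSON is to group emails by the website they came from. The sketch below assumes the export is a JSON array of items shaped like the example above; the helper name `group_by_seed` and the second sample item are illustrative only.

```python
import json
from collections import defaultdict


def group_by_seed(items: list[dict]) -> dict[str, list[str]]:
    """Map each seedUrl to the list of emails discovered from it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in items:
        grouped[item["seedUrl"]].append(item["email"])
    return dict(grouped)


# Illustrative export; a real one would be loaded from the downloaded file.
EXPORT = json.loads("""[
  {"email": "contact@example.com", "pageUrl": "https://example.com/about-us", "seedUrl": "https://example.com"},
  {"email": "jobs@example.com", "pageUrl": "https://example.com/team", "seedUrl": "https://example.com"}
]""")

print(group_by_seed(EXPORT))
```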

🛠️ Input Parameters

You can configure the scraper with the following fields:

| Field Name | Type | Description | Default / Empty Behavior |
|------------|------|-------------|--------------------------|
| urls | Array of Objects | Required. List of website URLs to crawl (e.g. [{"url": "https://example.com"}]). | Must contain at least 1 URL. |
| deepSearch | Boolean | Optional. If enabled, crawls deep into the website structure. If disabled, only crawls the main page and high-priority pages (e.g. contact, about, team). | Default is false (highly cost-effective). |
| maxEmailsToScrape | Integer | Optional. Stop the run after finding this many unique emails across all seeds. | 0 or empty crawls until finished. |
| maxCrawlDepth | Integer | Optional. Link-hops from the seed URL. Depth 1 crawls only the seed page. | Default is 3. Max is 6. |
| maxPagesPerSeed | Integer | Optional. Maximum number of pages the scraper will visit per seed URL to control costs. | Default is 50. Max is 1000. |
| proxyConfiguration | Object | Optional. Select proxies to bypass blocking. US/Residential proxies are recommended. | Enabled by default. |

Example Input configuration:

{
  "urls": [
    { "url": "https://example.com" }
  ],
  "maxEmailsToScrape": 20,
  "maxCrawlDepth": 2,
  "maxPagesPerSeed": 30,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
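
If you build the input programmatically, a small helper can apply the documented defaults and reject values outside the documented maximums before a run is started. This is a sketch only: `build_input` is a hypothetical helper, and the lower bound of 1 for depth and page count is an assumption that the table above does not state.

```python
def build_input(
    urls: list[str],
    deep_search: bool = False,
    max_emails: int = 0,          # 0 means "crawl until finished"
    max_crawl_depth: int = 3,     # documented default 3, max 6
    max_pages_per_seed: int = 50, # documented default 50, max 1000
) -> dict:
    """Return an input object shaped like the example configuration above."""
    if not urls:
        raise ValueError("urls must contain at least 1 URL")
    # Assumption: 1 is the smallest meaningful value for both limits.
    if not 1 <= max_crawl_depth <= 6:
        raise ValueError("maxCrawlDepth must be between 1 and 6")
    if not 1 <= max_pages_per_seed <= 1000:
        raise ValueError("maxPagesPerSeed must be between 1 and 1000")
    return {
        "urls": [{"url": url} for url in urls],
        "deepSearch": deep_search,
        "maxEmailsToScrape": max_emails,
        "maxCrawlDepth": max_crawl_depth,
        "maxPagesPerSeed": max_pages_per_seed,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }


print(build_input(["https://example.com"], max_emails=20, max_crawl_depth=2, max_pages_per_seed=30))
```

The resulting dictionary can be serialized with `json.dumps` and pasted into the Input tab's JSON editor.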

🔍 Error Handling & Ignored Resources

  • Skipping Bad URLs: Malformed URL entries are automatically detected, logged as warnings, and skipped so they do not break the crawl.
  • Resource Filtering: Non-text files (such as .zip, .pdf, .mp4, .png, .jpg, etc.) are automatically ignored to save bandwidth and compute units.
  • CDNs & Noise Filtering: Extracted emails are checked against blacklists to remove fake/placeholder emails (like you@example.com or info@example.com) and tracking domains (like sentry.io or google-analytics.com).
  • Resilient Runs: HTTP 403 or 404 pages are skipped gracefully without aborting the crawler.
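
The URL and email filters described above might look something like the sketch below. The extension list, placeholder addresses, and blocked domains are only the examples named in this section, not the Actor's full lists, and the function names `should_fetch` and `keep_email` are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Only the examples mentioned above; the real lists are presumably longer.
SKIPPED_EXTENSIONS = {".zip", ".pdf", ".mp4", ".png", ".jpg"}
PLACEHOLDER_EMAILS = {"you@example.com", "info@example.com"}
BLOCKED_DOMAINS = {"sentry.io", "google-analytics.com"}


def should_fetch(url: str) -> bool:
    """Return False for malformed URLs and for non-text resources."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False  # a crawler would log a warning and skip this entry
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    extension = posixpath.splitext(parsed.path)[1].lower()
    return extension not in SKIPPED_EXTENSIONS


def keep_email(email: str) -> bool:
    """Return False for placeholder addresses and tracking-domain addresses."""
    email = email.lower()
    domain = email.rpartition("@")[2]
    blocked = any(domain == d or domain.endswith("." + d) for d in BLOCKED_DOMAINS)
    return email not in PLACEHOLDER_EMAILS and not blocked


print(should_fetch("https://example.com/brochure.PDF"))  # skipped: media/document file
print(keep_email("abc123@o1.ingest.sentry.io"))          # dropped: tracking domain
```

Matching on the domain suffix rather than the exact domain also catches subdomains of a tracking service.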

📝 Release Notes

  • v1.0.0: Initial release. Built-in support for XML sitemaps, SPA script scanning, crawler caps, and custom blacklists.

🆘 FAQ & Support

  • Is it legal to scrape email addresses? Scraping public emails for contact directories and lead indexing is generally legal. However, always ensure you comply with regional regulations such as CAN-SPAM (US) or GDPR (EU) when conducting cold outreach.
  • Can it parse JS obfuscation? Yes, the scraper automatically decodes standard obfuscation schemes (HTML entities, JS strings, and atob schemes) by scanning page scripts.
  • Support: If you run into issues, have questions, or want to suggest new features, please file a ticket in the Issues tab.
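
As a rough illustration of the atob case mentioned in the FAQ above, the sketch below finds `atob("...")` calls with a string literal in a script and decodes their Base64 payloads. It is an assumption about how such decoding could work, not the Actor's implementation; the regex and the name `decode_atob_calls` are hypothetical, and payloads built at runtime in JavaScript would not be caught by it.

```python
import base64
import binascii
import re

# Matches atob('...') or atob("...") with a literal Base64 argument.
ATOB_RE = re.compile(r"""atob\(\s*['"]([A-Za-z0-9+/=]+)['"]\s*\)""")


def decode_atob_calls(script: str) -> list[str]:
    """Decode every literal atob() payload that is valid Base64 and UTF-8."""
    decoded = []
    for payload in ATOB_RE.findall(script):
        try:
            decoded.append(base64.b64decode(payload, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 text, ignore it
    return decoded


# Build an illustrative obfuscated script rather than hard-coding a payload.
PAYLOAD = base64.b64encode(b"hello@acme.test").decode("ascii")
SCRIPT = f'var mail = atob("{PAYLOAD}");'

print(decode_atob_calls(SCRIPT))
```

The decoded strings can then be passed through the same email matching used for ordinary page text.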