Websites Html Email Scraper
Pricing
from $12.00 / 1,000 results
Extract email addresses from any website at scale. Crawls multiple pages per domain, deduplicates results, filters false positives, and exports a clean dataset ready for outreach. Just provide a list of URLs and let it work for you!
Developer: Ben salem yosri
Actor stats: 0 bookmarked · 2 total users · 0 monthly active users · last modified 17 days ago
📧 Email Scraper
Extract email addresses from any website — automatically, at scale.
This actor crawls websites page by page and pulls every email address it can find: from mailto: links, visible text, meta tags, HTML attributes, and inline JavaScript. Give it a list of URLs and it returns a clean, deduplicated dataset of emails ready to export.
🚀 What it does
Starting from your list of URLs, the actor:
- Visits each website and scans the page for email addresses
- Follows internal links to crawl deeper (up to your configured limit)
- Deduplicates emails so you never see the same address twice
- Records exactly which page each email was found on
- Saves everything to a structured dataset you can export as CSV, JSON, or XLSX
It automatically skips social media platforms, review sites, and other domains that never contain useful contact emails (Amazon, LinkedIn, Yelp, etc.) — saving you time and credits.
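The skip behaviour can be approximated as a keyword match against each URL's hostname. A minimal sketch, illustrative only (`shouldSkip` and the abbreviated keyword list here are not the actor's actual code):

```javascript
// Abbreviated version of the default skip list (keyword match on the hostname).
const SKIP_KEYWORDS = ['amazon', 'yelp', 'facebook', 'linkedin', 'youtube', 'wikipedia'];

// True if the URL's hostname contains any skip keyword.
function shouldSkip(url, skipKeywords = SKIP_KEYWORDS) {
    const hostname = new URL(url).hostname.toLowerCase();
    return skipKeywords.some((kw) => hostname.includes(kw));
}
```

Matching on substrings means `en.wikipedia.org` and `www.wikipedia.org` would both be skipped, which fits the "domain keywords" wording used in the input table below.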
📥 Input
| Field | Type | Default | Description |
|---|---|---|---|
| `startUrls` | Array | — | **Required.** The websites to scrape |
| `maxPagesPerSite` | Integer | 10 | How many pages to crawl per domain |
| `maxConcurrency` | Integer | 5 | Parallel requests (higher = faster, more resource use) |
| `skipDomains` | Array | (see below) | Domain keywords to skip entirely |
| `proxyConfiguration` | Object | disabled | Use Apify Proxy to avoid blocks |
Domains skipped by default:
amazon, yelp, facebook, instagram, reddit, twitter, linkedin, youtube, tiktok, pinterest, snapchat, google, apple, microsoft, wikipedia, tripadvisor, bbb, yellowpages, maps, bing, yahoo, trustpilot, glassdoor, indeed
You can override this list entirely in the input if needed.
📤 Output
Each email found is saved as a row in the dataset:
```json
{
  "sourceUrl": "https://acme.com",
  "rootDomain": "acme.com",
  "email": "hello@acme.com",
  "foundOnPage": "https://acme.com/contact",
  "scrapedAt": "2025-06-01T09:15:00.000Z"
}
```
Export the full results from the Apify Console in CSV, JSON, XLSX, or XML with one click.
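Because each row carries `rootDomain`, the exported JSON is easy to post-process. For example, grouping unique emails per domain (illustrative helper, not part of the actor):

```javascript
// rows: the dataset exported as JSON (array of result objects).
function groupByDomain(rows) {
    const byDomain = {};
    for (const { rootDomain, email } of rows) {
        (byDomain[rootDomain] ??= new Set()).add(email);
    }
    // Convert the Sets to sorted arrays for easy serialization.
    return Object.fromEntries(
        Object.entries(byDomain).map(([domain, emails]) => [domain, [...emails].sort()])
    );
}
```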
💡 Example input
```json
{
  "startUrls": [
    { "url": "https://company-a.com" },
    { "url": "https://company-b.io" },
    { "url": "https://agency-c.co.uk" }
  ],
  "maxPagesPerSite": 20,
  "maxConcurrency": 5,
  "proxyConfiguration": { "useApifyProxy": true }
}
```
🔍 Where emails are extracted from
The actor scans all of the following on every page:
- `mailto:` links (the most reliable source)
- All visible text content
- Meta tag `content` attributes
- HTML `data-*` attributes and form field values
- Inline `<script>` blocks (emails stored in JS variables)
False positives (placeholder emails like user@example.com, image filenames, etc.) are automatically filtered out.
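The extraction-plus-filtering step can be sketched as a regex pass followed by a placeholder filter. A minimal sketch only; the actor's real patterns are more extensive:

```javascript
// Simple email pattern; real-world extraction is messier, this is illustrative.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Placeholder and false-positive patterns to drop (e.g. example.com, image filenames).
const PLACEHOLDERS = [/@example\./i, /\.(png|jpe?g|gif|svg|webp)$/i];

function extractEmails(text) {
    const matches = text.match(EMAIL_RE) || [];
    const clean = matches.filter((m) => !PLACEHOLDERS.some((p) => p.test(m)));
    return [...new Set(clean.map((m) => m.toLowerCase()))]; // deduplicate
}
```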
⚙️ Tips for best results
- Set `maxPagesPerSite` to 20–50 for thorough coverage of larger sites. Contact, About, and Team pages are crawled automatically.
- Enable Apify Proxy if you're scraping sites with bot protection.
- Lower `maxConcurrency` if you're hitting rate limits on sensitive domains.
- The actor respects site structure and only follows internal links; it won't wander off to third-party domains.
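The internal-link rule above amounts to comparing hostnames. A minimal sketch, assuming a `www.`-insensitive comparison that also keeps subdomains (not the actor's exact code):

```javascript
// Keep only links that stay on the same root domain as the start URL.
function isInternal(startUrl, linkUrl) {
    const base = new URL(startUrl).hostname.replace(/^www\./, '');
    // Resolve relative links (e.g. "/about") against the start URL.
    const target = new URL(linkUrl, startUrl).hostname.replace(/^www\./, '');
    return target === base || target.endsWith('.' + base);
}
```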
🛠️ Local development
```bash
npm install

# Create input file
mkdir -p storage/key_value_stores/default
echo '{"startUrls":[{"url":"https://yoursite.com"}],"maxPagesPerSite":5}' \
  > storage/key_value_stores/default/INPUT.json

npm start
```
📄 License
MIT
