
Websites Html Email Scraper

Pricing

from $12.00 / 1,000 results


Extract email addresses from any website at scale. Crawls multiple pages per domain, deduplicates results, filters false positives, and exports a clean dataset ready for outreach. Just provide a list of URLs and let it work for you!


Rating: 0.0 (0)

Developer: Ben salem yosri (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 0
  • Last modified: 17 days ago


📧 Email Scraper

Extract email addresses from any website — automatically, at scale.

This actor crawls websites page by page and pulls every email address it can find: from mailto: links, visible text, meta tags, HTML attributes, and inline JavaScript. Give it a list of URLs and it returns a clean, deduplicated dataset of emails ready to export.


🚀 What it does

Starting from your list of URLs, the actor:

  1. Visits each website and scans the page for email addresses
  2. Follows internal links to crawl deeper (up to your configured limit)
  3. Deduplicates emails so you never see the same address twice
  4. Records exactly which page each email was found on
  5. Saves everything to a structured dataset you can export as CSV, JSON, or XLSX

It automatically skips social media platforms, review sites, and other domains that never contain useful contact emails (Amazon, LinkedIn, Yelp, etc.) — saving you time and credits.
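The skip-list check and deduplication described above can be sketched as follows. This is a hypothetical illustration, not the actor's actual internals; the keyword list is abbreviated and the function names are invented for the example.

```javascript
// Illustrative subset of the default skip list (see the full list below).
const SKIP_KEYWORDS = ['amazon', 'linkedin', 'yelp', 'facebook'];

// Skip a URL if its hostname contains any skip-list keyword.
function shouldSkipDomain(url) {
  const host = new URL(url).hostname.toLowerCase();
  return SKIP_KEYWORDS.some((kw) => host.includes(kw));
}

// Set-based deduplicator: returns true the first time an email
// (case-insensitive) is seen, false on every repeat.
function makeDeduper() {
  const seen = new Set();
  return (email) => {
    const key = email.toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  };
}
```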


📥 Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | Array | — | Required. The websites to scrape |
| maxPagesPerSite | Integer | 10 | How many pages to crawl per domain |
| maxConcurrency | Integer | 5 | Parallel requests (higher = faster, more resource use) |
| skipDomains | Array | (see below) | Domain keywords to skip entirely |
| proxyConfiguration | Object | disabled | Use Apify Proxy to avoid blocks |

Domains skipped by default: amazon, yelp, facebook, instagram, reddit, twitter, linkedin, youtube, tiktok, pinterest, snapchat, google, apple, microsoft, wikipedia, tripadvisor, bbb, yellowpages, maps, bing, yahoo, trustpilot, glassdoor, indeed

You can override this list entirely in the input if needed.
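For example, a hedged input sketch that replaces the default skip list with a custom one (the domains shown are purely illustrative):

```json
{
  "startUrls": [{ "url": "https://acme.com" }],
  "skipDomains": ["facebook", "linkedin", "pinterest"]
}
```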


📤 Output

Each email found is saved as a row in the dataset:

{
  "sourceUrl": "https://acme.com",
  "rootDomain": "acme.com",
  "email": "hello@acme.com",
  "foundOnPage": "https://acme.com/contact",
  "scrapedAt": "2025-06-01T09:15:00.000Z"
}

Export the full results from the Apify Console in CSV, JSON, XLSX, or XML with one click.


💡 Example input

{
  "startUrls": [
    { "url": "https://company-a.com" },
    { "url": "https://company-b.io" },
    { "url": "https://agency-c.co.uk" }
  ],
  "maxPagesPerSite": 20,
  "maxConcurrency": 5,
  "proxyConfiguration": { "useApifyProxy": true }
}

🔍 Where emails are extracted from

The actor scans all of the following on every page:

  • mailto: links (most reliable source)
  • All visible text content
  • Meta tag content attributes
  • HTML data-* attributes and form field values
  • Inline <script> blocks (emails stored in JS variables)

False positives (placeholder emails like user@example.com, image filenames, etc.) are automatically filtered out.
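A minimal sketch of the extract-and-filter step: match candidate emails with a regex, then drop placeholder domains and image filenames that happen to look like addresses. The regex, the placeholder list, and the function name are assumptions for illustration; the actor's real filtering rules may differ.

```javascript
// Simple email pattern; intentionally permissive so the filters below
// can prune what it over-matches.
const EMAIL_RE = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
const PLACEHOLDER_DOMAINS = ['example.com', 'domain.com', 'email.com'];
const IMAGE_EXTS = /\.(png|jpe?g|gif|svg|webp)$/i;

function extractEmails(text) {
  const matches = text.match(EMAIL_RE) || [];
  return matches.filter((email) => {
    const domain = email.split('@')[1].toLowerCase();
    if (PLACEHOLDER_DOMAINS.includes(domain)) return false; // placeholders
    if (IMAGE_EXTS.test(email)) return false; // e.g. "logo@2x.png"
    return true;
  });
}
```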


⚙️ Tips for best results

  • Set maxPagesPerSite to 20–50 for thorough coverage of larger sites. Contact, About, and Team pages are crawled automatically.
  • Enable Apify Proxy if you're scraping sites with bot protection.
  • Lower maxConcurrency if you're hitting rate limits on sensitive domains.
  • The actor respects site structure and only follows internal links — it won't wander off to third-party domains.
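The internal-link rule from the last tip can be sketched as a hostname comparison. This is a simplified assumption about how the check might work; the two-label root-domain heuristic shown here does not handle multi-part TLDs like .co.uk.

```javascript
// Naive root-domain extraction: keep the last two labels of the hostname.
function rootDomain(hostname) {
  return hostname.split('.').slice(-2).join('.');
}

// Only follow links that share the start URL's root domain.
function isInternalLink(startUrl, linkUrl) {
  const start = new URL(startUrl).hostname;
  const link = new URL(linkUrl).hostname;
  return rootDomain(start) === rootDomain(link);
}
```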

🛠️ Local development

npm install
# Create input file
mkdir -p storage/key_value_stores/default
echo '{"startUrls":[{"url":"https://yoursite.com"}],"maxPagesPerSite":5}' \
> storage/key_value_stores/default/INPUT.json
npm start

📄 License

MIT