Automae Email Extractor
Pricing
Pay per event

An advanced Apify crawler to automatically extract email addresses from any website, with anti-detection protection and Cloudflare decoding.
Developer
Theo Jim
Maintained by Community
Last modified: 17 days ago
Intelligent Email Extractor
Features
Multi-Source Extraction
- Mailto links : Direct extraction from mailto: links
- Cloudflare protection : Automatic decoding of data-cfemail emails
- Smart regex : Email detection in HTML content
- Metadata : Extraction from <meta> tags
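
To illustrate the Cloudflare decoding step, here is a minimal sketch of how a data-cfemail value is commonly decoded: the attribute holds a hex string whose first byte is an XOR key for the remaining bytes. The function names `decodeCfEmail` and `extractCfEmails` are hypothetical and are not taken from the actor's source.

```javascript
// Decode the hex payload of a data-cfemail attribute.
// Byte 0 is the XOR key; every following byte is one obfuscated character.
function decodeCfEmail(encodedHex) {
  // Require valid hex with at least a key byte and one character byte.
  if (!/^([0-9a-f]{2}){2,}$/i.test(encodedHex)) {
    return null;
  }
  const key = parseInt(encodedHex.slice(0, 2), 16);
  const charCodes = [];
  for (let i = 2; i < encodedHex.length; i += 2) {
    charCodes.push(parseInt(encodedHex.slice(i, i + 2), 16) ^ key);
  }
  return String.fromCharCode(...charCodes);
}

// Collect every decodable data-cfemail value found in an HTML string.
function extractCfEmails(html) {
  const emails = [];
  for (const match of html.matchAll(/data-cfemail=["']([0-9a-f]+)["']/gi)) {
    const email = decodeCfEmail(match[1]);
    if (email) emails.push(email);
  }
  return emails;
}
```

This sketch assumes ASCII addresses; handling multi-byte characters would need an extra decoding step.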
Intelligent Navigation
- Contact pages : Automatic detection of contact pages via keywords
- Multilingual keywords : French, English, German support
- Smart scoring : Prioritization of most relevant pages
- Configurable limits : Control the number of pages to analyze
Anti-Detection Protection
- Fingerprinting : Realistic browser fingerprint
- Random delays : Avoids detectable navigation patterns
- Human headers : Realistic User-Agent and headers
- Session management : Session pool to avoid bans
Filtering and Prioritization
- Whitelist : Priority emails (contact@, hello@, info@, etc.)
- Blacklist : Filter unwanted emails (no-reply@, etc.)
- Validation : Email validity verification
- Deduplication : Automatic duplicate removal
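
As a rough sketch of the regex detection, validation, and deduplication steps, the following scans an HTML string, normalizes matches to lowercase, and drops obvious false positives. The pattern, the asset-extension filter, and the name `extractEmails` are assumptions made for illustration; the actor's actual rules may differ.

```javascript
// Illustrative email pattern (assumption, not the actor's actual regex).
const EMAIL_PATTERN = /[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}/gi;

// Asset names such as "logo@2x.png" look like emails to a naive regex.
const ASSET_EXTENSIONS = /\.(png|jpe?g|gif|svg|webp|css|js)$/i;

function extractEmails(html) {
  const seen = new Set(); // a Set removes duplicates and keeps first-seen order
  for (const match of html.match(EMAIL_PATTERN) ?? []) {
    const email = match.toLowerCase();
    if (ASSET_EXTENSIONS.test(email)) continue; // basic validity check
    seen.add(email);
  }
  return [...seen];
}
```

Because the scan runs over the raw HTML, it also picks up addresses inside mailto: links and <meta> tags without separate handling.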
Installation
    # Clone the project
    git clone <repository-url>
    cd apify-get-emails-from-site

    # Install dependencies
    npm install

    # Install Playwright (automatic via postinstall)
    npx playwright install --with-deps chromium
Usage
Input Configuration
    {
      "baseUrl": "https://example.com",
      "maxContactPages": 2,
      "navigationTimeoutMs": 30000
    }
Parameters:
- baseUrl (required) : Base URL to analyze
- maxContactPages (optional, default: 2) : Maximum number of contact pages to analyze
- navigationTimeoutMs (optional, default: 30000) : Navigation timeout in ms
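
The parameter rules above could be applied along these lines. This is a sketch only: `normalizeInput` is a hypothetical helper, and the http/https restriction is an assumption rather than documented behavior.

```javascript
// Defaults taken from the parameter list above.
const DEFAULTS = { maxContactPages: 2, navigationTimeoutMs: 30000 };

function normalizeInput(input) {
  if (!input || typeof input.baseUrl !== 'string') {
    throw new Error('baseUrl is required');
  }
  // new URL() throws a TypeError when the string is not a valid absolute URL.
  const url = new URL(input.baseUrl);
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error('baseUrl must use http or https'); // assumption
  }
  return {
    baseUrl: input.baseUrl,
    maxContactPages: input.maxContactPages ?? DEFAULTS.maxContactPages,
    navigationTimeoutMs: input.navigationTimeoutMs ?? DEFAULTS.navigationTimeoutMs,
  };
}
```

Using `??` instead of `||` keeps an explicit 0 (for example, maxContactPages set to 0 to scan only the main page) from being replaced by the default.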
Execution
    # Local execution
    npm start

    # Or directly
    node main.js
Output
    {
      "hit": true,
      "primaryEmail": "contact@example.com",
      "emails": ["contact@example.com", "info@example.com"],
      "sourceUrl": "https://example.com/contact",
      "scanned": ["https://example.com", "https://example.com/contact"],
      "baseUrl": "https://example.com"
    }
Advanced Configuration
Contact Keywords
The crawler automatically detects contact pages via these keywords:
    const contactKeywords = [
      "contact", "contact-us", "contactez", "kontakt", "mentions",
      "mentions-legales", "legal", "imprint", "impressum", "privacy",
      "privacy-policy", "confidentialite", "support", "help", "aide",
      "about", "a-propos", "team", "equipe", "cgu", "cgv", "faq"
    ];
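
One way keyword-based detection and scoring might work is sketched below. The scoring rule (keywords earlier in the list rank higher), the same-origin filter, and the name `rankContactLinks` are all assumptions made for the example; only a subset of the keyword list is used to keep it short.

```javascript
// Subset of the keyword list, shortened for the example.
const contactKeywords = ["contact", "kontakt", "impressum", "mentions-legales", "about", "support"];

function rankContactLinks(hrefs, baseUrl, maxContactPages = 2) {
  const base = new URL(baseUrl);
  const scored = new Map(); // absolute URL -> best score seen
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, base); // resolves relative links against the base
    } catch {
      continue; // skip malformed hrefs
    }
    if (url.origin !== base.origin) continue; // stay on the same site
    url.hash = ''; // "/contact#form" and "/contact" are the same page
    const path = url.pathname.toLowerCase();
    const index = contactKeywords.findIndex((keyword) => path.includes(keyword));
    if (index === -1) continue;
    // Hypothetical scoring: earlier keywords in the list score higher.
    const score = contactKeywords.length - index;
    scored.set(url.href, Math.max(score, scored.get(url.href) ?? 0));
  }
  return [...scored.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, maxContactPages)
    .map(([url]) => url);
}
```

With links to /about, /contact-us, and /support on the same site and a limit of 2, this sketch returns the /contact-us URL first, then /about.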
Priority Emails
    const roleWhitelist = [
      "contact@", "hello@", "info@", "support@", "sales@",
      "partners@", "partnership@", "team@", "hi@", "help@"
    ];
Filtered Emails
    const roleBlacklist = ["no-reply@", "noreply@", "donotreply@"];
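
The two lists could be combined as in the sketch below to pick a primary email. It assumes that whitelist order expresses priority and that matching is done on the start of the address; neither is confirmed by the source, and `prioritizeEmails` is a hypothetical name.

```javascript
const roleWhitelist = [
  "contact@", "hello@", "info@", "support@", "sales@",
  "partners@", "partnership@", "team@", "hi@", "help@",
];
const roleBlacklist = ["no-reply@", "noreply@", "donotreply@"];

function prioritizeEmails(emails) {
  // Normalize, deduplicate, then drop blacklisted role addresses.
  const kept = [...new Set(emails.map((email) => email.trim().toLowerCase()))]
    .filter((email) => !roleBlacklist.some((prefix) => email.startsWith(prefix)));

  // Assumption: a lower whitelist index means higher priority;
  // addresses that are not whitelisted sort last.
  const rank = (email) => {
    const index = roleWhitelist.findIndex((prefix) => email.startsWith(prefix));
    return index === -1 ? roleWhitelist.length : index;
  };
  kept.sort((a, b) => rank(a) - rank(b));

  return { hit: kept.length > 0, primaryEmail: kept[0] ?? null, emails: kept };
}
```

For example, given jane@example.com, INFO@example.com, noreply@example.com, and contact@example.com, the sketch drops the noreply address and returns contact@example.com as primaryEmail, followed by info@ and jane@.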
Architecture
Execution Flow
- Initialization : Crawler and queue configuration
- Navigation : Main page loading
- Extraction : Multi-source email analysis
- Decision : If emails are found, stop; otherwise, scan the contact pages
- Result : Return prioritized emails
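
The flow above can be sketched independently of any browser by injecting the page-scanning step. Here `scanPage` is a hypothetical async callback that loads one URL and returns `{ emails, contactLinks }`; in the real actor this work would be done by the crawler, so treat the structure as an assumption.

```javascript
async function findEmails(baseUrl, scanPage, maxContactPages = 2) {
  const scanned = [baseUrl];
  const result = (emails, sourceUrl) => ({
    hit: emails.length > 0,
    primaryEmail: emails[0] ?? null,
    emails,
    sourceUrl,
    scanned,
    baseUrl,
  });

  // Steps 2-3: load the main page and extract emails from it.
  const home = await scanPage(baseUrl);
  if (home.emails.length > 0) {
    return result(home.emails, baseUrl); // step 4: emails found, stop
  }

  // Step 4 (else branch): fall back to contact pages, one at a time.
  for (const url of home.contactLinks.slice(0, maxContactPages)) {
    scanned.push(url);
    const page = await scanPage(url);
    if (page.emails.length > 0) {
      return result(page.emails, url);
    }
  }
  return result([], null); // nothing found on any scanned page
}
```

Pages are awaited sequentially, which matches the one-tab-at-a-time concurrency described in this README.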
Anti-Detection Protection
- Limited concurrency : 1 tab at a time
- Random delays : 300-800ms before navigation, 200-600ms after
- Fingerprinting : Unique browser fingerprint
- Realistic headers : Recent Chrome User-Agent
- Error handling : Automatic retry on failures
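
The random delays listed above might be implemented with a small wrapper like this one. The helper names (`randomInt`, `sleep`, `withHumanDelays`) are hypothetical; only the 300-800 ms and 200-600 ms ranges come from this README.

```javascript
// Uniform random integer in [min, max], both ends inclusive.
function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Wrap any navigation step with a pause before and after it.
async function withHumanDelays(navigate) {
  await sleep(randomInt(300, 800)); // before navigation
  const result = await navigate();
  await sleep(randomInt(200, 600)); // after navigation
  return result;
}
```

A caller would pass its own navigation function, for example `withHumanDelays(() => page.goto(url))` with a Playwright page object.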
Performance
- Configurable timeout : Avoids blocking
- Request limiting : Server load control
- Smart stop : Stop as soon as a valid email is found
- Deduplication : Avoids redundant analysis
Dependencies
- apify (^3.5.0) : Scraping framework
- crawlee (^3.15.1) : Crawling library
- playwright (^1.46.0) : Browser automation
Technical Notes
- ES6 modules : Modern imports/exports usage
- Error handling : Try/catch for robustness
- Email validation : Regex and domain verification
- Relative URLs : Automatic link resolution
Contributing
Contributions are welcome! Feel free to:
- Report bugs
- Suggest improvements
- Add new contact keywords
- Optimize performance
License
This project is released under the MIT license.


