🎯 Multi-Page Scanning: Automatically scans the main page plus up to 2 additional pages (contact/impressum/about)
🧠 JavaScript-Rendered Emails: Extracts emails from dynamically loaded content and single-page applications (SPAs)
🔓 Advanced Deobfuscation: Decodes common obfuscation patterns:
[at], (at), {at}, <at> → @
[dot], (dot), {dot}, <dot> → .
HTML entities that encode @ and . (for example, &#64; and &#46;)
Multi-language patterns: [собака], [arroba], [arobase], [punkt]
🔍 Deep DOM Inspection:
All element attributes (href, onclick, data-*, etc.)
JavaScript event handlers (onmousedown, onmouseup)
Script tags and inline JavaScript
JSON-LD structured data
Meta tags and HTML comments
Form actions with mailto:
👁️ Shadow DOM & Iframes: Extracts emails from shadow DOM trees and embedded iframes
🎨 CSS Content Extraction: Scans ::before and ::after pseudo-elements
🤖 Intelligent Interactions:
Automatically clicks buttons to reveal hidden emails
Expands collapsed sections and accordions
Triggers hover events on contact elements
Scrolls through entire page to load lazy content
📧 Smart Email Prioritization:
Business emails first: info@, contact@, office@, hello@, support@, sales@
Name-matched emails second
Other valid emails third
🔄 Automatic Retry Mechanism: Retries email extraction once if the first attempt fails
🌍 Multi-Language Contact Detection: Finds contact pages in 30+ languages
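The three tiers listed under Smart Email Prioritization can be sketched as a stable sort. This is a minimal illustration, not the Actor's actual code: the names `emailTier` and `prioritizeEmails` are hypothetical, and so is the rule that "name-matched" means the local part contains a word from the business name.

```javascript
// Hypothetical helpers; the prefix list is copied from the feature list above.
const BUSINESS_PREFIXES = ['info', 'contact', 'office', 'hello', 'support', 'sales'];

// Returns 0 for business mailboxes, 1 for name-matched addresses, 2 for everything else.
function emailTier(email, businessName = '') {
  const local = email.split('@')[0].toLowerCase();
  if (BUSINESS_PREFIXES.includes(local)) return 0;
  // Assumption: a "name match" is any business-name word (3+ characters) inside the local part.
  const words = businessName
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 2);
  if (words.some((word) => local.includes(word))) return 1;
  return 2;
}

function prioritizeEmails(emails, businessName = '') {
  // Array.prototype.sort is stable, so emails in the same tier keep their original order.
  return [...emails].sort(
    (a, b) => emailTier(a, businessName) - emailTier(b, businessName),
  );
}
```

Sorting a copy leaves the caller's array untouched, which matters if the unsorted list is also stored for reporting.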
extractJavaScriptRenderedEmails() - Extracts emails from dynamic content
extractObfuscatedEmails() - Decodes 12+ obfuscation patterns
ultraDeepExtractEmails() - Deep scan of all DOM elements
extractFromIframesAndShadow() - iframe and Shadow DOM extraction
scanWebsiteForEmails() - Main orchestration function with multi-page support
deobfuscateEmail() - 30+ deobfuscation rules
decodeHtmlEntities() - HTML entity decoder
isValidEmail() - Advanced email validation with spam filtering
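As a rough idea of what deobfuscateEmail() and decodeHtmlEntities() might do, here is a minimal sketch covering only the bracketed [at]/[dot] variants and a few HTML entities named in the feature list. The function bodies, the `bracketed` helper, and the word lists are assumptions; the Actor's real rule set is larger and is not reproduced in this document.

```javascript
// Illustrative word lists taken from the feature list; the real rules are more extensive.
const AT_WORDS = ['at', 'собака', 'arroba', 'arobase'];
const DOT_WORDS = ['dot', 'punkt'];

// Hypothetical helper: matches a word wrapped in [], (), {} or <>, plus surrounding spaces.
function bracketed(words) {
  return new RegExp(`\\s*[\\[({<]\\s*(?:${words.join('|')})\\s*[\\])}>]\\s*`, 'giu');
}

function decodeHtmlEntities(text) {
  return text
    // Hexadecimal numeric references such as &#x40;
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    // Decimal numeric references such as &#64;
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)))
    // Named references for the two characters that matter in an address
    .replace(/&commat;/g, '@')
    .replace(/&period;/g, '.');
}

function deobfuscateEmail(text) {
  return decodeHtmlEntities(text)
    .replace(bracketed(AT_WORDS), '@')
    .replace(bracketed(DOT_WORDS), '.');
}
```

For example, `deobfuscateEmail('info [at] example [dot] com')` yields `info@example.com`. The result should still go through validation, because ordinary prose containing "(at)" is rewritten too.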
Email Validation & Filtering
Filters out invalid domains: facebook, sentry, cloudflare, wixpress, wordpress
Removes system emails and file extensions (.js, .css, .jpg, etc.)
Validates email format with RFC-compliant regex
Removes hex-only and numeric-only local parts
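The validation and filtering rules above could be combined roughly as follows. This is a sketch under stated assumptions: the block lists contain only the entries named above plus a few common image extensions, the pattern is a pragmatic one that is deliberately simpler than the full RFC grammar, and the 8-character threshold for hex-only local parts is a guess meant to spare short real names. System-email filtering is not shown.

```javascript
// Assumed block lists; the Actor's real lists may be longer.
const BLOCKED_DOMAIN_PARTS = ['facebook', 'sentry', 'cloudflare', 'wixpress', 'wordpress'];
const FILE_EXTENSIONS = ['.js', '.css', '.jpg', '.jpeg', '.png', '.gif', '.svg', '.webp'];

// Pragmatic pattern: one @, dot-separated domain labels, alphabetic TLD of 2+ letters.
const EMAIL_REGEX = /^[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,}$/i;

function isValidEmail(email) {
  const candidate = email.trim().toLowerCase();
  if (!EMAIL_REGEX.test(candidate)) return false;

  const [local, domain] = candidate.split('@');
  // Reject tracking and platform domains.
  if (BLOCKED_DOMAIN_PARTS.some((part) => domain.includes(part))) return false;
  // Reject asset names that merely look like addresses, e.g. logo@2x.jpg.
  if (FILE_EXTENSIONS.some((ext) => candidate.endsWith(ext))) return false;
  // Reject numeric-only local parts.
  if (/^\d+$/.test(local)) return false;
  // Reject hash-like local parts (assumed threshold: 8+ hex characters).
  if (/^[0-9a-f]{8,}$/.test(local)) return false;
  return true;
}
```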
Scans up to 3 pages per website (main page + up to 2 contact pages)
2-second wait time for JavaScript rendering
Automatic popup and cookie banner handling
Resource optimization for faster page loads
v2.9.X
New Features
Intelligent location parsing from search queries using parseLocationFromQuery function
Support for extracting up to 1000 results with maxResults parameter
City extraction from addresses using extractCityFromAddress function with country-specific patterns
Universal location support across all 11 countries with smart query parsing
Increased maxConcurrency from 20 to 50 for faster extraction
Added resource blocking for images, fonts, and media files via page.route
Parallel data extraction using Promise.all for listing fields
Session pool increased to 200 with maxUsageCount of 100
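The concurrency, session pool, and resource-blocking changes above map onto Crawlee and Playwright options roughly like this. The option names (`maxConcurrency`, `useSessionPool`, `sessionPoolOptions.maxPoolSize`, `sessionOptions.maxUsageCount`) are Crawlee's and `page.route` is Playwright's, while `crawlerOptions`, `BLOCKED_RESOURCE_TYPES`, and `blockHeavyResources` are hypothetical names used only for this sketch.

```javascript
// Values come from the release notes above; the object shape follows Crawlee's crawler options.
const crawlerOptions = {
  maxConcurrency: 50,
  useSessionPool: true,
  sessionPoolOptions: {
    maxPoolSize: 200,
    sessionOptions: { maxUsageCount: 100 },
  },
};

// Playwright resource types that are not needed for text extraction.
const BLOCKED_RESOURCE_TYPES = new Set(['image', 'font', 'media']);

// Registers a route handler that aborts heavy requests and lets everything else through.
async function blockHeavyResources(page) {
  await page.route('**/*', (route) => {
    const type = route.request().resourceType();
    return BLOCKED_RESOURCE_TYPES.has(type) ? route.abort() : route.continue();
  });
}
```

One plausible wiring is to spread `crawlerOptions` into the `PlaywrightCrawler` constructor and call `blockHeavyResources(page)` from a `preNavigationHooks` entry, so the route is in place before each navigation starts.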
Technical Enhancements
Country-specific proxy configuration with residential IPs
Screenshot capture on errors saved to Actor storage
Reduced navigation timeouts to 60 seconds
Added parsedLocation to userData for location tracking
v2.8
emailExtractionMode parameter with 'standard' and 'aggressive' options
Contact page scanning when no emails found on main page
Email prioritization sorting (info@, contact@, sales@, hello@, office@, admin@)
CONTACT_PATTERNS array for finding contact-related links
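A contact-link finder built on a CONTACT_PATTERNS array might look like the sketch below. The four patterns are an illustrative subset, and `findContactLinks`, its `{ href, text }` input shape, and the default limit of 2 pages are assumptions, not the Actor's actual code.

```javascript
// Illustrative subset; the Actor's real CONTACT_PATTERNS array is not reproduced here.
const CONTACT_PATTERNS = [/contact/i, /kontakt/i, /impressum/i, /about/i];

// links: array of { href, text } collected from the page's anchor elements.
function findContactLinks(links, baseUrl, limit = 2) {
  const base = new URL(baseUrl);
  const seen = new Set();
  const result = [];

  for (const { href, text = '' } of links) {
    let url;
    try {
      url = new URL(href, base); // resolves relative links against the page URL
    } catch {
      continue; // skip hrefs that cannot be parsed
    }
    if (url.origin !== base.origin) continue; // stay on the business's own site
    url.hash = '';
    const key = url.href;
    if (seen.has(key) || key === base.href) continue;

    if (CONTACT_PATTERNS.some((p) => p.test(url.pathname) || p.test(text))) {
      seen.add(key);
      result.push(key);
      if (result.length >= limit) break;
    }
  }
  return result;
}
```

Matching on both the path and the link text catches sites whose contact page has an opaque URL but a descriptive label.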
Email Detection Patterns
9 different EMAIL_PATTERNS including standard, encoded, mailto, cloudflare, javascript
INVALID_EMAIL_PATTERNS to filter out images and system emails
decodeCloudflareEmail function for protected emails
decodeHtmlEntities for HTML entity conversion
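For reference, Cloudflare's email protection stores an address as a hex string whose first byte is an XOR key for the remaining bytes. A decodeCloudflareEmail function can therefore be as small as the sketch below; the body is an assumption about how the Actor implements it, and it handles ASCII addresses only.

```javascript
// Decodes the hex payload found in data-cfemail attributes
// and in /cdn-cgi/l/email-protection# links.
function decodeCloudflareEmail(encoded) {
  // The first byte (two hex digits) is the XOR key.
  const key = parseInt(encoded.slice(0, 2), 16);
  let email = '';
  // Every following byte is one character of the address XORed with the key.
  for (let i = 2; i < encoded.length; i += 2) {
    email += String.fromCharCode(parseInt(encoded.slice(i, i + 2), 16) ^ key);
  }
  return email;
}
```

The decoded string should still be passed through validation, since a truncated or tampered payload decodes to garbage rather than throwing.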
v2.7
Multi-Country Configuration
SITE_CONFIGS object with 11 countries, including us, ca, uk, de, fr, es, it, ro, au, br
Country-specific constructUrl functions
Language headers configuration per country
countryCode for proxy configuration
Selector Strategies
Primary selectors and fallback selectors for each field
Generic fallback selectors array in handleList
Multiple name selector strategies
Website URL validation to filter internal Yellow Pages links
v2.6
extractEmailsFromPage function with multiple extraction methods
JavaScript variable scanning in page.evaluate
JSON-LD structured data extraction
Microdata extraction with itemprop="email"
Data attribute scanning for data-email, data-contact, data-mail
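Of the extraction methods listed for v2.6, the JSON-LD one is the easiest to show in isolation. The sketch below walks parsed JSON-LD and collects every string stored under an `email` key; `collectJsonLdEmails` and `extractJsonLdEmails` are hypothetical names, and the input is assumed to be the text content of the page's `script[type="application/ld+json"]` elements.

```javascript
// Recursively collects string values stored under an "email" key.
function collectJsonLdEmails(node, found = new Set()) {
  if (Array.isArray(node)) {
    node.forEach((item) => collectJsonLdEmails(item, found));
  } else if (node && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      if (key.toLowerCase() === 'email' && typeof value === 'string') {
        // Some sites store the value with a mailto: prefix.
        found.add(value.replace(/^mailto:/i, '').trim());
      } else {
        collectJsonLdEmails(value, found);
      }
    }
  }
  return found;
}

// scriptContents: array of raw JSON strings, one per JSON-LD script tag.
function extractJsonLdEmails(scriptContents) {
  const found = new Set();
  for (const raw of scriptContents) {
    try {
      collectJsonLdEmails(JSON.parse(raw), found);
    } catch {
      // Malformed JSON-LD is common in the wild; skip the block rather than fail.
    }
  }
  return [...found];
}
```

In a Playwright handler the input could be gathered with `page.$$eval('script[type="application/ld+json"]', (els) => els.map((el) => el.textContent))`, and the same walker pattern extends to other keys if needed.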
v2.5
Website Handler Implementation
handleWebsite function for email extraction from business websites
Email extraction statistics with EMAILS_FOUND counter
allEmails array to store multiple emails per business
emailStatus tracking: FOUND, NOT_FOUND, NO_WEBSITE, ERROR, EXTRACTION_FAILED
v2.0
Crawlee Framework Integration
PlaywrightCrawler with router pattern
createPlaywrightRouter for request handling
Session pool configuration with useSessionPool
preNavigationHooks for browser configuration
failedRequestHandler for error recovery
v1.5
Smooth scrolling implementation for lazy-loaded content
Address extraction with fallback selectors
Phone number extraction with tel: link support
Data trimming and substring limits for each field
v1.0
Core Functionality
Basic handleList function for listing pages
Pagination handling with next page selectors
Business data structure: name, phone, address, website
Actor.pushData for result storage
RESULTS_COUNT tracking
v0.9 - PRE-RELEASE
Initial Implementation
Basic Playwright setup
Single page extraction
Simple selector-based data extraction
Console logging only
No error handling or retries