[3.2.X]
Fixed
Critical performance bug: getAttribute() calls lacked timeout parameters and fell back to the 30s default wait
Performance improvement: per-listing processing time reduced by 99.7% (8 min → 1.4 s per listing)
Total extraction time reduced from 4 hours to ~42 seconds for 30 listings
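A minimal sketch of the timeout fix, assuming attributes are read through Playwright locators, whose `getAttribute(name, options)` accepts a `timeout` option in milliseconds. The helper name `getAttributeFast` and the `null` fallback are illustrative assumptions, not the project's actual code.

```javascript
// Hypothetical helper: read an attribute with a short, explicit timeout so a
// missing element fails fast instead of waiting for the 30s default.
async function getAttributeFast(locator, name, timeoutMs = 100) {
  try {
    // Playwright's locator.getAttribute(name, options) accepts { timeout } in ms.
    return await locator.getAttribute(name, { timeout: timeoutMs });
  } catch (err) {
    // Any failure (typically a timeout) is treated as "attribute not available".
    return null;
  }
}
```

A call might look like `await getAttributeFast(linkLocator, 'href')` for links and `await getAttributeFast(emailLocator, 'href', 50)` for email elements, where both locators are placeholders.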
Changed
Added explicit 100ms timeout to all getAttribute() calls for href extraction
Added 50ms timeout to email element getAttribute() calls
Reduced request handler timeout from 300s to 90s
Limited email scans to a maximum of 5 per page (prevents excessive scanning)
Reduced email extraction timeout from 45s to 20s
Reduced page scan limit from 2 to 1 page per website
Increased concurrency from 2-5 to 3-8 for better parallelization
Shortened page wait times (1000ms → 800ms, 1500ms → 1200ms)
Reduced click delay from 300ms to 200ms
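The tuning values in this release could be collected roughly as below. The first object uses option names from Crawlee's PlaywrightCrawler. Reading "2-5 to 3-8" as a min-max concurrency range is an assumption. The second object uses purely hypothetical names for the scraper's own limits.

```javascript
// Crawlee-style crawler options (the values come from this changelog; where the
// project actually sets them is an assumption).
const crawlerOptions = {
  requestHandlerTimeoutSecs: 90, // was 300
  minConcurrency: 3,             // was 2
  maxConcurrency: 8,             // was 5
};

// Hypothetical scraper-level limits; the key names are illustrative only.
const scanLimits = {
  attributeTimeoutMs: { href: 100, email: 50 },
  emailExtractionTimeoutMs: 20000, // was 45000
  maxEmailScansPerPage: 5,
  maxPagesPerWebsite: 1,           // was 2
  clickDelayMs: 200,               // was 300
};
```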
Improved
Email/website scanning balanced for performance and discovery
Faster failure detection with reduced timeouts
Better overall throughput with increased parallelization
[3.1.0]
Added
Form-based search system (navigates to homepage, fills forms like real user)
Search form selector system for all 10 countries
Trust score system (0-100) with dynamic tracking
Trust-based rate limiting (1000-5000ms delays)
Data quality metrics system (0-100 scoring)
Geo-matching validation with confidence scoring
Auto language detection for emails (18 supported TLDs)
Email quality tracking per domain
Multi-language consent popup handling (15 languages)
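One way trust-based rate limiting could work is sketched below. It assumes that a lower trust score means a longer delay and that the mapping between the 0-100 score and the 1000-5000ms range is linear. Neither detail is confirmed by this changelog, and `delayForTrust` is a hypothetical name.

```javascript
// Map a trust score (0-100) to a delay between requests (1000-5000 ms).
// Assumption: low trust means slower requests, and the mapping is linear.
const MIN_DELAY_MS = 1000;
const MAX_DELAY_MS = 5000;

function delayForTrust(trustScore) {
  // Clamp so out-of-range scores cannot produce delays outside the bounds.
  const score = Math.min(100, Math.max(0, trustScore));
  const fraction = 1 - score / 100; // 0 at full trust, 1 at zero trust
  return Math.round(MIN_DELAY_MS + fraction * (MAX_DELAY_MS - MIN_DELAY_MS));
}
```

Under these assumptions a fully trusted session waits 1000ms between requests and a session with zero trust waits 5000ms.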
Changed
Architecture: Removed direct URL construction, added form-based navigation
Main entry point changed from constructed URL to homepage
Selectors updated with form fields (keywordInput, locationInput, submitButton)
Input schema: scanForEmails moved to first position, default changed to false
maxResults default changed from 1000 to 2
Rate limiting upgraded from static 1000ms to dynamic trust-based delays
Reduced concurrency from 5 to 2 (less aggressive)
Increased retry attempts from 3 to 5
Logging level changed from INFO to WARNING
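The move from constructed URLs to form-based navigation can be sketched with Playwright's `page.goto`, `page.fill` and `page.click`. The function name `submitSearch`, its argument shape, and the optional location handling are assumptions for illustration. Only the selector field names (keywordInput, locationInput, submitButton) come from this changelog.

```javascript
// Hypothetical sketch of form-based navigation with a Playwright page.
// `selectors` carries the keywordInput, locationInput and submitButton fields.
async function submitSearch(page, homepageUrl, selectors, { keyword, location }) {
  // Start from the homepage instead of a constructed results URL.
  await page.goto(homepageUrl);
  await page.fill(selectors.keywordInput, keyword);
  // Only fill the location field when a location was provided.
  if (location) {
    await page.fill(selectors.locationInput, location);
  }
  await page.click(selectors.submitButton);
}
```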
Fixed
Bot detection issues (prevented with human-like form filling)
Search handling for queries without a location
Proxy country override issue (removed US fallback)
Session rotation causing 403 blocks
Memory leaks from insufficient garbage collection
Statistics log spam
[3.0.0]
Added
Advanced email extraction engine with multi-layer scanning
Multi-language contact page detection (60+ languages, 40+ countries)
Intelligent email scoring and prioritization system
Domain-to-website matching validation
Email caching layer for performance
Rate limiting and request statistics tracking
Modular architecture: email-extractor.js, validators.js, keywords.js, config.js, cache-utils.js
Deep DOM inspection (attributes, event handlers, scripts, JSON-LD, meta tags)
Shadow DOM and iframe email extraction
CSS pseudo-element content scanning
Comprehensive email deobfuscation (100+ patterns)
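Two of the ideas above, deobfuscation and domain-to-website matching, can be illustrated in a few lines. Both functions are hypothetical sketches: the two patterns shown stand in for the 100+ patterns mentioned, and the matching rule (exact host or subdomain, ignoring a leading `www.`) is an assumption.

```javascript
// Rewrite common "[at]" / "(dot)" obfuscations back into a plain address.
function deobfuscateEmail(text) {
  return text
    .replace(/\s*[\[\(\{]\s*at\s*[\]\)\}]\s*/gi, '@')
    .replace(/\s*[\[\(\{]\s*dot\s*[\]\)\}]\s*/gi, '.');
}

// Check whether an email's domain belongs to the business website.
function emailMatchesWebsite(email, websiteUrl) {
  const emailDomain = email.split('@').pop().toLowerCase();
  const host = new URL(websiteUrl).hostname.toLowerCase().replace(/^www\./, '');
  // Accept the site's own host or any subdomain of it.
  return emailDomain === host || emailDomain.endsWith('.' + host);
}
```

A match like this could feed into the scoring step, so that addresses on the business's own domain rank above addresses on generic mail providers.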
Changed
Performance: Increased concurrent processing from 20 to 50 businesses
Extended session pool to 200 with higher usage limits
Page-level context isolation for better memory management
Garbage collection triggers every 3 pages
Centralized configuration management
Enhanced error handling with graceful degradation
Fixed
Dataset output stopping at field 09 (now shows all 12 fields)
Missing 12_sourceUrl in output views
Email extraction from obfuscated contact information
Missing emails from JavaScript-heavy websites
Contact page detection across different languages
[2.9.0]
Added
Intelligent location parsing from search queries
Support for up to 1000 results per run
City extraction from addresses with country-specific patterns
Universal location support across all countries
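As a rough illustration of location parsing from a search query, the sketch below splits on the words "in" or "near". The actual parsing rules are not described in this changelog, so the function name, the pattern, and the return shape are all assumptions.

```javascript
// Hypothetical parser: split "keyword in location" style queries.
function parseSearchQuery(query) {
  const trimmed = query.trim();
  // Lazy first group so the keyword ends at the first " in " or " near ".
  const match = trimmed.match(/^(.+?)\s+(?:in|near)\s+(.+)$/i);
  if (!match) {
    // No location marker found: treat the whole query as the keyword.
    return { keyword: trimmed, location: null };
  }
  return { keyword: match[1].trim(), location: match[2].trim() };
}
```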
Changed
Reduced navigation timeouts to 60 seconds
Improved parallel data extraction
Enhanced session management
[2.8.0]
Added
Email extraction mode parameter
Contact page scanning when no emails are found
Email prioritization sorting
Multiple email detection patterns (9 patterns)
Cloudflare email protection decoding
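Cloudflare's email protection stores the address as a hex string in a `data-cfemail` attribute. The first byte is an XOR key and every following byte is one character XORed with that key. A minimal decoder, assuming ASCII addresses and using the hypothetical name `decodeCfEmail`, might look like this:

```javascript
// Decode the hex string found in a data-cfemail attribute.
function decodeCfEmail(encodedHex) {
  // The first two hex digits are the XOR key.
  const key = parseInt(encodedHex.slice(0, 2), 16);
  let email = '';
  // Each following pair of hex digits is one character XORed with the key.
  for (let i = 2; i < encodedHex.length; i += 2) {
    const byte = parseInt(encodedHex.slice(i, i + 2), 16);
    email += String.fromCharCode(byte ^ key);
  }
  return email;
}
```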
[2.7.0]
Added
Multi-country support (10 countries total)
Country-specific URL construction
Language headers per country
Country-specific proxy settings
Changed
Selector strategies with primary and fallback options
Website URL validation to filter internal links
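Filtering internal links can be sketched with the standard `URL` class. The function below is a hypothetical example. The rule it applies (http/https only, host different from the directory's host, ignoring a leading `www.`) is an assumption about what "internal" means here.

```javascript
// Return true when `href` points to a site other than the directory itself.
function isExternalWebsite(href, directoryUrl) {
  let target;
  try {
    target = new URL(href, directoryUrl); // resolves relative links too
  } catch {
    return false; // not a parseable URL
  }
  // Skip mailto:, tel:, javascript: and similar non-web links.
  if (target.protocol !== 'http:' && target.protocol !== 'https:') return false;
  const strip = (host) => host.toLowerCase().replace(/^www\./, '');
  return strip(target.hostname) !== strip(new URL(directoryUrl).hostname);
}
```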
[2.0.0] - Previous Releases
Historical releases preserved for reference. Major milestones:
v2.6.0: Advanced email extraction from JavaScript, JSON-LD, microdata
v2.5.0: Website handler for email extraction, statistics tracking
v2.0.0: Crawlee framework integration, PlaywrightCrawler, session pools
v1.5.0: Smooth scrolling, address extraction, phone number links
v1.0.0: Core listing handler, pagination, basic extraction
v0.9.0: Initial Playwright setup, basic single page extraction