[3.2.22]
Fixed
Brazilian site (guiamais.com.br) form visibility timeout issue
Enhanced BR site selectors with 14 keyword input options, 12 location input options, and 11 submit button options
Added searchTriggers configuration for BR site to handle hidden/dynamic search forms
Added handleBrazilianSite() function in main.js for special BR form handling
Automatic fallback to direct URL search when BR form is not found
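The selector fallback described in this release could be sketched roughly as follows. This is illustrative only and is not the actual handleBrazilianSite() code: findFirstSelector and resolveSearchStrategy are hypothetical names, the page object is assumed to expose Playwright's page.$(selector), and the direct search URL is supplied by the caller rather than constructed here.

```javascript
// Hypothetical sketch: try each candidate selector in order and return the
// first one that matches an element, or null when none match.
async function findFirstSelector(page, selectors) {
    for (const selector of selectors) {
        // page.$ resolves to an element handle, or null if nothing matches.
        const handle = await page.$(selector);
        if (handle) return selector;
    }
    return null;
}

// Decide between form-based search and the direct URL fallback.
// `candidates` holds the selector lists; `directUrl` comes from the caller.
async function resolveSearchStrategy(page, candidates, directUrl) {
    const keywordInput = await findFirstSelector(page, candidates.keywordInput);
    const submitButton = await findFirstSelector(page, candidates.submitButton);
    if (keywordInput && submitButton) {
        return { mode: 'form', keywordInput, submitButton };
    }
    // Form not found: fall back to navigating straight to the search URL.
    return { mode: 'directUrl', url: directUrl };
}
```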
[3.2.0]
Added
Automatic cache cleanup mechanism
validateConfig() function in src/config.js
getRandomDelay() validation
Comprehensive JSDoc type definitions across all modules
Input validation for getKeywordsForLanguages() and getCountryLanguages()
11 previously missing countries in getCountryLanguages()
9 missing contact page paths
Deduplication using Set in keywords.js
Input validation for parseLocationFromQuery() and getKeywordsForSite()
ReDoS safety documentation in validators.js
Division by zero protection in cache-utils.js
evictOldestEntry() helper function
resetGlobalMonitor() export in memory-monitor.js
v8 heap statistics for accurate percentage calculation
Circular buffer for measurements in memory-monitor.js
Configurable logger option in memory-monitor.js and retry-manager.js
Note about dead code status in retry-manager.js
safeGarbageCollect() helper in email-extractor.js
isContextDestroyedError() helper in email-extractor.js
Click safeguards in email-extractor.js
atomicIncrement() helper in routes.js
extractWithSelectors() helper in routes.js
extractPhone() helper in routes.js
validateUrl() helper in routes.js
FIELDS constant object in routes.js
MAX_LENGTHS constant object in routes.js
maybeGC() helper in routes.js
Screenshot rate limiting in routes.js
takeScreenshotWithLimits() function in main.js
incrementRequestCounter() in main.js
validateMaxResults() function in main.js
resolveSearchFormSelectors() helper in main.js
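For readers unfamiliar with the circular buffer mentioned for memory-monitor.js, a minimal sketch is given here. The class name and API are hypothetical and may not match the actual implementation. The idea is that a fixed-capacity buffer overwrites its oldest measurement once full, so memory use stays bounded.

```javascript
// Hypothetical fixed-size circular buffer for memory measurements.
class CircularBuffer {
    constructor(capacity) {
        if (!Number.isInteger(capacity) || capacity <= 0) {
            throw new RangeError('capacity must be a positive integer');
        }
        this.capacity = capacity;
        this.items = new Array(capacity);
        this.start = 0; // index of the oldest entry
        this.size = 0;  // number of stored entries
    }

    push(value) {
        const end = (this.start + this.size) % this.capacity;
        this.items[end] = value;
        if (this.size < this.capacity) {
            this.size += 1;
        } else {
            // Buffer was full: the oldest entry has just been overwritten.
            this.start = (this.start + 1) % this.capacity;
        }
    }

    // Returns entries ordered from oldest to newest.
    toArray() {
        const out = [];
        for (let i = 0; i < this.size; i += 1) {
            out.push(this.items[(this.start + i) % this.capacity]);
        }
        return out;
    }
}
```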
Changed
Brazil site enumTitle to GuiaMais.com.br
Standardized all configuration properties to camelCase
Renamed PUBLIC_EMAIL_BONUS to PUBLIC_EMAIL_PENALTY for clarity
User agent update documentation added
Consolidated obfuscated email regex patterns
Country parameter support added to keywords.js
Expanded keyword lists for underrepresented languages
Made UK listing selector more specific
Made FR listing selector more specific
Updated import to PUBLIC_EMAIL_PENALTY in validators.js
Sorted TLDs by length for proper matching
Consolidated duplicate name matching logic
Extracted magic numbers to named constants across all modules
Made return values consistent (null vs undefined) in cache-utils.js
lastUpdated timestamp added to metrics
Threshold validation added in memory-monitor.js constructor
Consolidated data requirement checks in memory-monitor.js
HTTP status codes extracted to constants in retry-manager.js
getAllCircuitBreakers() returns deep copy
Replaced isTimedOut flag with AbortController in email-extractor.js
Replaced deprecated waitForTimeout with delay()
Standardized logging levels across modules
Removed deprecated createTreeWalker parameter
Optimized DOM queries with caching
seenBusinesses size limit added in routes.js
Session and crawler parameters added to handleWebsite
Router wrapping pattern fixed in main.js
Replaced console.log with crawlee log
Removed conflicting browser args
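The "Sorted TLDs by length for proper matching" change, together with the endsWith() TLD check listed under Fixed for this release, can be illustrated with the sketch below. The TLD list and the matchTld name are hypothetical; the real list in validators.js is not reproduced here.

```javascript
// Hypothetical TLD list, for illustration only.
const TLDS = ['.uk', '.co.uk', '.com', '.br', '.com.br'];

// Sort longest first so '.com.br' is tested before '.br'.
const TLDS_BY_LENGTH = [...TLDS].sort((a, b) => b.length - a.length);

// Returns the longest matching TLD suffix, or null if none match.
function matchTld(domain) {
    if (typeof domain !== 'string' || domain.length === 0) return null;
    const lower = domain.toLowerCase();
    // endsWith avoids false positives such as '.com' inside 'my.company.io',
    // which an includes() check would wrongly accept.
    return TLDS_BY_LENGTH.find((tld) => lower.endsWith(tld)) ?? null;
}
```

With this ordering, matchTld('guiamais.com.br') yields '.com.br' rather than the shorter '.br'.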
Fixed
Critical getAttribute() timeout bug causing 30-second default waits
Critical scanPage() function that was never being called (400+ lines of dead code now activated)
Race conditions with atomic increment helpers
Division by zero errors in memory monitoring
Version in .actor/actor.json now uses semver format (3.2.0)
File paths for readme/changelog/dockerfile in .actor/actor.json
Brazil enumTitle inconsistency in .actor/input_schema.json
actorSpecification changed to actorDatasetSchemaVersion in .actor/dataset_schema.json
Premature npm ls command removed from Dockerfile
Added --ignore-scripts to npm install in Dockerfile
Removed unused ADJUSTMENT_INTERVAL from config.js
getDomainFromUrl() now returns null on error
getKeywordsForLanguages() now populates contactPaths
Missing imports from keywords.js in selectors.js (CRITICAL)
Missing space in ternary operator in selectors.js
Duplicate 'en' removed from locationPrepositions
Duplicate phone selector removed in AU config
Null checks before email.split() operations in validators.js (CRITICAL)
TLD check now uses endsWith() instead of includes()
Early return for invalid emails
Public domain check logic
CACHE_CONFIG.maxSize reference corrected to maxCacheSize in cache-utils.js (CRITICAL)
Removed unused Actor and getDomainFromUrl imports from cache-utils.js
Cleanup for domainQualityMetrics added
Division by zero in detectLeaks() in memory-monitor.js (CRITICAL)
Config warning for singleton pattern added
Slice sizes in getTrend()
Wrong max retries in while loop in retry-manager.js (CRITICAL)
Off-by-one error in retry count
HALF_OPEN state transition bug
Null check for lastFailureTime added
Domain parameter validation added
Function parameter validation added
Jitter calculation for negative delays
Null check for browser in email-extractor.js (CRITICAL)
globalTimeout wrapped in try-finally
ReDoS vulnerability documentation added
Overly aggressive \bat\b replacement
Memory leak by clearing Sets in finally block
Debug logging added to empty catch blocks
Keywords parameter validation added
Try-catch around context creation added
Variable shadowing fixed
Assignment in while condition fixed
Unused imports removed from routes.js (CRITICAL)
RESULTS_COUNT increment added in handleWebsite (CRITICAL)
Error logging added to silent catch blocks
Null check for businessData fields added
Hardcoded timeouts replaced with TIMING_CONFIG
userData validation added
Missing space in conditional in main.js (CRITICAL)
Actor.init() wrapped in try-catch (CRITICAL)
Unused crawleeLog import removed
CONCURRENCY_CONFIG property references fixed
Debug logging added to empty catch blocks
TIMING_CONFIG property names in email-extractor.js (camelCase)
CONCURRENCY_CONFIG property names in cache-utils.js (camelCase)
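The retry-count fixes in retry-manager.js (wrong max retries in the while loop, off-by-one in the retry count) concern a pattern that is easy to get wrong. The sketch below shows one consistent convention: one initial attempt plus up to maxRetries retries. The withRetries name and this convention are assumptions; the changelog does not state which convention retry-manager.js uses.

```javascript
// Hypothetical retry helper: runs `fn` once, then retries up to `maxRetries`
// more times, so the total number of attempts is maxRetries + 1.
async function withRetries(fn, maxRetries, delayMs = 0) {
    if (typeof fn !== 'function') throw new TypeError('fn must be a function');
    if (!Number.isInteger(maxRetries) || maxRetries < 0) {
        throw new RangeError('maxRetries must be a non-negative integer');
    }
    let attempt = 0;
    while (true) {
        try {
            return await fn(attempt);
        } catch (error) {
            // Stop once the retry budget is used up and surface the last error.
            if (attempt >= maxRetries) throw error;
            attempt += 1;
            if (delayMs > 0) {
                await new Promise((resolve) => setTimeout(resolve, delayMs));
            }
        }
    }
}
```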
[3.1.1]
Added
Form-based search system (navigates to homepage, fills forms like real user)
Search form selector system for all 10 countries
Trust score system (0-100) with dynamic tracking
Trust-based rate limiting (1000-5000ms delays)
Data quality metrics system (0-100 scoring)
Geo-matching validation with confidence scoring
Auto language detection for emails (18 supported TLDs)
Email quality tracking per domain
Multi-language consent popup handling (15 languages)
Changed
Architecture: Removed direct URL construction, added form-based navigation
Main entry point changed from constructed URL to homepage
Selectors updated with form fields (keywordInput, locationInput, submitButton)
Input schema: scanForEmails moved to first position, default changed to false
maxResults default changed from 1000 to 2
Rate limiting upgraded from static 1000ms to dynamic trust-based delays
Reduced concurrency from 5 to 2 (less aggressive)
Increased retry attempts from 3 to 5
Logging level changed from INFO to WARNING
Fixed
Bot detection prevention with human-like form filling
Search flexibility for location-less queries
Proxy country override issue (removed US fallback)
Session rotation causing 403 blocks
Memory leaks from insufficient garbage collection
Statistics log spam
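The trust-based rate limiting introduced in this release pairs a 0-100 trust score with delays of 1000-5000ms. One possible mapping is sketched below. The linear interpolation, the direction (higher trust gives a shorter delay), and the delayForTrustScore name are all assumptions; the changelog does not describe the actual formula.

```javascript
const MIN_DELAY_MS = 1000; // assumed delay at full trust (score 100)
const MAX_DELAY_MS = 5000; // assumed delay at zero trust (score 0)

// Assumption: linear mapping where higher trust means a shorter delay.
function delayForTrustScore(score) {
    // Fall back to the most cautious delay on invalid input.
    if (!Number.isFinite(score)) return MAX_DELAY_MS;
    const clamped = Math.min(100, Math.max(0, score));
    const fraction = 1 - clamped / 100;
    return Math.round(MIN_DELAY_MS + fraction * (MAX_DELAY_MS - MIN_DELAY_MS));
}
```

Under this linear assumption, a score of 50 gives a 3000ms delay.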
[3.1.0]
Changed
Added explicit 100ms timeout to all getAttribute() calls for href extraction
Added 50ms timeout to email element getAttribute() calls
Reduced request handler timeout from 300s to 90s
Limited email scans to maximum 5 per page (prevents excessive scanning)
Reduced email extraction timeout from 45s to 20s
Reduced page scan limit from 2 to 1 page per website
Increased concurrency from 2-5 to 3-8 for better parallelization
Optimized page wait times (1000ms to 800ms, 1500ms to 1200ms)
Reduced click delay from 300ms to 200ms
Email/website scanning balanced for performance and discovery
Faster failure detection with reduced timeouts
Better overall throughput with increased parallelization
Fixed
Critical performance bug: getAttribute() calls missing timeout parameters (causing 30s default waits)
Result: listing processing is 99.7% faster (from 8 minutes to 1.4 seconds per listing)
Result: total extraction time for 30 listings reduced from 4 hours to approximately 42 seconds
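The getAttribute() timeout fix can be sketched as below. This is not the project's actual code: getAttributeFast is a hypothetical helper, and the locator argument is assumed to behave like a Playwright Locator, whose getAttribute(name, options) accepts a timeout in milliseconds.

```javascript
// Reads an attribute with an explicit short timeout instead of relying on
// the 30-second default wait described in this release.
async function getAttributeFast(locator, name, timeoutMs = 100) {
    try {
        return await locator.getAttribute(name, { timeout: timeoutMs });
    } catch (error) {
        // Timed out or element no longer attached: treat the value as missing.
        return null;
    }
}
```

Returning null on timeout lets the caller skip a listing quickly rather than block the whole request handler.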
[3.0.0]
Added
Advanced email extraction engine with multi-layer scanning
Multi-language contact page detection (60+ languages, 40+ countries)
Intelligent email scoring and prioritization system
Domain-to-website matching validation
Email caching layer for performance
Rate limiting and request statistics tracking
Modular architecture: email-extractor.js, validators.js, keywords.js, config.js, cache-utils.js
Deep DOM inspection (attributes, event handlers, scripts, JSON-LD, meta tags)
Shadow DOM and iframe email extraction
CSS pseudo-element content scanning
Comprehensive email deobfuscation (100+ patterns)
Changed
Increased concurrent processing from 20 to 50 businesses
Extended session pool to 200 with higher usage limits
Page-level context isolation for better memory management
Garbage collection triggers every 3 pages
Centralized configuration management
Enhanced error handling with graceful degradation
Fixed
Dataset output stopping at field 09 (now shows all 12 fields)
Missing 12_sourceUrl in output views
Email extraction from obfuscated contact information
Missing emails from JavaScript-heavy websites
Contact page detection across different languages
[2.9.0]
Added
Intelligent location parsing from search queries
Support for up to 1000 results per run
City extraction from addresses with country-specific patterns
Universal location support across all countries
Changed
Reduced navigation timeouts to 60 seconds
Improved parallel data extraction
Enhanced session management
[2.8.0]
Added
Email extraction mode parameter
Contact page scanning when no emails found
Email prioritization sorting
Multiple email detection patterns (9 patterns)
Cloudflare email protection decoding
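For context on the Cloudflare email protection decoding added here: the protected address is commonly exposed as a hex string (for example in a data-cfemail attribute) whose first byte is an XOR key for the remaining bytes. A minimal sketch follows. The decodeCfEmail name is hypothetical, and the sketch assumes ASCII-only addresses; it is not necessarily how email-extractor.js implements it.

```javascript
// Decodes a Cloudflare-protected email given as a hex string.
// The first byte is the XOR key; each following byte is one character.
function decodeCfEmail(encoded) {
    if (typeof encoded !== 'string' || encoded.length < 4 || encoded.length % 2 !== 0) {
        return null;
    }
    // Reject anything that is not pure hexadecimal.
    if (!/^[0-9a-fA-F]+$/.test(encoded)) return null;
    const key = parseInt(encoded.slice(0, 2), 16);
    let email = '';
    for (let i = 2; i < encoded.length; i += 2) {
        const byte = parseInt(encoded.slice(i, i + 2), 16);
        email += String.fromCharCode(byte ^ key);
    }
    return email;
}
```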
[2.7.0]
Added
Multi-country support (10 countries total)
Country-specific URL construction
Language headers per country
Country-specific proxy settings
Changed
Selector strategies with primary and fallback options
Website URL validation to filter internal links
[2.0.0] - Previous Releases
Historical releases preserved for reference. Major milestones:
v2.6.0: Advanced email extraction from JavaScript, JSON-LD, microdata
v2.5.0: Website handler for email extraction, statistics tracking
v2.0.0: Crawlee framework integration, PlaywrightCrawler, session pools
v1.5.0: Smooth scrolling, address extraction, phone number links
v1.0.0: Core listing handler, pagination, basic extraction
v0.9.0: Initial Playwright setup, basic single page extraction