[9.1.0]
Added
Complete SerpAPI integration for both Google and Bing, including secure API key handling and automatic fallback to Playwright when the API fails
Dedicated Google/Bing SERP provider selectors in the input schema so users can force SerpAPI or direct browser scraping per engine
Boolean toggle useApifySerpActor, replacing the confusing serpMode option (serpMode is still supported for backward compatibility)
serpFallbackToBrowser boolean for clearer fallback control
Proxy configuration with native Apify proxy editor
Extraction depth selector (basic/moderate/deep) for performance vs thoroughness control
Email validation levels (none/basic/strict) for quality control
Output format selector (JSON/CSV/Excel) with automatic export
Webhook support with URL and method configuration
Section captions and descriptions for better UI organization
CSV Export - Automatic CSV generation with proper escaping and formatting
Excel Export - Placeholder for Excel export (falls back to CSV)
Webhook Notifications - Send results to external endpoints on completion
Debug Mode - Enhanced logging for troubleshooting with LogLevel.DEBUG
Social Media Toggle - Option to disable social media extraction for faster processing
Business Hours Toggle - Option to disable business hours extraction
Null checks for session.retire() calls in routes.js (lines 305, 310, 501, 570)
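For readers wondering what "proper escaping" in the CSV export entry above involves, here is a minimal sketch of the usual RFC 4180 quoting convention. The function names escapeCsvField and toCsv are hypothetical and are not taken from the actor's source; the real exporter may differ.

```javascript
// Hypothetical helpers for illustration only; not the actor's actual exporter.

// Quote one CSV field: wrap it in double quotes when it contains a comma,
// a double quote, or a line break, and double any embedded quotes.
function escapeCsvField(value) {
    const text = value === null || value === undefined ? '' : String(value);
    if (/[",\r\n]/.test(text)) {
        return `"${text.replace(/"/g, '""')}"`;
    }
    return text;
}

// Build a CSV document from flat records, using the given column order.
function toCsv(records, columns) {
    const header = columns.map(escapeCsvField).join(',');
    const rows = records.map((record) =>
        columns.map((column) => escapeCsvField(record[column])).join(','));
    return [header, ...rows].join('\r\n');
}
```

Array-valued fields such as email lists would need to be joined into a single string before being passed to a helper like this.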
Changed
Complete input_schema.json redesign - User-friendly interface with clear sections and descriptions
Dynamic concurrency - Automatically adjusts based on extraction depth (3-8 concurrent requests)
Reduced delays by 60-80% for improved performance
Parallel contact page scanning - Promise.all implementation for simultaneous contact page processing
Optimized retry delays - Exponential backoff base lowered from 1.8 to 1.5 (delay grows as 1.5^n instead of 1.8^n) for faster recovery
Updated README.md documentation to reflect correct searchEngine default ("bing")
Updated README.md performance benchmarks version references from 8.1 to 9.1
Re-enabled important ESLint rules (no-unused-vars, no-empty) with appropriate configurations
Removed the Bing Web Search API dependency and duplicate advanced settings toggles
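To make the backoff change listed above concrete, the sketch below compares the two growth factors. The 1000 ms base delay and the name retryDelayMs are assumptions for illustration; the changelog only states the factors.

```javascript
// Illustrative only: the base delay of 1000 ms is an assumed value.
// Delay before retry number `attempt` with exponential backoff.
function retryDelayMs(attempt, factor = 1.5, baseMs = 1000) {
    return Math.round(baseMs * Math.pow(factor, attempt));
}

// Side-by-side delays for the first five retries under each factor.
const backoffComparison = [1, 2, 3, 4, 5].map((attempt) => ({
    attempt,
    oldMs: retryDelayMs(attempt, 1.8),
    newMs: retryDelayMs(attempt, 1.5),
}));
```

Under these assumptions the gap widens with each attempt, which is where the faster recovery comes from.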
Fixed
Fixed critical Dockerfile issue: removed reference to missing check-playwright-version.mjs file
Fixed Dockerfile CMD syntax to use proper JSON array format
Fixed default searchEngine mismatch in main.js (changed from 'google' to 'bing' to match input_schema.json)
Fixed output_schema.json template field references (changed from url,email to 06_sourceUrl,02_emails)
Fixed version inconsistency in actor.json (standardized to "9.1.0")
Fixed dataset_schema.json - Added complete views, display properties, and format configurations for proper Apify interface display
Fixed missing descriptions that prevented proper output display in UI
Security
Input validation - Comprehensive validation with length limits, pattern checking, and type validation
API key protection - All API keys are now sanitized in logs and error messages (shows only last 4 chars)
XSS prevention - Input sanitization for searchQuery to prevent script injection
Error message sanitization - API keys removed from all error messages and stack traces
Top-level error handling - Try-catch wrapper for Actor.init with graceful failure
Secure webhook implementation - URL validation before sending data
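The API key protection entries above can be pictured with a small sketch. It masks everything except the last 4 characters of a key and replaces any occurrence of a known key in a message. The names maskApiKey and sanitizeMessage are hypothetical, and the actor's real sanitizer may work differently.

```javascript
// Hypothetical helpers; a sketch of the masking idea, not the actor's code.

// Keep only the last 4 characters of a key visible.
function maskApiKey(key) {
    if (typeof key !== 'string' || key.length === 0) return '';
    if (key.length <= 4) return '****'; // too short to reveal any part safely
    return `****${key.slice(-4)}`;
}

// Replace every occurrence of each known secret in a log or error message.
function sanitizeMessage(message, secrets) {
    return secrets
        .filter((secret) => typeof secret === 'string' && secret.length > 0)
        .reduce(
            (text, secret) => text.split(secret).join(maskApiKey(secret)),
            String(message),
        );
}
```

Splitting on the literal secret avoids building a regular expression from key material, so special characters in a key need no escaping.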
[9.0.0]
Added
Multi-Strategy Adaptive System with intelligent 3-layer fallback for maximum data extraction
Strategy 1: Deep contact page scanning with intelligent zone targeting
Strategy 2: Aggressive whole-page regex with obfuscated email detection
Strategy 3: Metadata and script mining for hidden contact information
Intelligent Zone Targeting with contact-rich area prioritization
Quality Scoring System with real-time quality assessment for each extraction
Enhanced Multi-Language Support covering 20+ languages
Fallback Selector System for Google and Bing search extraction
Comprehensive feature documentation in VERSION-9.0-FEATURES.md
Changed
Smart Company Name Extraction with multi-source fallback (h1, og:site_name, og:title, meta title, document.title)
Enhanced email extraction with aggressive regex patterns for obfuscated emails
Improved phone extraction with meta tag and script tag content scanning
Deep scan enhancements with contact page limit increased to 3 pages
Console output enhancement with visual quality indicators and color-coded ratings
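The multi-source fallback for company names described above could look roughly like the sketch below. It assumes the candidate values have already been read from the page into a plain object; the key names (h1, ogSiteName, and so on) and the function pickCompanyName are hypothetical.

```javascript
// Sketch only: source order follows the changelog entry, key names are assumed.
const COMPANY_NAME_SOURCES = ['h1', 'ogSiteName', 'ogTitle', 'metaTitle', 'documentTitle'];

// Return the first non-empty candidate together with the source it came from.
function pickCompanyName(candidates) {
    for (const source of COMPANY_NAME_SOURCES) {
        const value = candidates[source];
        if (typeof value === 'string' && value.trim() !== '') {
            return { name: value.trim(), source };
        }
    }
    return { name: null, source: null };
}
```

Returning the winning source alongside the name makes it easier to see in logs which fallback level was used.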
[8.2.0]
Added
Intelligent Data Extraction with browser-side intelligent zone scanning
Contact Zone Targeting for footer, header, contact sections, about sections, navigation
Smart Contact Page Detection with text-based pattern matching
Changed
Email extraction improvements with browser-side extraction and contact zone priority
Phone extraction improvements with browser-side phone extraction from contact zones
[8.1.0]
Added
Advanced Anti-Detection with Stealth Plugin Integration (playwright-extra with puppeteer-extra-plugin-stealth)
Complete navigator.webdriver removal and obfuscation
Enhanced Human Behavior Simulation with multi-step mouse movements
Changed
Session Pool expanded from 20 to 100 concurrent sessions
Extended session usage from 10 to 20 requests per session
Improved Retry Logic with increased max retries from 5 to 10 attempts
Reduced concurrency from 2 to 1 for maximum stealth
Extended timeouts (Navigation: 180s, Request handler: 360s)
Extended browser launch arguments for better anti-detection
Removed country code restriction for global proxy support
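For orientation, the 8.1.0 values above might map onto crawler options as in the following fragment. This assumes the actor is built on Crawlee's PlaywrightCrawler, which the changelog does not state explicitly; the option names are Crawlee's, but how the actor actually wires them is not shown in this file.

```javascript
// Assumption: Crawlee PlaywrightCrawler option names; values from the 8.1.0 entries.
const crawlerOptionsV81 = {
    maxConcurrency: 1,              // reduced from 2 for stealth
    maxRequestRetries: 10,          // increased from 5
    navigationTimeoutSecs: 180,
    requestHandlerTimeoutSecs: 360,
    useSessionPool: true,
    sessionPoolOptions: {
        maxPoolSize: 100,           // expanded from 20
        sessionOptions: {
            maxUsageCount: 20,      // requests per session, up from 10
        },
    },
};
```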
[8.0.0]
Added
Universal Language Support - Completely redesigned contact page detection to work in ANY language
Universal URL pattern matching (contact, kontakt, 联系, お問い合わせ, 문의, связаться, etc.)
Unlimited Data Extraction - Removed all limits on email, phone, and social media extraction
Extended Social Media Support (YouTube, TikTok, Pinterest, WhatsApp, Telegram)
Session Management with 20 concurrent sessions and automatic rotation
CAPTCHA detection on both Google and Bing with auto-retry
Browser Anti-Detection with fingerprint masking
Smart Rate Limiting with random human-like delays
Enhanced phone number validation (10-15 digits)
Multi-format address detection (US, Canada, Europe)
Blog post extraction (up to 5 per website)
Complete English documentation in README.md
CHANGELOG.md (this file)
Multiple test input files (german, spanish, stress-test)
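The 10-15 digit phone rule mentioned above can be sketched as a simple digit count. The function name isPlausiblePhone is hypothetical, and the actor's validation may apply further checks that are not described here.

```javascript
// Sketch of a digit-count check; not the actor's full validation logic.
function isPlausiblePhone(raw) {
    // Drop spaces, dashes, parentheses, and a leading plus sign.
    const digits = String(raw).replace(/\D/g, '');
    return digits.length >= 10 && digits.length <= 15;
}
```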
Changed
Contact pages visited increased from 2 to 5 per website
Max crawl limit increased from maxResults * 2 to maxResults * 3
Concurrency reduced from 3 to 2 for better stability
Navigation timeout increased from 60s to 90s
Handler timeout increased from 120s to 180s
Session pool size configured to 20
Session max usage set to 10 requests
Reorganized contact page detection logic
Improved email filtering algorithms
Enhanced social media extraction patterns
Fixed
Advanced email pattern filtering to exclude image filenames (@2x.png, button-1@.svg)
Improved obfuscated email detection
Enhanced Facebook detection (filters share buttons and plugins)
Enhanced Twitter/X detection (supports both domains)
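The image-filename filter listed above targets strings such as logo@2x.png that a naive email regex would accept. A minimal sketch of the idea, with an assumed extension list and a hypothetical function name:

```javascript
// Assumed list of asset extensions; the actor's actual list is not documented here.
const ASSET_EXTENSION = /\.(png|jpe?g|gif|svg|webp|ico|bmp)$/i;

// Reject candidates that look like asset filenames, then apply a loose email shape check.
function isLikelyEmail(candidate) {
    const text = String(candidate).trim();
    if (ASSET_EXTENSION.test(text)) return false;
    return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(text);
}
```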
[7.7.0]
Added
Basic Google and Bing search support
Email and phone extraction
Facebook, LinkedIn, Instagram, Twitter detection
Contact page visiting (up to 2 pages)
Basic anti-bot measures
Migration Guide: v7.x to v8.0
Breaking Changes
Before (v7.x):
After (v8.0):
Before (v7.x):
"socialMedia": {
  "facebook": ["url1", "url2"],
  "linkedin": ["url1", "url2"],
  "instagram": ["url1", "url2"],
  "twitter": ["url1", "url2"]
}
After (v8.0):
"04_socialMedia": {
  "facebook": ["url1", "url2", ...],
  "linkedin": ["url1", "url2", ...],
  "instagram": ["url1", "url2", ...],
  "twitter": ["url1", "url2", ...],
  "youtube": ["url1", ...],
  "tiktok": ["url1", ...],
  "pinterest": ["url1", ...],
  "whatsapp": ["url1", ...],
  "telegram": ["url1", ...]
}
3. Field Names Prefixed
All output fields now have number prefixes for better ordering:
companyName → 01_companyName
emails → 02_emails
phoneNumbers → 03_phoneNumbers
socialMedia → 04_socialMedia
physicalAddress → 05_physicalAddress
sourceUrl → 06_sourceUrl
businessHours → 07_businessHours
additionalInfo → 08_additionalInfo
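Consumers that still hold v7.x records can apply the renames above mechanically. The sketch below is one possible way to do that; the function name toV8Record is hypothetical, and it only renames top-level keys without changing the values.

```javascript
// Rename table taken from the list above.
const FIELD_RENAMES = new Map([
    ['companyName', '01_companyName'],
    ['emails', '02_emails'],
    ['phoneNumbers', '03_phoneNumbers'],
    ['socialMedia', '04_socialMedia'],
    ['physicalAddress', '05_physicalAddress'],
    ['sourceUrl', '06_sourceUrl'],
    ['businessHours', '07_businessHours'],
    ['additionalInfo', '08_additionalInfo'],
]);

// Copy a v7.x record, renaming known keys and leaving unknown keys untouched.
function toV8Record(v7Record) {
    const result = {};
    for (const [key, value] of Object.entries(v7Record)) {
        result[FIELD_RENAMES.get(key) ?? key] = value;
    }
    return result;
}
```

The reverse direction, for downstream code that still expects the old names, would use the same table with keys and values swapped.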
New Capabilities
Universal Language Support
No configuration needed - works automatically in ANY language:
{
  "searchQuery": "レストラン 東京",
  "searchEngine": "google",
  "maxResults": 10
}
Now extracts ALL emails, phones, and social links:
Previous: Max 5 emails
Now: ALL emails found (tested up to 31 on single site)
Enhanced Social Presence
Now detects 9 platforms instead of 4:
Previous: FB, LI, IG, TW (max 2 each)
Now: FB, LI, IG, TW, YT, TT, PI, WA, TG (unlimited)
Better Anti-CAPTCHA
Previous: Basic proxy support
Now: 20-session pool, automatic CAPTCHA detection, 5 retries