Yellow Pages Business Scraper Worldwide

Under maintenance

Extract business leads from Yellow Pages directories in over 50 countries. Scrape company names, phone numbers, verified emails, physical addresses, and websites. Suited to B2B sales prospecting, lead generation, and market research. Export results as CSV or JSON, or retrieve them via the API.

Pricing

Pay per usage

Rating

5.0

(5)

Developer

Țugui Dragoș

Last modified

2 months ago

[3.2.22]

Fixed

Brazilian site (guiamais.com.br) form visibility timeout issue
Enhanced BR site selectors with 14 keyword input options, 12 location input options, and 11 submit button options
Added searchTriggers configuration for BR site to handle hidden/dynamic search forms
Added handleBrazilianSite() function in main.js for special BR form handling
Automatic fallback to direct URL search when BR form is not found
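
The form-then-fallback behaviour described in this release could be sketched roughly as follows. This is not the Actor's `handleBrazilianSite()` code: `findFirstVisible`, `planSearch`, the selector lists, and `buildDirectUrl` are hypothetical names, and `page` is assumed to be a Playwright-style page exposing `locator(selector).first().isVisible()`.

```javascript
// Hypothetical helper: return the first selector whose element is visible, or null.
async function findFirstVisible(page, selectors) {
  for (const selector of selectors) {
    // isVisible() resolves right away instead of waiting for a timeout.
    const visible = await page.locator(selector).first().isVisible().catch(() => false);
    if (visible) return selector;
  }
  return null;
}

// Decide between form-based search and a direct search URL.
// `keywordSelectors` and `buildDirectUrl` are assumptions for illustration.
async function planSearch(page, { keywordSelectors, buildDirectUrl }, keyword, location) {
  const keywordInput = await findFirstVisible(page, keywordSelectors);
  if (keywordInput) return { mode: 'form', keywordInput };
  // No visible form field matched any candidate: fall back to a direct URL.
  return { mode: 'direct-url', url: buildDirectUrl(keyword, location) };
}
```

Trying many candidate selectors in order keeps the form path working when the site changes its markup, while the direct URL keeps the run alive when the form is hidden entirely.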

[3.2.0]

Added

Automatic cache cleanup mechanism
validateConfig() function in src/config.js
getRandomDelay() validation
Comprehensive JSDoc type definitions across all modules
Input validation to getKeywordsForLanguages() and getCountryLanguages()
11 missing countries to getCountryLanguages()
9 missing contact page paths
Deduplication using Set in keywords.js
Input validation to parseLocationFromQuery() and getKeywordsForSite()
ReDoS safety documentation in validators.js
Division by zero protection in cache-utils.js
evictOldestEntry() helper function
resetGlobalMonitor() export in memory-monitor.js
v8 heap statistics for accurate percentage calculation
Circular buffer for measurements in memory-monitor.js
Configurable logger option in memory-monitor.js and retry-manager.js
Note about dead code status in retry-manager.js
safeGarbageCollect() helper in email-extractor.js
isContextDestroyedError() helper in email-extractor.js
Click safeguards in email-extractor.js
atomicIncrement() helper in routes.js
extractWithSelectors() helper in routes.js
extractPhone() helper in routes.js
validateUrl() helper in routes.js
FIELDS constant object in routes.js
MAX_LENGTHS constant object in routes.js
maybeGC() helper in routes.js
Screenshot rate limiting in routes.js
takeScreenshotWithLimits() function in main.js
incrementRequestCounter() in main.js
validateMaxResults() function in main.js
resolveSearchFormSelectors() helper in main.js
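
As an illustration of the circular buffer listed above for memory measurements, a minimal sketch might look like this. The class name and methods are assumptions, not the contents of memory-monitor.js.

```javascript
// Fixed-size ring buffer: once full, each push overwrites the oldest sample,
// so memory used for measurements stays bounded.
class CircularBuffer {
  constructor(capacity) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError('capacity must be a positive integer');
    }
    this.items = new Array(capacity);
    this.capacity = capacity;
    this.start = 0; // index of the oldest sample
    this.size = 0; // number of stored samples
  }

  push(value) {
    const end = (this.start + this.size) % this.capacity;
    this.items[end] = value;
    if (this.size < this.capacity) {
      this.size += 1;
    } else {
      this.start = (this.start + 1) % this.capacity; // oldest sample dropped
    }
  }

  // Samples in insertion order, oldest first.
  toArray() {
    return Array.from(
      { length: this.size },
      (_, i) => this.items[(this.start + i) % this.capacity],
    );
  }
}
```

With a capacity of 3, pushing 1, 2, 3, 4 leaves `[2, 3, 4]`: the buffer never grows, which is the property a long-running monitor needs.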

Changed

Brazil site enumTitle to GuiaMais.com.br
Standardized all configuration properties to camelCase
Renamed PUBLIC_EMAIL_BONUS to PUBLIC_EMAIL_PENALTY for clarity
User agent update documentation added
Consolidated obfuscated email regex patterns
Country parameter support added to keywords.js
Expanded keyword lists for underrepresented languages
Made UK listing selector more specific
Made FR listing selector more specific
Updated import to PUBLIC_EMAIL_PENALTY in validators.js
Sorted TLDs by length for proper matching
Consolidated duplicate name matching logic
Extracted magic numbers to named constants across all modules
Made return values consistent (null vs undefined) in cache-utils.js
lastUpdated timestamp added to metrics
Threshold validation added in memory-monitor.js constructor
Consolidated data requirement checks in memory-monitor.js
HTTP status codes extracted to constants in retry-manager.js
getAllCircuitBreakers() returns deep copy
Replaced isTimedOut flag with AbortController in email-extractor.js
Replaced deprecated waitForTimeout with delay()
Standardized logging levels across modules
Removed deprecated createTreeWalker parameter
Optimized DOM queries with caching
seenBusinesses size limit added in routes.js
Session and crawler parameters added to handleWebsite
Router wrapping pattern fixed in main.js
Replaced console.log with crawlee log
Removed conflicting browser args
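
Why sorting TLDs by length matters can be shown with a small sketch. The TLD list and the `matchTld` function below are hypothetical samples, not the Actor's data or code.

```javascript
// Hypothetical sample list; the Actor's real TLD list is not shown in the changelog.
const SAMPLE_TLDS = ['uk', 'com', 'co.uk', 'br', 'com.br'];

// Longest first, so 'co.uk' is tested before 'uk' and 'com.br' before 'br'.
const TLDS_LONGEST_FIRST = [...SAMPLE_TLDS].sort((a, b) => b.length - a.length);

function matchTld(hostname) {
  const host = String(hostname).toLowerCase();
  for (const tld of TLDS_LONGEST_FIRST) {
    // Suffix match with a leading dot; a substring match would also hit
    // 'com' inside a host such as 'company.example.br'.
    if (host.endsWith('.' + tld)) return tld;
  }
  return null;
}
```

Here `matchTld('shop.example.co.uk')` returns `'co.uk'` rather than `'uk'`, and `matchTld('company.example.br')` returns `'br'`.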

Fixed

Critical getAttribute() timeout bug causing 30-second default waits
Critical scanPage() function that was never being called (400+ lines of dead code now activated)
Race conditions with atomic increment helpers
Division by zero errors in memory monitoring
Version to semver format (3.2.0) in .actor/actor.json
File paths for readme/changelog/dockerfile in .actor/actor.json
Brazil enumTitle inconsistency in .actor/input_schema.json
actorSpecification changed to actorDatasetSchemaVersion in .actor/dataset_schema.json
Premature npm ls command removed from Dockerfile
Added --ignore-scripts to npm install in Dockerfile
Removed unused ADJUSTMENT_INTERVAL from config.js
getDomainFromUrl() to return null on error
getKeywordsForLanguages() to populate contactPaths
Missing imports from keywords.js in selectors.js (CRITICAL)
Missing space in ternary operator in selectors.js
Duplicate 'en' removed from locationPrepositions
Duplicate phone selector removed in AU config
Null checks before email.split() operations in validators.js (CRITICAL)
TLD check to use endsWith() instead of includes()
Early return for invalid emails
Public domain check logic
CACHE_CONFIG.maxSize to maxCacheSize in cache-utils.js (CRITICAL)
Removed unused Actor and getDomainFromUrl imports from cache-utils.js
Cleanup for domainQualityMetrics added
Division by zero in detectLeaks() in memory-monitor.js (CRITICAL)
Config warning for singleton pattern added
Slice sizes in getTrend()
Wrong max retries in while loop in retry-manager.js (CRITICAL)
Off-by-one error in retry count
HALF_OPEN state transition bug
Null check for lastFailureTime added
Domain parameter validation added
Function parameter validation added
Jitter calculation for negative delays
Null check for browser in email-extractor.js (CRITICAL)
globalTimeout wrapped in try-finally
ReDoS vulnerability documentation added
Overly aggressive \bat\b replacement
Memory leak by clearing Sets in finally block
Debug logging added to empty catch blocks
Keywords parameter validation added
Try-catch around context creation added
Variable shadowing fixed
Assignment in while condition fixed
Unused imports removed from routes.js (CRITICAL)
RESULTS_COUNT increment added in handleWebsite (CRITICAL)
Error logging added to silent catch blocks
Null check for businessData fields added
Hardcoded timeouts replaced with TIMING_CONFIG
userData validation added
Missing space in conditional in main.js (CRITICAL)
Actor.init() wrapped in try-catch (CRITICAL)
Unused crawleeLog import removed
CONCURRENCY_CONFIG property references fixed
Debug logging added to empty catch blocks
TIMING_CONFIG property names in email-extractor.js (camelCase)
CONCURRENCY_CONFIG property names in cache-utils.js (camelCase)
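
Several retry-manager.js fixes above (retry count, off-by-one, negative jitter) concern a pattern that can be sketched as follows. The function names, default values, and the symmetric jitter formula are assumptions for illustration, not the Actor's implementation.

```javascript
// Exponential backoff with symmetric jitter, clamped so the delay is never negative.
// baseMs, maxMs and jitterRatio are hypothetical defaults.
function backoffDelay(
  attempt,
  { baseMs = 500, maxMs = 10000, jitterRatio = 0.25 } = {},
  random = Math.random,
) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  const jitter = (random() * 2 - 1) * jitterRatio * exponential;
  return Math.max(0, Math.round(exponential + jitter));
}

// Runs fn once, then retries up to maxRetries more times (maxRetries + 1 calls in total).
async function withRetries(
  fn,
  { maxRetries = 3, sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)) } = {},
) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      // Do not sleep after the final failed attempt.
      if (attempt < maxRetries) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```

Writing the loop bound as `attempt <= maxRetries` makes the intended number of calls explicit, which is where off-by-one errors in retry counts usually come from.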

[3.1.1]

Added

Form-based search system (navigates to homepage, fills forms like real user)
Search form selector system for all 10 countries
Trust score system (0-100) with dynamic tracking
Trust-based rate limiting (1000-5000ms delays)
Data quality metrics system (0-100 scoring)
Geo-matching validation with confidence scoring
Auto language detection for emails (18 supported TLDs)
Email quality tracking per domain
Multi-language consent popup handling (15 languages)

Changed

Architecture: Removed direct URL construction, added form-based navigation
Main entry point changed from constructed URL to homepage
Selectors updated with form fields (keywordInput, locationInput, submitButton)
Input schema: scanForEmails moved to first position, default changed to false
maxResults default changed from 1000 to 2
Rate limiting upgraded from static 1000ms to dynamic trust-based delays
Reduced concurrency from 5 to 2 (less aggressive)
Increased retry attempts from 3 to 5
Logging level changed from INFO to WARNING
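
One way a trust score could drive the 1000-5000ms delay range is a simple linear mapping. The changelog does not say how the Actor maps scores to delays, so the direction (higher trust, shorter delay), the linear shape, and the name `delayForTrust` are all assumptions.

```javascript
const MIN_DELAY_MS = 1000; // lower bound of the range stated in the changelog
const MAX_DELAY_MS = 5000; // upper bound of the range stated in the changelog

// Assumed mapping: trust 100 -> 1000ms, trust 0 -> 5000ms, linear in between.
function delayForTrust(trustScore) {
  // Clamp to 0-100; non-numeric input is treated as zero trust.
  const score = Math.min(100, Math.max(0, Number(trustScore) || 0));
  return Math.round(MAX_DELAY_MS - (score / 100) * (MAX_DELAY_MS - MIN_DELAY_MS));
}
```

Under these assumptions a score of 50 gives a 3000ms delay, and out-of-range or invalid scores fall back to the range limits instead of producing negative or oversized waits.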

Fixed

Bot detection prevention with human-like form filling
Search flexibility for location-less queries
Proxy country override issue (removed US fallback)
Session rotation causing 403 blocks
Memory leaks from insufficient garbage collection
Statistics log spam

[3.1.0]

Changed

Added explicit 100ms timeout to all getAttribute() calls for href extraction
Added 50ms timeout to email element getAttribute() calls
Reduced request handler timeout from 300s to 90s
Limited email scans to maximum 5 per page (prevents excessive scanning)
Reduced email extraction timeout from 45s to 20s
Reduced page scan limit from 2 to 1 page per website
Increased concurrency from 2-5 to 3-8 for better parallelization
Optimized page wait times (1000ms to 800ms, 1500ms to 1200ms)
Reduced click delay from 300ms to 200ms
Email/website scanning balanced for performance and discovery
Faster failure detection with reduced timeouts
Better overall throughput with increased parallelization

Fixed

Critical performance bug: getAttribute() calls missing timeout parameters (causing 30s default waits)
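
The fix relies on the `timeout` option that Playwright's `locator.getAttribute(name, options)` accepts. A minimal sketch of a wrapper is shown below; the helper name `getHrefWithTimeout` and the choice to return `null` on failure are assumptions, and `locator` is expected to be a Playwright Locator.

```javascript
// Read href with an explicit short timeout instead of the default action timeout.
// Returns null when the element is missing or the call times out.
async function getHrefWithTimeout(locator, timeoutMs = 100) {
  try {
    return await locator.getAttribute('href', { timeout: timeoutMs });
  } catch (error) {
    // A timeout here means "no link found", not a fatal error.
    return null;
  }
}
```

Without an explicit timeout, each lookup for a missing element waits for the default timeout, which the changelog reports as 30 seconds per call.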

Performance

Listing processing time reduced by 99.7% (8 minutes to 1.4 seconds per listing)
Total extraction time reduced from 4 hours to approximately 42 seconds for 30 listings

[3.0.0]

Added

Advanced email extraction engine with multi-layer scanning
Multi-language contact page detection (60+ languages, 40+ countries)
Intelligent email scoring and prioritization system
Domain-to-website matching validation
Email caching layer for performance
Rate limiting and request statistics tracking
Modular architecture: email-extractor.js, validators.js, keywords.js, config.js, cache-utils.js
Deep DOM inspection (attributes, event handlers, scripts, JSON-LD, meta tags)
Shadow DOM and iframe email extraction
CSS pseudo-element content scanning
Comprehensive email deobfuscation (100+ patterns)

Changed

Increased concurrent processing from 20 to 50 businesses
Extended session pool to 200 with higher usage limits
Page-level context isolation for better memory management
Garbage collection triggers every 3 pages
Centralized configuration management
Enhanced error handling with graceful degradation

Fixed

Dataset output stopping at field 09 (now shows all 12 fields)
Missing 12_sourceUrl in output views
Email extraction from obfuscated contact information
Missing emails from JavaScript-heavy websites
Contact page detection across different languages

[2.9.0]

Added

Intelligent location parsing from search queries
Support for up to 1000 results per run
City extraction from addresses with country-specific patterns
Universal location support across all countries
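
Location parsing from a search query can be sketched as splitting on a location preposition. The preposition list and the `splitQuery` function below are hypothetical and much simpler than whatever per-language logic the Actor uses.

```javascript
// Hypothetical sample prepositions; the Actor keeps its own per-language list.
const SAMPLE_PREPOSITIONS = ['in', 'near', 'en', 'à'];

// Split "plumbers in Berlin" into a keyword and a location.
function splitQuery(query) {
  const text = String(query).trim().replace(/\s+/g, ' ');
  const words = text.split(' ');
  // Scan from the right so the last preposition wins, and require at least
  // one word on each side of it.
  for (let i = words.length - 2; i >= 1; i -= 1) {
    if (SAMPLE_PREPOSITIONS.includes(words[i].toLowerCase())) {
      return {
        keyword: words.slice(0, i).join(' '),
        location: words.slice(i + 1).join(' '),
      };
    }
  }
  // No preposition found: treat the whole query as a keyword.
  return { keyword: text, location: null };
}
```

For example, `splitQuery('restaurants near New York')` yields the keyword `restaurants` and the location `New York`, while `splitQuery('dentist')` yields a `null` location.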

Changed

Reduced navigation timeouts to 60 seconds
Improved parallel data extraction
Enhanced session management

[2.8.0]

Added

Email extraction mode parameter
Contact page scanning when no emails found
Email prioritization sorting
Multiple email detection patterns (9 patterns)
Cloudflare email protection decoding
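
Cloudflare's email protection stores the address as a hex string (typically in a `data-cfemail` attribute) whose first byte is an XOR key for the remaining bytes. A minimal decoding sketch follows; the function name is an assumption, and it handles single-byte characters only.

```javascript
// Decode a Cloudflare-protected email: first byte is the XOR key,
// every following byte is one character XORed with that key.
function decodeCfEmail(encodedHex) {
  const valid = typeof encodedHex === 'string'
    && encodedHex.length >= 4
    && encodedHex.length % 2 === 0
    && /^[0-9a-f]+$/i.test(encodedHex);
  if (!valid) return null;

  const key = parseInt(encodedHex.slice(0, 2), 16);
  let decoded = '';
  for (let i = 2; i < encodedHex.length; i += 2) {
    const byte = parseInt(encodedHex.slice(i, i + 2), 16);
    decoded += String.fromCharCode(byte ^ key);
  }
  return decoded;
}
```

For instance, `decodeCfEmail('01604163')` returns `'a@b'`: the key is `0x01`, and `0x60`, `0x41`, `0x63` XORed with it give `a`, `@`, `b`.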

[2.7.0]

Added

Multi-country support (10 countries total)
Country-specific URL construction
Language headers per country
Country-specific proxy settings

Changed

Selector strategies with primary and fallback options
Website URL validation to filter internal links

[2.0.0] - Previous Releases

Historical releases preserved for reference. Major milestones:

v2.6.0: Advanced email extraction from JavaScript, JSON-LD, microdata
v2.5.0: Website handler for email extraction, statistics tracking
v2.0.0: Crawlee framework integration, PlaywrightCrawler, session pools
v1.5.0: Smooth scrolling, address extraction, phone number links
v1.0.0: Core listing handler, pagination, basic extraction
v0.9.0: Initial Playwright setup, basic single page extraction
