Premium lead finder for web agencies and SDR teams. Scrape YellowPages and auto-enrich every lead with a 0-100 lead score, website tech stack (Wix/WordPress/Shopify), real emails, E.164 phone numbers, mobile and SEO audits, and an outreach pitch. CSV export for HubSpot/Pipedrive. 30+ fields per lead.
All notable changes to this Actor will be documented here.
[2.2] — 2026-05-22
Added — 5 more lead-quality features
☁️ Cloudflare email decoder — Many WordPress / Cloudflare-protected sites obfuscate emails inline as data-cfemail="abc123...". The actor now decodes them with the standard XOR algorithm, recovering 20-30% more real emails on those sites.
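A minimal Python sketch of the XOR decode, assuming the hex payload is read straight from the data-cfemail attribute. The function names (decode_cfemail, encode_cfemail, extract_cf_emails) are hypothetical and not taken from the Actor's source; the encoder exists only to build test fixtures.

```python
import re

# Matches the hex payload Cloudflare writes into data-cfemail="...".
CFEMAIL_RE = re.compile(r'data-cfemail\s*=\s*["\']([0-9a-fA-F]+)["\']')


def decode_cfemail(encoded: str) -> str:
    """Decode a data-cfemail hex string.

    The first byte is the XOR key; each following byte, XORed with the key,
    gives one character of the original address.
    """
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key) for i in range(2, len(encoded), 2)
    )


def encode_cfemail(email: str, key: int = 0x42) -> str:
    """Inverse of decode_cfemail, handy for building test fixtures."""
    return f"{key:02x}" + "".join(f"{ord(c) ^ key:02x}" for c in email)


def extract_cf_emails(html: str) -> list[str]:
    """Decode every data-cfemail payload in an HTML page, deduplicated."""
    found: list[str] = []
    for payload in CFEMAIL_RE.findall(html):
        email = decode_cfemail(payload)
        if email not in found:
            found.append(email)
    return found
```

Because the key travels with the payload, no secret is needed: the obfuscation only defeats naive regex scrapers.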
📱 Mobile-friendliness audit — Detects 5 signals in the website's HTML head:
has-viewport-meta (responsive design)
has-responsive-css (@media queries)
has-mobile-alternate (m. subdomain)
has-amp (AMP version)
fixed-width-layout (NEGATIVE signal)
Returns a mobileFriendly boolean and a mobileSignals[] array per lead. Sites without a viewport meta tag get +8 to leadScore (a clear pitch angle for a "responsive redesign").
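The five signals can be approximated with regular expressions over the raw HTML. This sketch is an assumption throughout: the regexes, the rule that combines signals into mobileFriendly, and the leadScoreBonus key are illustrative, not the Actor's implementation.

```python
import re


def _tags(html: str, name: str) -> list[str]:
    """All opening tags of one element type, e.g. every <meta ...>."""
    return re.findall(rf"<{name}\b[^>]*>", html, re.IGNORECASE)


def _attr(tag: str, attr: str) -> str:
    """Lower-cased value of one attribute inside a tag, or ''."""
    m = re.search(rf'(?<![\w-]){attr}\s*=\s*["\']?([^"\'>]*)', tag, re.IGNORECASE)
    return m.group(1).strip().lower() if m else ""


def mobile_audit(html: str) -> dict:
    signals: list[str] = []

    def add(signal: str) -> None:
        if signal not in signals:
            signals.append(signal)

    viewport = ""
    for tag in _tags(html, "meta"):
        if _attr(tag, "name") == "viewport":
            viewport = _attr(tag, "content")
            add("has-viewport-meta")
    if re.search(r"@media\b", html, re.IGNORECASE):
        add("has-responsive-css")
    for tag in _tags(html, "link"):
        rel, href = _attr(tag, "rel"), _attr(tag, "href")
        if rel == "alternate" and re.match(r"https?://m\.", href):
            add("has-mobile-alternate")
        elif rel == "amphtml":
            add("has-amp")
    # Assumption: a viewport pinned to a pixel width (width=1024) means a fixed-width layout.
    if re.search(r"(?:^|,)\s*width\s*=\s*\d+", viewport):
        add("fixed-width-layout")

    has_viewport = "has-viewport-meta" in signals
    mobile_friendly = (
        (has_viewport and "fixed-width-layout" not in signals)
        or "has-amp" in signals
        or "has-mobile-alternate" in signals
    )
    return {
        "mobileFriendly": mobile_friendly,
        "mobileSignals": signals,
        "leadScoreBonus": 0 if has_viewport else 8,  # +8 when no viewport meta
    }
```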
🔍 SEO hygiene audit — Checks 5 on-page basics:
hasMetaDescription
hasOgImage (Open Graph)
hasH1
hasJsonLd (structured data)
hasCanonical
Returns seoAudit: {hasX, ..., seoScore}, with seoScore on a 0-100 scale. Sites with seoScore < 40 get +5 to leadScore (pitch angle: an SEO audit).
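A rough sketch of the SEO checks, assuming each of the five checks carries an equal 20 points toward seoScore (the changelog gives only the 0-100 range, not the weighting). Regexes over raw HTML stand in for a real DOM parser, and the function names are hypothetical.

```python
import re

# One regex per check over raw HTML (a simplification of real DOM parsing).
SEO_CHECKS = {
    "hasMetaDescription": r'<meta\b[^>]*(?<![\w-])name\s*=\s*["\']?description\b',
    "hasOgImage": r'<meta\b[^>]*(?<![\w-])property\s*=\s*["\']?og:image(?![:\w])',
    "hasH1": r"<h1\b",
    "hasJsonLd": r'<script\b[^>]*type\s*=\s*["\']?application/ld\+json',
    "hasCanonical": r'<link\b[^>]*rel\s*=\s*["\']?canonical\b',
}


def seo_audit(html: str) -> dict:
    audit = {k: bool(re.search(p, html, re.IGNORECASE)) for k, p in SEO_CHECKS.items()}
    # Assumption: every check is worth an equal 20 points.
    audit["seoScore"] = 20 * sum(audit.values())
    return audit


def seo_lead_bonus(audit: dict) -> int:
    """+5 to leadScore for weak on-page SEO (seoScore < 40)."""
    return 5 if audit["seoScore"] < 40 else 0
```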
🎯 Industry-specific outreach pitches, with custom angles for 10 industry groups:
Plumbers / Electricians / HVAC: "emergency calls — half your jobs come from someone Googling at 2am"
Restaurants / Pizza: "online menus and reservation links — diners decide where to eat from their phone"
Dentists: "online booking — patients now expect to schedule a cleaning the same way they book an Uber"
Lawyers / Attorneys: "Google rankings for '[city] [practice area] attorney' — that's where 80% of clients start"
Auto Repair: "Google reviews + 'near me' — most car owners pick the nearest 4★ shop"
Salons / Hair / Barber: "Instagram-style portfolio gallery — clients book based on photos"
Gyms / Fitness: "membership signup forms + class schedules — gym shoppers compare 3-4 sites"
Landscaping / Roofing: "before/after photo galleries — homeowners hire based on portfolio"
Real Estate / Realtor: "IDX listings + lead capture — agents without sites lose 40% of online enquiries"
Cleaning Services: "online quote forms + booking — most customers want a price in 60 seconds"
The industry angle is woven into the existing dynamic pitch (no-website / dead-site / Wix / SEO templates).
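One plausible way to weave an industry angle into a scenario template is simple keyword matching on the listing category. The keyword lists, opener wording and function names below are hypothetical, and only five of the ten groups are shown.

```python
from typing import Optional

# Keyword -> angle pairs (a subset of the 10 groups; angles abbreviated).
INDUSTRY_ANGLES = [
    (("plumb", "electric", "hvac"), "emergency calls"),
    (("restaurant", "pizza"), "online menus and reservation links"),
    (("dentist", "dental"), "online booking"),
    (("lawyer", "attorney"), "Google rankings for '[city] [practice area] attorney'"),
    (("salon", "hair", "barber"), "an Instagram-style portfolio gallery"),
]

# Hypothetical scenario openers; the Actor's real templates are not shown here.
SCENARIO_OPENERS = {
    "no-website": "I couldn't find a website for {name} in {city}.",
    "dead-site": "The website for {name} in {city} doesn't seem to load right now.",
    "wix": "{name}'s Wix site in {city} looks like it could use a refresh.",
}


def industry_angle(category: str) -> Optional[str]:
    cat = category.lower()
    for keywords, angle in INDUSTRY_ANGLES:
        if any(k in cat for k in keywords):
            return angle
    return None


def build_pitch(name: str, city: str, category: str, scenario: str) -> str:
    opener = SCENARIO_OPENERS[scenario].format(name=name, city=city)
    angle = industry_angle(category)
    if angle is None:
        return opener  # generic pitch when no industry matches
    return f"{opener} For businesses like yours, the quickest win is usually {angle}."
```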
📍 Geocoding via OpenStreetMap Nominatim (opt-in)
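New enrichGeocode parameter (default false). When enabled, every lead gets:
latitude and longitude coordinates plus a geocoded address (exported as the Latitude, Longitude and Geocoded Address CSV columns)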
osmUrl — deep link to the location on OpenStreetMap
Useful for territory routing in CRMs (Pipedrive territories, HubSpot deal regions) or for plotting leads on a custom map dashboard. Off by default because Nominatim's usage policy allows at most 1 request per second, so enabling it adds roughly 1s per lead to the run time.
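A sketch of a Nominatim client that honours the 1 request/second limit. It assumes the public search endpoint with format=json, which returns lat, lon and display_name for each hit; the output keys other than osmUrl, and all function and class names, are hypothetical. Nominatim's policy also asks for an identifying User-Agent, so callers pass one in.

```python
import json
import time
import urllib.parse
import urllib.request

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


class MinIntervalLimiter:
    """Blocks so that consecutive wait() calls are at least `interval` seconds apart."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = None

    def wait(self):
        now = self.clock()
        if self._last is not None:
            delay = self._last + self.interval - now
            if delay > 0:
                self.sleep(delay)
                now += delay
        self._last = now


def search_url(address):
    query = urllib.parse.urlencode({"q": address, "format": "json", "limit": 1})
    return f"{NOMINATIM_SEARCH}?{query}"


def osm_url(lat, lon, zoom=18):
    """Deep link that drops a marker on openstreetmap.org."""
    return f"https://www.openstreetmap.org/?mlat={lat}&mlon={lon}#map={zoom}/{lat}/{lon}"


def geocode(address, limiter, user_agent):
    """One rate-limited lookup; returns None when Nominatim finds nothing."""
    limiter.wait()
    req = urllib.request.Request(search_url(address), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        results = json.load(resp)
    if not results:
        return None
    hit = results[0]
    lat, lon = float(hit["lat"]), float(hit["lon"])
    return {
        "latitude": lat,
        "longitude": lon,
        "geocodedAddress": hit["display_name"],
        "osmUrl": osm_url(lat, lon),
    }
```

Injecting the clock and sleep functions keeps the limiter testable without real delays.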
Improved — email scraping accuracy
Fixed false positives that leaked through in earlier versions:
TLD whitelist now rejects JS-fragment artefacts (window.location.reload no longer yields loc@ion.reload, because reload is not a whitelisted TLD)
Lookbehind (?<![A-Za-z0-9.]) rejects emails extracted from URLs (www.flavorplate.com no longer matches flavorpl@e.com)
Stricter EMAIL_OBFUSC_RE requires visible separators (brackets / parens / whitespace) around "at", so brand names like "flavorPLATE" no longer have their embedded "AT" read as an obfuscated "@"
Domain blacklist expanded with parastorage.com, wixstatic.com, cloudfront.net, gravatar.com, wp.com, automattic.com, etc.
Local blacklist adds placeholder addresses (user, youremail, yourname, etc.)
Pluggable _is_plausible_email() helper applied to every extraction path (mailto, CF-decoded, plain, obfuscated)
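The separator rule and the plausibility filter can be sketched together. The lists below are small illustrative subsets, and is_plausible_email and extract_obfuscated are hypothetical stand-ins for the Actor's _is_plausible_email() and EMAIL_OBFUSC_RE, not copies of them.

```python
import re

# Illustrative subsets only; the Actor's real lists are longer.
TLD_WHITELIST = {"com", "net", "org", "us", "biz", "info", "co", "io", "edu", "gov"}
DOMAIN_BLACKLIST = {"wixstatic.com", "parastorage.com", "cloudfront.net", "gravatar.com", "wp.com", "sentry.io"}
LOCAL_BLACKLIST = {"user", "youremail", "yourname", "email", "name"}

# "at"/"dot" only count when set off by brackets, parens or whitespace,
# so a brand like "flavorPLATE.com" never yields "flavorpl@e.com".
_AT = r"(?:\s*[\[(]\s*at\s*[\])]\s*|\s+at\s+)"
_DOT = r"(?:\s*[\[(]\s*dot\s*[\])]\s*|\s+dot\s+|\.)"
OBFUSC_RE = re.compile(
    r"(?<![A-Za-z0-9.])([A-Za-z0-9_%+-]+(?:\.[A-Za-z0-9_%+-]+)*)"
    + _AT
    + r"([A-Za-z0-9-]+(?:" + _DOT + r"[A-Za-z0-9-]+)+)",
    re.IGNORECASE,
)


def is_plausible_email(email: str) -> bool:
    local, sep, domain = email.lower().rpartition("@")
    if not sep or not local or "." not in domain:
        return False
    if domain.rsplit(".", 1)[1] not in TLD_WHITELIST:
        return False  # e.g. loc@ion.reload, logo@2x.png
    if any(domain == d or domain.endswith("." + d) for d in DOMAIN_BLACKLIST):
        return False
    return local not in LOCAL_BLACKLIST and not local.startswith(("noreply", "no-reply"))


def extract_obfuscated(text: str) -> list[str]:
    found: list[str] = []
    for m in OBFUSC_RE.finditer(text):
        domain = re.sub(_DOT, ".", m.group(2), flags=re.IGNORECASE)
        email = f"{m.group(1)}@{domain}".lower()
        if is_plausible_email(email) and email not in found:
            found.append(email)
    return found
```

Running every extraction path through one predicate keeps the rules in a single place, which is the point of making the helper pluggable.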
Added — CSV export columns
Latitude, Longitude, Geocoded Address, OSM Map, Mobile Friendly, SEO Score
[2.1] — 2026-05-22
Added — 6 more enrichment features
📧 Real email scraping from website
For every lead with a live website, the actor now scrapes plain-text emails and mailto: links from the homepage HTML. It filters out 30+ tracker / CDN domains (Wix, Google, FB, Sentry), Retina image filenames (image@2x.png), and noreply addresses, and returns up to 5 unique emails per lead in emailsFromWebsite[].
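A standard-library sketch of the homepage extraction step, assuming the HTML has already been fetched. The skip list is a tiny illustrative subset of the 30+ filtered domains, and the function and class names are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

PLAIN_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SKIP_DOMAINS = ("wix.com", "google.com", "facebook.com", "sentry.io")  # illustrative subset
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")


class _MailtoCollector(HTMLParser):
    """Collects the address part of every <a href="mailto:...">."""

    def __init__(self):
        super().__init__()
        self.mailtos: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop "?subject=..." and decode %40-style escapes.
            self.mailtos.append(unquote(href[7:].split("?", 1)[0]).strip())


def emails_from_website(html: str, limit: int = 5) -> list[str]:
    parser = _MailtoCollector()
    parser.feed(html)
    kept: list[str] = []
    for email in parser.mailtos + PLAIN_EMAIL_RE.findall(html):
        email = email.lower()
        domain = email.rsplit("@", 1)[-1]
        if email.endswith(IMAGE_SUFFIXES):  # image@2x.png style filenames
            continue
        if email.startswith(("noreply@", "no-reply@")):
            continue
        if any(domain == d or domain.endswith("." + d) for d in SKIP_DOMAINS):
            continue
        if email not in kept:
            kept.append(email)
        if len(kept) == limit:
            break
    return kept
```

Mailto links are checked first because they are the clearest signal that an address is meant for customers.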
📞 Real phone scraping + E.164 normalisation
New phoneE164 field on every lead — listing phone normalised to +1XXXXXXXXXX
New phoneTel field — tel:+1XXXXXXXXXX click-to-call URL, ready to use in HTML links or buttons
New phonesFromWebsite[] field — additional numbers found in tel: links and page text on the website (different department, mobile, after-hours, etc.)
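A minimal normaliser for US/Canada (NANP) numbers, assuming listings are North American as the +1XXXXXXXXXX format suggests. The regexes and function names are illustrative assumptions. Numbers equal to the listing phone are dropped so that phonesFromWebsite[] only holds additional numbers.

```python
import re
from typing import Optional

TEL_HREF_RE = re.compile(r'href\s*=\s*["\']tel:([^"\']+)["\']', re.IGNORECASE)
# Loose NANP pattern: optional +1, then 3-3-4 digits with common separators.
US_PHONE_RE = re.compile(r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)")


def to_e164_us(raw: str) -> Optional[str]:
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10 or digits[0] in "01":
        return None  # NANP area codes never start with 0 or 1
    return "+1" + digits


def tel_url(e164: str) -> str:
    return "tel:" + e164


def phones_from_website(html: str, listing_e164: Optional[str] = None) -> list[str]:
    found: list[str] = []
    for raw in TEL_HREF_RE.findall(html) + US_PHONE_RE.findall(html):
        e164 = to_e164_us(raw)
        if e164 and e164 != listing_e164 and e164 not in found:
            found.append(e164)
    return found
```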
🏷️ Chain / franchise detection
~50 national chain brands matched by name (Roto-Rooter, Subway, Domino's, Great Clips, Anytime Fitness, RE/MAX, Servpro, Verizon, AT&T, etc.). Two new fields:
isChain (boolean)
chainBrand (string — e.g. "roto-rooter")
Chains get a -30 leadScore penalty (marketing spend is controlled by corporate, so they are hard to sell to). New input excludeChains: true drops them entirely.
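A sketch of name-based chain matching with word boundaries, so "Subwayside Cafe" is not mistaken for Subway. Only a few brands are shown, every slug except "roto-rooter" is hypothetical, and clamping the penalised score at 0 is an assumption based on the 0-100 range.

```python
import re

# Illustrative subset of the ~50 brands: slug -> name variants.
CHAIN_BRANDS = {
    "roto-rooter": ["roto-rooter", "roto rooter"],
    "subway": ["subway"],
    "dominos": ["domino's", "dominos"],
    "great-clips": ["great clips"],
    "re-max": ["re/max", "remax"],
}
CHAIN_PENALTY = -30


def detect_chain(business_name: str):
    """Return (isChain, chainBrand) for a business name."""
    name = business_name.lower()
    for slug, aliases in CHAIN_BRANDS.items():
        for alias in aliases:
            # Alias must not be glued to other letters/digits on either side.
            if re.search(r"(?<![a-z0-9])" + re.escape(alias) + r"(?![a-z0-9])", name):
                return True, slug
    return False, None


def apply_chain_rules(leads: list, exclude_chains: bool = False) -> list:
    kept = []
    for lead in leads:
        is_chain, brand = detect_chain(lead["businessName"])
        lead["isChain"], lead["chainBrand"] = is_chain, brand
        if is_chain:
            if exclude_chains:
                continue
            lead["leadScore"] = max(0, lead["leadScore"] + CHAIN_PENALTY)
        kept.append(lead)
    return kept
```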
🎯 Best-contact-channel recommendation
New bestContact field returns {channel, value, label} — picks the single highest-confidence outreach path so users don't have to scan 5 fields. Priority order:
Real email scraped from website
Email from YellowPages listing
Phone (E.164) from listing
Phone scraped from website
Website contact page URL
Email guess (verify before use)
Website homepage / Listing URL
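The priority order amounts to a first-match scan. This sketch assumes hypothetical JSON keys (email, website, listingUrl) for the listing fields and invents the channel and label strings; only the ordering comes from the list above.

```python
from typing import Optional


def _first(values) -> Optional[str]:
    """First truthy item of a list-like field, or None."""
    return next((v for v in values or [] if v), None)


# (channel, label, getter) in priority order. Channel/label strings are hypothetical.
PRIORITY = [
    ("email", "Email from website", lambda lead: _first(lead.get("emailsFromWebsite"))),
    ("email", "Email from listing", lambda lead: lead.get("email")),
    ("phone", "Phone from listing", lambda lead: lead.get("phoneE164")),
    ("phone", "Phone from website", lambda lead: _first(lead.get("phonesFromWebsite"))),
    ("contact_page", "Contact page", lambda lead: lead.get("contactPageUrl")),
    ("email_guess", "Email guess (verify before use)", lambda lead: _first(lead.get("emailGuesses"))),
    ("website", "Website homepage", lambda lead: lead.get("website")),
    ("listing", "Listing URL", lambda lead: lead.get("listingUrl")),
]


def best_contact(lead: dict) -> Optional[dict]:
    for channel, label, get in PRIORITY:
        value = get(lead)
        if value:
            return {"channel": channel, "value": value, "label": label}
    return None
```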
📅 Brand age via Wayback Machine
A single lightweight request per lead to archive.org/wayback/available returns the year of the first archived snapshot. New field brandAgeYears. Lead score bonus:
brandAgeYears >= 5 AND websiteAlive=false → +10 (established brand with dead site = prime replacement target)
brandAgeYears >= 5 AND websiteAlive=true → +3
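A sketch of the brand-age lookup against the Wayback Machine availability API. It assumes that asking for the snapshot closest to an early timestamp returns the earliest capture, and that the two bonuses are mutually exclusive; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from datetime import date


def wayback_query_url(domain):
    # The snapshot closest to an early timestamp is, in practice, the earliest one.
    return "https://archive.org/wayback/available?" + urllib.parse.urlencode(
        {"url": domain, "timestamp": "19960101"}
    )


def fetch_payload(domain):
    with urllib.request.urlopen(wayback_query_url(domain), timeout=10) as resp:
        return json.load(resp)


def first_snapshot_year(payload):
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return int(closest["timestamp"][:4])  # timestamps look like YYYYMMDDhhmmss


def brand_age_years(payload, today=None):
    year = first_snapshot_year(payload)
    if year is None:
        return None
    return (today or date.today()).year - year


def brand_age_bonus(brand_age, website_alive):
    if brand_age is None or brand_age < 5:
        return 0
    return 3 if website_alive else 10
```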
🔗 Contact-page discovery
New contactPageUrl field — auto-discovered link to the site's contact page (e.g. /contact, /reach-us or /get-in-touch). It ranks fifth in the bestContact priority list, after emails and phones.
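A standard-library sketch of contact-page discovery, assuming only same-host links whose path ends in one of the listed slugs count. Also accepting /contact-us is an extra assumption, and the regex and function names are illustrative.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

CONTACT_PATH_RE = re.compile(r"/(contact(?:-us)?|reach-us|get-in-touch)/?$", re.IGNORECASE)


class _LinkCollector(HTMLParser):
    """Collects every href found on an <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_contact_page(html: str, base_url: str):
    parser = _LinkCollector()
    parser.feed(html)
    site = urlparse(base_url).netloc
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(url)
        # Skip off-site links such as facebook.com/contact.
        if parsed.netloc == site and CONTACT_PATH_RE.search(parsed.path):
            return url
    return None
```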
Changed
Lead score now factors in chain status (-30) and brand age (+10/+3)
CSV export adds 11 new columns: Phone (E.164), Phone Click, Best Contact, Best Contact Channel, Best Contact Label, Is Chain, Chain Brand, Brand Age (years), Contact Page, Emails from Website, Phones from Website
Aggregate summary adds chainCount and withRealEmailScraped fields
Email guesses now skip known directory aggregator domains so we don't generate info@yellowpages.com when only a listing URL is available
New input parameters
excludeChains (default false) — drop chain franchises
New output fields
emailGuesses[] — info@, contact@, hello@, office@ addresses built from the website domain
socialSearchUrls{} — 1-click search links for Facebook, Instagram, LinkedIn, Google Maps, Google Search
outreachPitch — auto-written 2-sentence cold opener tailored to no-website / dead-site / Wix / Squarespace / generic scenarios. Uses the business name and city, and mentions the rating and review count when they are impressive
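The emailGuesses[] field, together with the aggregator-domain skip described under Changed, can be sketched as follows. The aggregator set is an illustrative subset and the function name is hypothetical.

```python
from urllib.parse import urlparse

GUESS_PREFIXES = ("info", "contact", "hello", "office")
# Illustrative subset of directory/aggregator domains we never guess against.
AGGREGATOR_DOMAINS = {"yellowpages.com", "yelp.com", "facebook.com", "google.com"}


def email_guesses(website):
    if not website:
        return []
    url = website if "://" in website else "http://" + website
    host = urlparse(url).hostname or ""  # .hostname is already lower-cased
    if host.startswith("www."):
        host = host[4:]
    if not host or any(host == d or host.endswith("." + d) for d in AGGREGATOR_DOMAINS):
        return []  # e.g. never produce info@yellowpages.com
    return [f"{prefix}@{host}" for prefix in GUESS_PREFIXES]
```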
Filtering & sorting:
minLeadScore input — drop cold leads at the source
maxResults input — hard cap after sorting (cost control)
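The two inputs combine with the descending leadScore sort roughly as follows, assuming filtering happens first and the cap is applied last, as "hard cap after sorting" implies. The function name is hypothetical.

```python
def filter_and_sort(leads, min_lead_score=None, max_results=None):
    # 1. Drop cold leads at the source.
    kept = [lead for lead in leads
            if min_lead_score is None or lead["leadScore"] >= min_lead_score]
    # 2. Highest score first (Python's sort is stable, so ties keep input order).
    kept.sort(key=lambda lead: lead["leadScore"], reverse=True)
    # 3. Hard cap for cost control.
    return kept if max_results is None else kept[:max_results]
```

Capping after sorting matters: capping first would keep whichever leads were scraped first rather than the best ones.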
Output formats:
exportFormat: "default" (full JSON, includes everything)
exportFormat: "csv" — flat record with HubSpot / Pipedrive column names: Company, Industry, Lead Score, Lead Tier, Outreach Pitch, etc.
exportFormat: "both" — full JSON plus a nested _csv field
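A sketch of the flattening behind exportFormat, assuming hypothetical JSON keys and joining arrays with "; " into a single cell. Only a handful of the HubSpot / Pipedrive columns are mapped here.

```python
import csv
import io

# Hypothetical JSON-key -> CSV-column mapping; only a few columns shown.
CSV_COLUMNS = {
    "businessName": "Company",
    "category": "Industry",
    "leadScore": "Lead Score",
    "leadTier": "Lead Tier",
    "outreachPitch": "Outreach Pitch",
    "emailsFromWebsite": "Emails from Website",
}


def to_csv_record(lead):
    row = {}
    for key, column in CSV_COLUMNS.items():
        value = lead.get(key)
        if isinstance(value, list):
            value = "; ".join(map(str, value))  # flatten arrays into one cell
        row[column] = "" if value is None else value
    return row


def to_csv(leads):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(CSV_COLUMNS.values()))
    writer.writeheader()
    for lead in leads:
        writer.writerow(to_csv_record(lead))
    return buf.getvalue()


def export(lead, export_format="default"):
    if export_format == "csv":
        return to_csv_record(lead)
    if export_format == "both":
        return {**lead, "_csv": to_csv_record(lead)}
    return lead
```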
Aggregate summary:
Every run ends with one record flagged _summary: true, containing totalLeads, withoutWebsite, withDeadWebsite, avgLeadScore, leadTierBreakdown, topTechStacks, category and location.
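A sketch of how the summary record could be assembled. The techStack and leadTier keys and the shape of topTechStacks (a list of the most common stacks) are assumptions; the changelog names the summary fields but not their formats.

```python
from collections import Counter


def build_summary(leads, category, location, top_n=3):
    n = len(leads)
    stacks = Counter(lead["techStack"] for lead in leads if lead.get("techStack"))
    return {
        "_summary": True,
        "totalLeads": n,
        "withoutWebsite": sum(1 for lead in leads if not lead.get("hasWebsite")),
        "withDeadWebsite": sum(
            1 for lead in leads
            if lead.get("hasWebsite") and lead.get("websiteAlive") is False
        ),
        "avgLeadScore": round(sum(lead["leadScore"] for lead in leads) / n, 1) if n else 0,
        "leadTierBreakdown": dict(Counter(lead.get("leadTier") for lead in leads)),
        "topTechStacks": [stack for stack, _ in stacks.most_common(top_n)],
        "category": category,
        "location": location,
    }
```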
Cost: unchanged. All enrichment is included in the $4/1K Pro price.
Changed
categories updated from ["LEAD_GENERATION", "JOBS"] to ["LEAD_GENERATION", "BUSINESS", "MARKETING"] (more relevant)
Title now leads with the new value props: "Lead Score, Tech Stack, Outreach Pitch"
Results are now sorted by leadScore descending, so the first lead in the dataset is the highest-scoring one
Migration
v1.x users keep working unchanged: the Business Name, Phone, Address, Rating, Reviews Count, Category, Website, Email, Hours, Years in Business, Listing URL and hasWebsite fields all remain.
The new fields are additive. Disable enrichment via:
{
  "enrichWebsites": false,
  "enrichEmailGuesses": false,
  "enrichSocialUrls": false,
  "includeOutreachPitch": false
}
[1.0] — 2026-05-11
Added
Initial release
YellowPages scraper via Thunderbit by category + location
hasWebsite flag for "businesses without website" filtering