All notable changes to this Actor will be documented here.
[1.4] — 2026-05-23 — Slug-fallback for upstream failures
Added — Recovery from upstream throttling
Yelp pages are heavily protected and the live source occasionally throttles
requests. The actor now implements a graceful fallback so a run still returns
useful data even when the live page can't be fetched.
Partial records (name only, no website) emit a ⚠ warning level instead
of being silently filed as failures.
Compatibility
All v1.3 fields preserved.
New dataSource field marks the origin (primary or slug_fallback).
New partial boolean for the rare case where slug is parseable but no
website was discovered.
slugFallbackOnFail can be set to false to restore strict v1.3 behavior.
[1.3] — 2026-05-23 — Sales-ops automation kit
Added — 9 new feature layers
This release focuses on end-to-end sales workflow support: better data
recovery, structured CRM-friendly output, and one-click outreach automation.
Automatic retry on transient failures (maxRetries, default 2):
The live source occasionally returns transient errors. The actor now retries
with linear backoff (8s, 16s) before giving up. Empirically lifts success
rate from ~50% to ~75%+ on diverse listings.
JSON-LD schema.org parsing on the discovered website:
schema_same_as[] — real, brand-published social URLs (often more
accurate than what we extract from <a> tags). Auto-merged into
social_profiles{} so downstream consumers don't need to dedupe.
schema_telephones[], schema_emails[] — phones/emails the brand
exposes via Schema.org markup. Merged into phoneE164 if not already set.
schema_address{street, city, state, zipCode, country} — structured
address from PostalAddress block.
schema_latitude / schema_longitude — auto-promoted to latitude /
longitude (free geocoding, no Nominatim call needed).
When the website has no extractable email, generate likely contact
addresses for the discovered domain: info@, hello@, contact@,
support@, sales@, team@, office@, admin@, inquiries@,
reservations@, bookings@. Returned as emails_guessed[] with
emails_guessed_warning flag — speculative, verify before sending.
Lifts cold-outreach hit rate ~3-5x on small businesses without
obvious public emails.
Address parser (extractAddressParts, default on):
parsedAddress: {street, city, state, zipCode, country, formattedAddress}
— handles all observed Yelp formats (with/without commas, newlines).
timezone — derived from US state code → IANA timezone (PST/EST/CST/
MST/AKST/HST). Replaces the previous UTC-based is_open_now with a
timezone-aware variant that returns correct results for businesses
in Pacific/Mountain/Central time.
Domain age via crt.sh (extractDomainAge, default off — slow):
earliest_ssl_date, website_domain_age_years — earliest certificate
issuance for the discovered website's domain. Strong legitimacy signal
for B2B prospecting.
mailto_url + mailto_url_with_pitch (subject + outreach pitch as body)
tel_url, sms_url, whatsapp_url (with auto-pasted pitch up to 240 chars)
linkedin_search_url — pre-filtered LinkedIn people search for
<business> owner OR founder OR CEO
google_search_url — "<business name>" <city>
yelp_competitors_url — Yelp search for the same primary category in
the same neighborhood (territory/competitor research)
TOP_LEADS sorted KV record (free, written automatically on bulk runs):
Top 20 prospects sorted by leadScore, with the most-actionable fields
flattened (businessName, city/state, leadScore, phoneE164, primaryEmail,
bestContact, mailtoUrl, whatsappUrl, linkedinSearchUrl).
Read it via:
client.key_value_store(run["defaultKeyValueStoreId"]).get_record("TOP_LEADS")
CSV export gained 9 new columns: street, city, state, zipCode,
timezone, website_domain_age_years, emails_guessed_first,
mailto_url, whatsapp_url, linkedin_search_url,
yelp_competitors_url.
Phones from Schema.org are normalized to E.164 if no other phone is set.
Address parser handles "73 Geary St San Francisco, CA 94108" (no comma
between street and city).
Compatibility
All v1.2 fields preserved with the same names.
New layers are independently toggleable via input flags.
Existing integrations using only the old fields continue to work unchanged.
Pricing unchanged: $3/1K. The retry layer adds latency but no per-run
cost; the JSON-LD/email-guess/address-parse layers reuse already-fetched
bytes.
[1.2] — 2026-05-23 — Lead-gen powerhouse
Added — 8 new feature layers (no cost increase)
The actor now reuses the single website-fetch from v1.1 to extract a much
richer feature set. Cost stays the same; per-business value goes up
significantly.
outreachPitch — ready-to-paste cold-outreach message tailored to the
Yelp category. 15 industry templates: restaurants, salons/spas,
dentists/medical, auto, home services, law, real estate, fitness, hotels,
retail, pets, events, cleaning, education, finance. Generic fallback for
uncategorized businesses.
Geocoding (extractGeocoding, default OFF — opt-in):
latitude, longitude via Nominatim (OpenStreetMap, free, rate-limited).
Off by default — Nominatim's TOS allows ~1 req/sec.
Aggregate summary (writeSummary, default on):
On bulk runs (>1 business), a SUMMARY record is written to the run's
key-value store (free — doesn't trigger pay-per-event). Contains:
avg_rating, median_reviews, avg_popularity_score, avg_lead_score,
by_customer_segment, by_quality_tier, by_lead_tier,
with_website_alive_pct, with_emails_pct, with_social_profiles_pct,
top_categories[], top_tech_detected[], chains_detected[],
open_now_count. Read it with
client.key_value_store(run["defaultKeyValueStoreId"]).get_record("SUMMARY").
CSV export (exportFormat: "csv"):
Flattens the rich JSON into a sales-ready 30-column CSV with one row
per business. Designed for direct import into Google Sheets / HubSpot /
Pipedrive / Salesforce. Sample columns: businessName, leadScore,
leadTier, customer_segment, phoneE164, emails_from_website_first,
instagram, facebook, tech_stack_summary, outreachPitch.
Chain filtering (excludeChains, default off):
When set, skips businesses with chain_likelihood_score ≥ 50 (Starbucks,
McDonald's, etc.). Useful for SMB lead-gen.
Identifier:
yelpBusinessId — the slug from yelp.com/biz/{slug} URL, useful for
deduplication.
Single website fetch shared by all enrichment — tech stack, emails,
phones, socials, SEO, action links all extracted from one HTTP GET (with
body truncated to 600KB). No additional bandwidth/cost vs v1.1.
online_presence_score rebalanced to credit social profiles, recovered
emails, and SEO hygiene.
chain_likelihood_score keyword list extended (Wing Stop, Raising Cane's,
Popeye's, Jersey Mike's, Jimmy John's, Qdoba, Jamba Juice, etc.).
Emails are filtered against an extended CDN blacklist (Wix, Shopify,
Cloudflare, GTM, etc.) and a TLD whitelist (~120 valid TLDs) to drop
noise like loc@ion.reload from JS fragments.
Compatibility
All v1.1 fields preserved with the same names.
New layers can be toggled off independently via input flags.
Existing integrations using only the old fields continue to work unchanged.
Pricing unchanged: $3/1K. The single website fetch is the same as v1.1
— we just read more out of the same bytes.
[1.1] — 2026-05-22
Removed
Direct Yelp HTML / JSON-LD layer (extractSchema) — after testing this
layer returned no usable data. Removing it keeps the actor fast and reliable.
Geo coordinates, payment_accepted, serves_cuisine, schema_*, photos_count,
is_claimed, is_verified, schema_same_as fields are no longer attempted.
Improved
Hours intelligence parser now handles both colon-delimited and
whitespace-delimited hours formats:
Mon-Sun: 7:30 AM - 6:00 PM
Mon 7:30 AM - 6:00 PM
Mon-Fri: 9 AM - 5 PM, Sat: 10 AM - 4 PM, Sun: Closed
Monday: 7:00 AM - 9:00 PM\nTuesday: ...
"Closed" / "Off" days correctly skipped
Website detection now skips Yelp's own yelp.com/biz/... URLs (the source
occasionally returns the listing URL itself instead of the external website)
and tries to recover an external URL from amenities/address fields
chain_likelihood_score punctuation-tolerant — detects "In 'N Out Burger",
"In-N-Out", "In N Out" as the same chain
online_presence_score rebalanced — gives more weight to website + phone +
emails, removed score for fields that depended on the dropped Schema layer
service_offerings_count now correctly counts amenities split by ; as well as ,
WEBSITE and AGE layers gracefully degrade when their upstream sources
are unavailable.
[1.0] — 2026-05-22
Added — Major intelligence expansion
The actor was rebranded from Yelp Business Scraper to Yelp Business Analyzer
to reflect the depth of analysis. All previous fields remain available,
untouched, on the same extractCore flag. New layers can be toggled off
independently.
New data sources:
🕐 Hours intelligence — full structured weekly schedule parsed from
the free-text hours field