All notable changes to this Apify Actor. Built with transparent development — every version and its tradeoffs documented.
[1.3.5] — 2026-04-29
Added — Store SEO + comparison surface
README FAQ section: 9 evergreen long-tail questions answered (differentiation vs other Google Maps scrapers, pricing model, email hit rate per market, multilingual coverage, preflight check, CRM/AI usage, recurring scrape patterns, what the actor does NOT do)
README competitive comparison table: feature-by-feature matrix vs Compass GMS, Lukas Krivka, and others — explicitly identifies our differentiators (MX/SPF/DMARC validation inline, multilingual contact-page crawl, lead scoring 0-100, delta mode $0 dupes, preflight budget check, AI-ready outreach profile, Meta Ad Library URL)
README "Use the right tool for the job" matrix — honest steering: Compass for million-record raw scrapes, this actor for validated EU multilingual outreach pipelines
Changed — Store metadata SEO hygiene
seoTitle: "Google Maps Email Extractor with Built-in Email Validation" (58 chars, fits Google 60-char display)
seoDescription: keyword-dense 143-char description with MX/SPF/DMARC + lead scoring + multilingual crawl + $0.005/lead + $0 duplicates
description (Store card): 277 chars covering the full differentiator stack with localized contact-page keywords (kapcsolat, kontakt, contacto, contatti)
Note — PPE pricing went live
PAY_PER_EVENT pricing activated 2026-04-28T11:14 UTC — flat $0.005 per delivered lead + $0.00005 per run start. Failed/timed-out runs now cost $0. No code change in this release, just confirmation.
[1.3.4] — 2026-04-23
Added — Store listing visuals (README inline images)
Three screenshots embedded in the README, hosted on the Apify CDN via a public key-value store (lvBwYNZ1MRj8eWdsg):
Preflight log — actor run showing [estimate] + [preflight] block before any events are billed
Record detail — JSON record view of a single high-score lead with emailValidation, webSignals, suggestedOpener, metaAdLibraryUrl, cid
Live Austin demo dataset referenced in README and marketing docs: wMnqRj2ChH4NbsuVk (168 records, 53% email hit, 25% high-deliverability, 83% hot leads, $0.84 cost, 22 min runtime)
Store pictureUrl set to branded actor icon (GM monogram + VALIDATED checkmark, 512×512) — uploaded via programmatic Console automation; served from Apify images CDN
Fixed — Store metadata hygiene
Cleared UNDER_MAINTENANCE notice flag (was blocking the Store banner)
Replaced placeholder exampleRunInput ({helloWorld: 123}) with a real dentist Austin Texas 9-field config
Categories set to LEAD_GENERATION + ECOMMERCE + MARKETING (3 is the API hard cap)
[1.3.2] — 2026-04-23
Changed — pre-launch hardening (before PAY_PER_EVENT goes live 2026-04-28)
Preflight runtime-estimator now tile-aware
Before this build, the preflight estimate ignored geoGridTiles multiplier — a user setting geoGridTiles=5, maxResults=100 saw "~3 min" when the actual runtime was ~56 min. On the default 2h timeout this was harmless, but on custom short timeouts it could false-green an impossible run.
New formulas (mirroring the v1.3.1 verified Budapest test case within 1% accuracy):
tileUniqueFactor = N >1 ? N² × 0.75:1 // 75% unique survival after cross-tile dedup
The 75% survival rate is calibrated from the v1.3.1 end-to-end test: 3×3 Budapest tiles discovered 90 raw hits, cross-tile dedup filtered 23 (25.5%), delivered 67 unique. New formula predicts 68 for that input — off by one, well within estimator tolerance.
User-visible effects:
Tile-enabled runs log a dedicated line: [estimate] geoGridTiles=5 → 1×25=25 tile searches, ~1875 expected unique lead(s), runtime ≈ 56 min (+5 min tile-scroll overhead).
Preflight refusal error message now enumerates geoGridTiles as a reduce-knob option [T] alongside existing [A]..[D] (lower maxResults / raise timeout / split runs / disable enrichments).
maxResultsSafe calculation in the error message accounts for tile fan-out — it now recommends a value that actually fits.
Delta mode: CID as secondary dedup key
Delta mode (sinceDatasetId) previously dedup'd purely on placeId (format 0xHEX:0xHEX). This worked, but had a failure mode we wanted to close before scaling: if Google Maps changes the placeId token format between runs (observed once on the v1.1.x → v1.2.x transition — capitalisation drift), the stable identifier is actually cid.
New behaviour: Delta mode loads BOTH placeId and cid from the prior dataset and skips a place if EITHER matches. CID is extracted at enqueue time from the place URL's !1s0xHEX:0xHEX token via extractCid() + cidFromPlaceId() (same helpers introduced in 1.3.1), so the check happens before any detail-page fetch. Zero extra cost.
Log line updated: [delta] Loaded N placeId(s) + M CID(s) from prior run — will skip duplicates by either key.
autoExtend × tiles — log-message clarity
The auto-extend path now explicitly calls out geoGridTiles in the warning so users can see why the estimated runtime is what it is:
⚠ Your run's timeout(60 min) is too short for maxResults=100+ geoGridTiles=5(25 tile searches)(~56 min needed).
autoExtend=true → starting a fresh run with7200s(2h) timeout so you don't have to configure anything.
No functional change — the auto-extend logic already spread the entire input (including geoGridTiles) into the spawned run and used the tile-aware estRuntimeSecs for the buffer calculation. Code review confirmed correctness; this change is purely for observability.
Fixed
src/main.js: preflight formula now includes tileUniqueFactor and tilePhaseOverheadSecs when geoGridTiles > 1.
src/main.js: maxResultsSafe divisor corrected for tile fan-out.
src/main.js: delta-mode bootstrap loads cid from prior dataset alongside placeId.
src/routes.js: new setSkipCids() export; SEARCH handler dedup loop checks both placeId and CID.
Not changed
PAY_PER_EVENT schema — still apify-default-dataset-item: $0.005 + apify-actor-start: $0.00005.
Tile logic, CID extraction helpers — stable from 1.3.1.
User-facing API — sinceDatasetId usage is unchanged; the CID check is purely additive.
Validation
Preflight math unit-tested across 6 scenarios (baseline, Budapest 3×3 verified, Manhattan 5×5, NYC 10×10, multi-query, unlimited).
3×3 Budapest prediction: 68 leads → actual in 1.3.1 test: 67 leads. 0.7% prediction accuracy.
🎯 Geo-grid tiling — bypass Google's 120-result cap
Google Maps returns at most ~120 places per search, regardless of how many actually match. For small cities this is fine; for Manhattan (800+ restaurants), London (1200+ dentists), or NYC-scale coverage, you never see the long tail.
The new geoGridTiles input (1–10, default 1) splits the search area into an N×N geographic grid, issues the same keyword query against each tile's map viewport, and de-duplicates overlapping results by placeId so the final dataset is a single clean list.
Recommended grid sizes:
1 (default) → single viewport, ~120 max. Good for small towns + quick runs.
3 → 9 tiles, up to ~600 unique leads. Mid-size cities (Budapest, Lyon, Porto).
5 → 25 tiles, up to ~1500 unique leads. Big cities (Berlin, London, Chicago).
10 → 100 tiles, up to ~6000 unique leads. Mega-cities (NYC, Tokyo, Seoul).
How geocoding works: The query parser detects "{term} in {location}" / "{term} near {location}" / "{term} {location}" patterns, then calls free OpenStreetMap Nominatim to get the location's bounding box. No API key, no rate-limit budget to manage (we issue 1 geocode per query). If the location isn't parseable or Nominatim can't find it, the run falls back to an untiled single search — no silent failures.
How viewport-zoom is chosen: Each tile's edge length (in degrees) is mapped to a Google Maps zoom level via the empirically-calibrated formula zoom ≈ log2(0.7 / latSpanDeg) + 10, clamped to [3, 19]. This reproduces published benchmarks (zoom 13 ≈ 15 km edge, zoom 14 ≈ 7 km, zoom 15 ≈ 3 km, zoom 16 ≈ 1.5 km) within ±0.5 zoom.
Billing is linear in unique leads, not in tiles. 5×5 tiles discovering 1500 unique places costs $7.50 + $0.00005 — same per-lead rate as a 1-tile run. You pay for data, not for the crawl budget.
🔑 Google CID extraction + cross-tile dedup
New cid field on every dataset record — the stable numeric Customer ID Google uses internally for a business. Extracted via two paths:
?cid=<DECIMAL> — share URLs and some redirects already expose it.
!1s0xHEX:0xHEX feature-ID token — second hex group converted to decimal via BigInt (64-bit safe).
Why it matters:
Cross-query dedup: The same business discovered via two different search terms ("pizza in Manhattan" + "italian restaurants Manhattan") always has the same cid. Use it as your primary key when merging multiple runs.
Share URL reconstruction: https://maps.google.com/?cid={cid} opens the place page regardless of language/region — the placeId hex-pair format sometimes drifts between Google runtime updates.
Tile dedup insurance: When geoGridTiles > 1, neighboring tiles frequently overlap. The SEARCH handler now maintains a run-scoped enqueuedPlaceIds Set that short-circuits the second enqueue of the same place — reported as [tile-dedup] Skipped N place(s) already enqueued from other tiles in the run log.
src/utils.js: extractCid(url) and cidFromPlaceId(placeId) helpers; buildResultItem now populates the cid field automatically (falls back to deriving from placeId if caller didn't supply it explicitly).
src/routes.js: CID extracted in LABELS.PLACE handler from both request.url and page.url() (Google redirects /maps/search/ → /maps/place/ mid-navigation). New getTileDedupCount() export for the post-run summary.
src/main.js: geoGridTiles destructured from input, SEARCH initial requests loop now uses expandQueryToTiles. Tile-aware uniqueKey protects against Crawlee collapsing distinct tile viewports. Post-run summary reports tileDeduped and withCid counts.
Added — dataset schema
cid field in storages.dataset.fields of .actor/actor.json.
Added — input schema
geoGridTiles integer field in INPUT_SCHEMA.json (1–10, default 1) with a detailed description including grid-size recommendations and the billing-is-linear-in-leads note.
Not changed (intentionally)
PAY_PER_EVENT schema — still apify-default-dataset-item $0.005 + apify-actor-start $0.00005. Tile runs don't introduce new events.
Delta mode (sinceDatasetId) — still dedups by placeId; cross-run cid-based dedup is deferred to a later version (would require backfilling CIDs in legacy datasets).
Preflight + auto-extend — runtime estimator doesn't yet account for tile multiplication; users setting geoGridTiles=5, maxResults=100 should expect up to 25× the single-tile runtime. Estimator update tracked for v1.3.1.
Migration notes
Existing runs are unaffected (geoGridTiles defaults to 1 = untiled behaviour).
Consumers relying on placeId as a primary key can continue to do so. cid is an additive field.
[1.2.15] — 2026-04-23
Changed — pricing schema aligned with Apify Console submission
The user submitted the PAY_PER_EVENT pricing through the Apify Console using the standard events, not the custom events defined in .actor/actor.json. Effective date: 2026-04-28 11:14 UTC (5-day notice period — standard for Free → PPE transitions, not the 14-day "major change" window).
Submitted events (going live Apr 28):
apify-default-dataset-item → $0.005 per dataset record
apify-actor-start → $0.00005 per run start
My old custom events in actor.json (never active, since Apify bills on the Console-submitted schema only):
place-scraped → $0.003
website-scraped → $0.002
Why this matters
Once the new pricing goes live on Apr 28, any Actor.charge({ eventName: 'place-scraped' }) or Actor.charge({ eventName: 'website-scraped' }) call would attempt to charge an event that isn't in the pricing schema — best case: no-op (free runs forever, revenue = $0), worst case: SDK throws and crashes the run. Either way broken.
Line 314: place-scraped (in requestHandler after PLACE requests)
Line 445: website-scraped (after scraping each business website)
Line 574: website-scraped (after Meta Ad Library Facebook page fetch)
Replacement mechanism: The platform now auto-fires apify-default-dataset-item for every record written by Dataset.pushData (batch loop at main.js:654). One push call with a 100-item batch = 100 billing events = $0.50. apify-actor-start fires once per run start automatically. Zero explicit Actor.charge calls needed.
Updated — .actor/actor.json pricingPerEvent
Replaced the custom-event schema with the Console-submitted one. eventDescription now documents the flat per-lead all-inclusive pricing, reinforcing the "no surprise bills, no per-subpage fees" message.
README.md — 💸 Transparent pricing callout rewritten for the $0.005 flat model; above-the-fold cost table updated ($0.10 → $0.125, $0.06 → $0.10, $0.07 → $0.10 for the small runs where the 2× rounding effect is visible). How much will it cost to scrape {city}? section rewritten with new formula (leads × $0.005) + $0.00005. Cost architecture note now emphasizes the flat-rate no-surprise-bills guarantee even when we crawl multiple contact subpages.
Not changed (intentionally)
Preflight refusal logic (v1.2.9) — still fires before any Dataset.pushData call, so on refusal the dataset stays empty and 0 events fire. "0 events billed" guarantee preserved.
Auto-extend logic (v1.2.12) — spawned runs still fire their own apify-actor-start ($0.00005 × 2 = $0.0001 total overhead if auto-extend kicks in), leads delivered on the spawned run fire apify-default-dataset-item normally.
Delta mode (sinceDatasetId) — duplicate filtering happens beforepushData, so no events fire for duplicates = $0 for already-known leads. Behaves exactly as before.
User-facing impact
Effective price is the same as before in the typical case (email scraping on): $0.005 per lead. Users who disabled scrapeEmails previously paid $0.003; now they pay $0.005 regardless (simpler mental model, still cheaper than any competitor selling per-email validation separately).
[1.2.14] — 2026-04-23
Added — Tier 1 market-leader setup (from 4-agent research synthesis)
SEO-optimized actor title: Google Maps Email Extractor & Lead Scraper (42 chars — fits SERP / Apify Store tile). Down from the 73-char marketing subtitle that Google truncated mid-word.
Sub-300-char description (API-enforced limit): hits the pain points Apify users search for — validated emails, delta mode, lead scoring, AI-ready profiles, PAY_PER_EVENT no-charge-on-failure.
5 dataset views in .actor/actor.json — filtered presets the user can switch between in the Apify Console dataset UI, without writing any code:
overview — core business details (name, address, phone, email, rating)
webAgency — web-quality / tech-stack targeting for redesign pitches
socialAds — Facebook page ID + Meta Ad Library URL for active-advertiser filtering
restaurantsFocus — cuisine, price level, booking, menu, service options
README restructure for Apify Store SEO + first-click conversion:
Status badge at top (apify.com/actor-badge) — visually signals "living, maintained actor"
Above-the-fold 5 unfair-advantage bullets: email deliverability grading, delta mode, Meta Ads inline, lead score + AI profile, multilingual contact-page crawl (10+ languages)
New loud PAY_PER_EVENT transparency callout with 🛡️ "Run failed before any lead was delivered? You pay $0" — directly addresses the #1 Store-reviewer deal-breaker (hidden charges / timeouts that still cost money)
New "How much will it cost to scrape {city}?" SEO section with exact formula Total = (places × $0.003) + (websites × $0.002) and 9-row cost-per-city table (Budapest / Berlin / London / NYC / Chicago / LA / Sydney). Targets long-tail SEO queries.
Input-fields table expanded with autoExtend, validateEmails, extractWebSignals, enrichMetaAds, sinceDatasetId — previously buried in INPUT_SCHEMA only.
maxResults default updated from 100 to 10 in the table to match the fast-first-run v1.2.11 change.
Infrastructure / discovery signals
Apify Store search shows the actor as notice: UNDER_MAINTENANCE (QA flag from 1.2.11 email). Next QA cycle against v1.2.13 (maxResults.default=10, 113s real runtime vs 5-min QA budget = 62% margin) should clear the flag within ~24 h.
Apify Store lists currentPricingInfo: {pricingModel: "FREE"} — this is because PAY_PER_EVENT activation requires a manual Console submission (actor.json alone is insufficient). The 14-day notice period kicks in only after Console submission. Flagged for user action; docs + action steps included in the session report.
[1.2.12] — 2026-04-23
Added
Zero-touch timeout UX — autoExtend input flag (default true). The user no longer has to know the "timeout" concept exists. When enabled and the preflight detects the run timeout is too short for maxResults:
The actor uses apify-client to start a fresh run of itself with the same input but a sufficient timeout (estRuntime × 1.3, rounded up to the next hour).
The original run exits SUCCEEDED with a status message pointing to the new run's dataset URL.
The spawned run receives autoExtend: false to prevent cascading loops.
Billing is unchanged — PAY_PER_EVENT charges only for leads actually delivered, regardless of which run delivered them.
When autoExtend: false OR the spawn API call fails, falls back to the existing v1.2.9 preflight refusal with the detailed 4-step error message (now prefixed with [0] ⭐ EASIEST: set autoExtend: true).
Changed
defaultRunOptions.timeoutSecs: 7200 → 14400 (2h → 4h) via platform API PUT. The 4-hour default covers every possible maxResults up to our 1000 cap with massive headroom (worst-case full-enrichment = ~1 hour real runtime). Combined with autoExtend, the user literally never needs to see or touch a timeout field.
Three-layer defense in depth now complete:
Layer 1: 4h default covers 99% of users on the "just click Start" path.
Layer 2: autoExtend: true (default) handles the 1% where timeout was manually lowered (e.g., Apify QA's forced 5-min tests).
Layer 3: Preflight refusal as final safety net if autoExtend: false or the spawn API fails. Even then, 0 events billed.
[1.2.11] — 2026-04-23
Fixed
Apify automated QA was flagging the actor "Under maintenance". Email received 2026-04-23 from Apify: "your Actor did not pass our automated quality assurance tests during the last three days". Apify's QA system runs every Store actor on the prefill input with a 5-minute timeout. Our prefill had maxResults: 100 + all enrichments on — estimated 3.2 min, but real-world Google Maps scroll + detail-page loads + slow third-party websites pushed actual runtime over 5 min in 7 of 27 external runs (26% TIMED-OUT). Public stats confirmed: publicActorRunStats30Days: {SUCCEEDED:18, TIMED-OUT:7, FAILED:1, ABORTED:1}.
maxResults.default: 100 → 10. At 10 leads × 8s × ÷5 concurrency + 30s overhead ≈ 46s — fits comfortably in QA's 5-min window with huge margin for slow websites. Power users wanting 100–1000 just type a new value in the input form; the preflight check (v1.2.7+) protects them from setting it too high for their timeout.
Description rewritten to frame 10 as a "fast first-run for evaluation" and explicitly mention bumping to 100–1000 for production. Clarified timeout UI path: Start new run modal → ⚙ Options → Timeout.
Why this matters
"Under maintenance" flag hides the actor from Store search, killing organic discovery. Restoring passing QA is mission-critical — the 3 external users/week discovery rate vanishes while flagged. Next QA cycle (within ~24h of push) should lift the flag automatically.
[1.2.9] — 2026-04-20
Improved
Preflight error UX — user-language, explicit "YOUR input". First field trial of v1.2.7/8 surfaced that the log said "Estimated runtime exceeds the run timeout" (passive, technical). Non-technical users might assume the actor is broken. Rewritten to:
Open with "⛔ YOUR INPUT WON'T FIT IN THE RUN TIMEOUT — STOPPED BEFORE ANY CHARGES" (active voice, ownership on user's settings).
Show user's own numbers (maxResults: 500 → needs ~28 min, Your run timeout: 5 min, Gap: ~23 min) so they immediately connect the failure to what they entered.
Lead with the money reassurance: "Zero events billed. Your Apify credit is untouched."
4 labeled fix-paths [A]/[B]/[C]/[D] with dynamic values: Lower from 500 to ≤ 79, Raise timeout to 3600s, Split into 7 runs of 79, Disable validateEmails (−1s/lead).
Enrichment-disable tips are conditional — only shown for flags currently ON.
Footer: "This is a safety check, not a bug."
Status message (the one line in the dashboard red banner) rewritten from "Preflight failed: estimated runtime exceeds run timeout" to a self-contained human sentence: "Your input (maxResults=500) needs ~28 min but the run timeout is only 5 min. Stopped before charging you — 0 events billed. See log for 3-4 one-click fixes."
[1.2.8] — 2026-04-20
Fixed
Preflight refusal now marks run as FAILED (not SUCCEEDED). v1.2.7 shipped with Actor.exit(1) which, counter-intuitively, still resolves the run as status: SUCCEEDED, exitCode: 0 on Apify. Users would see a "green checkmark" run with zero results — indistinguishable from a genuinely empty dataset. Swapped to Actor.fail(statusMessage). Now:
Run list shows red FAILED badge.
Dashboard header surfaces the status message directly.
Aggregate publicActorRunStats30Days.FAILED counter separates preflight refusals from real timeouts in our observability.
CLI exits with Error: Actor failed! instead of Success: Actor finished.
[1.2.7] — 2026-04-20
Added
Preflight timeout refusal. After pushing v1.2.6 with the 7200s default timeout fix, we observed a fresh external TIMED-OUT event at 2026-04-19 20:52 UTC — the fix alone wasn't sufficient. Root cause: users can (1) override the actor timeout per-run, (2) request a maxResults so large even 2h isn't enough, (3) pin an older build. The estimator warned about this but didn't stop the run.
The actor now reads APIFY_TIMEOUT_AT env var (platform-set deadline) and compares it to the runtime estimate.
If estRuntime > availableTime, the actor exits with code 1 before scraping a single page — zero place-scraped / website-scraped events charged.
The error message surfaces 4 concrete fixes with the exact numbers: how many seconds to raise the timeout to, what maxResults would fit safely, how to split into multiple runs with sinceDatasetId, and which enrichment flags to disable.
Preflight success also logs a ✅ Budget OK line so users see explicit confirmation that the run will fit.
Result: users discover the misconfiguration in 30 seconds instead of burning 1-2 hours on a run that's mathematically guaranteed to fail.
Philosophy
Fast failure > slow degradation. A user who sees a clear preflight error and fixes their input gets value 5 minutes later. A user whose run silently trucks along for 2 hours before timing out leaves permanently. This is worth the small cost of the "I know what I'm doing, just run it" escape-hatch case (which doesn't exist in our v1.2.x user base per the observed data).
[1.2.6] — 2026-04-19
Fixed
TIMED-OUT runs. Of the first 14 external runs on the Apify Store, 3 hit the default 1-hour timeout (21% abort rate). Root cause: users requesting 500+ leads with full enrichment couldn't complete in 3600s. Fixes:
.actor/actor.jsondefaultRunOptions.timeoutSecs bumped from 3600 to 7200 (2 hours). Covers ~900 leads at default concurrency.
Runtime estimator at actor startup. Computes expected runtime based on maxResults × searchQueries.length × feature flags, logs it on startup, and emits a ⚠️ warning if the estimate exceeds the default timeout — telling the user exactly how much to raise it (and why).
INPUT_SCHEMA maxResults description updated with timeout guidance so users see it before they launch a 1000-lead job.
Added
Post-run CTA log. At the end of every successful run (results > 0), the actor prints a clean footer with:
Two actionable tips (filter by deliverability: "high", use sinceDatasetId for next run)
⭐ Rate-the-actor link — no popups, no emails, shown once after value was delivered
README header CTA — concise bookmark / rate-the-actor line at the top.
Why these changes
First-cohort signal (14 external runs, 3 timeouts) was strong enough to warrant a UX fix rather than just a doc update. New estimator means a 2000-lead user now sees "⚠️ estimated 3.5 hours, current timeout 2 hours, raise to 14400s" in the first 5 seconds of the run — not 2 hours into a hung job.
[1.2.4] — 2026-04-18
Added
Multi-vertical demo gallery. Three public datasets covering distinct industries and countries to demonstrate real-world hit rates across markets:
🗽 NYC Italian restaurants (M9Bd8gMh4NglVKIbt) — 64% email, 56% ownerName
🦷 Berlin dentists (gI04MuKrfPF4D4Ui8) — 85% email, 65% high-deliverability
marketing/showcase.md — cross-vertical dataset gallery with metrics breakdown and "how to reproduce" inputs.
Updated marketing/blog-post.md to reflect v1.2 features (deliverability grading, web signals, delta mode, Meta Ad Library).
Observation
Regulated markets (e.g. German medical professionals under GDPR + Heilmittelwerbegesetz) show 1.5-2× higher email hygiene and deliverability grades than consumer-facing verticals. This cross-vertical data is now visible in the README landing.
[1.2.3] — 2026-04-18
Added
Legal & Compliance section in README. GDPR data classification table, Legitimate Interest Assessment template, jurisdictional references (US, EU, UK, Hungary). No other Google Maps scraper on the Apify Store ships documentation this detailed.
[1.2.2] — 2026-04-18
Fixed
Google redirect URL pollution. normaliseWebsite() now unwraps https://www.google.com/url?q=<real>&... wrapper URLs emitted by Google Maps, so downstream phases (email extraction, web signal analysis, email validation) operate on the real business domain. Previously, ~40% of results were scraping google.com instead of the real website — including false press@google.com primary emails.
[1.2.0] — 2026-04-18
Added
Email deliverability grading (emailValidation field). Every primaryEmail is graded via MX / SPF / DMARC DNS lookups + best-effort SMTP RCPT TO probe. Output: {mxRecords, hasSpf, hasDmarc, smtpValid, isCatchAll, deliverability} with grade "high" / "medium" / "low" / "unknown". Agencies and cold-outreach teams can now skip invalid / catch-all addresses before burning sender reputation.
Web quality signals (webSignals + webQuality fields). Lightweight "Lighthouse without Lighthouse" — extracted from the HTML already fetched for email scraping, so zero extra cost. Outputs: httpsOnly, mobileResponsive, pageSizeKb, hasFavicon, hasOpenGraph, hasStructuredData. Perfect for web-dev agency outreach targeting.
Node built-in dns/promises + net — no new dependencies.
SMTP probe gracefully degrades to DNS-only grading when port 25 is blocked by cloud egress firewalls (as on Apify infra). DNS signals alone cover ~80% of the deliverability picture.
Per-domain DNS cache — repeated probes on the same domain don't re-query.
[1.1.1] — 2026-04-18
Added
Delta mode (sinceDatasetId input). Pass a previous run's dataset ID and the actor skips every place already present (matched by Google Maps placeId). Workflow win for scheduled runs: users pay only for new leads.
Meta Ad Library enrichment (enrichMetaAds input). For every business with a Facebook URL, the actor fetches the FB page, extracts the numeric page ID, and builds a targeted Meta Ad Library lookup URL (view_all_page_id=<ID> — ads from that specific page, not a noisy keyword search). Click-through reveals if the business is currently running Facebook/Instagram ads — a strong buying-intent signal.
New output fields: facebookPageId, metaAdLibraryUrl (always filled — page-specific when pageId extractable, keyword fallback otherwise).
Implementation notes
placeId Set with lowercase normalization for robust matching across runs.
Delta filtering happens BEFORE detail-page enqueue, so no place-scraped events fire for skipped duplicates = users pay nothing for already-known leads.
[1.0.24] — 2026-04-18
Fixed
reviewKeywords PUA (Private Use Area) leak. Material Icons ligatures in Unicode range U+E000-U+F8FF were slipping through as "Sort", "All", etc. Added browser-side + Node-side filter: .replace(/[\uE000-\uF8FF]/g, '').
ownerName extraction rate raised from 0% to 40%. Updated NW regex pattern to accept Mc/Mac/Van prefixes and hyphenated compound names. Previously "McDonald" was rejected because of the internal capital D; now "Pam Weekes & Connie McDonald" (Levain Bakery) and "Jatee Kearsley" (Je T'aime Patisserie) extract correctly.
[1.0.23] and earlier
Initial feature set: Google Maps scraping, email extraction from business websites (5-source with ranking), phone/WhatsApp, social media links, tech-stack detection (WordPress/Wix/Shopify/React/Analytics), lead scoring 0-100, hidden gem score, growth signal, budget tier inference, AI-ready outreach profile, suggested cold-outreach opener, Cloudflare email decoding, ROT13 deobfuscation, JSON-LD parsing, contact-page crawl in 10 languages, industry/cuisine classification, booking URL detection (OpenTable/Resy/Tock), website language detection.
Versioning scheme
Minor (1.x.0) — new features
Patch (1.x.y) — bug fixes / docs updates
Apify auto-increments the patch on every apify push within the same actor.json version.