All notable changes to this Apify Actor. Built with transparent development — every version and its tradeoffs documented.
- Preflight error UX — user-language, explicit "YOUR input". First field trial of v1.2.7/8 surfaced that the log said "Estimated runtime exceeds the run timeout" (passive, technical). Non-technical users might assume the actor is broken. Rewritten to:
- Open with "⛔ YOUR INPUT WON'T FIT IN THE RUN TIMEOUT — STOPPED BEFORE ANY CHARGES" (active voice, ownership on user's settings).
- Show the user's own numbers (`maxResults: 500` → needs ~28 min, Your run timeout: 5 min, Gap: ~23 min) so they immediately connect the failure to what they entered.
- Lead with the money reassurance: "Zero events billed. Your Apify credit is untouched."
- 4 labeled fix-paths [A]/[B]/[C]/[D] with dynamic values: Lower from 500 to ≤ 79, Raise timeout to 3600s, Split into 7 runs of 79, Disable `validateEmails` (−1s/lead).
- Enrichment-disable tips are conditional — only shown for flags currently ON.
- Footer: "This is a safety check, not a bug."
- Status message (the one line in the dashboard red banner) rewritten from "Preflight failed: estimated runtime exceeds run timeout" to a self-contained human sentence: "Your input (maxResults=500) needs ~28 min but the run timeout is only 5 min. Stopped before charging you — 0 events billed. See log for 3-4 one-click fixes."
- Preflight refusal now marks the run as FAILED (not SUCCEEDED). v1.2.7 shipped with `Actor.exit(1)`, which, counter-intuitively, still resolves the run as `status: SUCCEEDED, exitCode: 0` on Apify. Users would see a "green checkmark" run with zero results — indistinguishable from a genuinely empty dataset. Swapped to `Actor.fail(statusMessage)`. Now:
- Run list shows red FAILED badge.
- Dashboard header surfaces the status message directly.
- The aggregate `publicActorRunStats30Days.FAILED` counter separates preflight refusals from real timeouts in our observability.
- CLI exits with `Error: Actor failed!` instead of `Success: Actor finished.`
- Preflight timeout refusal. After pushing v1.2.6 with the 7200s default timeout fix, we observed a fresh external TIMED-OUT event at 2026-04-19 20:52 UTC — the fix alone wasn't sufficient. Root cause: users can (1) override the actor timeout per-run, (2) request a `maxResults` so large even 2h isn't enough, (3) pin an older build. The estimator warned about this but didn't stop the run.
- The actor now reads the `APIFY_TIMEOUT_AT` env var (platform-set deadline) and compares it to the runtime estimate.
- If `estRuntime > availableTime`, the actor exits with code 1 before scraping a single page — zero place-scraped / website-scraped events charged.
- The error message surfaces 4 concrete fixes with the exact numbers: how many seconds to raise the timeout to, what `maxResults` would fit safely, how to split into multiple runs with `sinceDatasetId`, and which enrichment flags to disable.
- Preflight success also logs a `✅ Budget OK` line so users see explicit confirmation that the run will fit.
- Result: users discover the misconfiguration in 30 seconds instead of burning 1-2 hours on a run that's mathematically guaranteed to fail.
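The deadline math above can be sketched as follows. `APIFY_TIMEOUT_AT` is the real platform-set ISO deadline; the function names, return shape, and message handling are illustrative, not the actor's actual code:

```javascript
// APIFY_TIMEOUT_AT is an ISO timestamp set by the Apify platform for the run.
// Everything else here (names, return shape) is an illustrative sketch.
function availableSecs(deadlineIso = process.env.APIFY_TIMEOUT_AT, now = Date.now()) {
  if (!deadlineIso) return Infinity; // local runs have no platform deadline
  return (new Date(deadlineIso).getTime() - now) / 1000;
}

function preflight(estRuntimeSecs, deadlineIso, now) {
  const avail = availableSecs(deadlineIso, now);
  if (estRuntimeSecs <= avail) return { ok: true }; // would log "✅ Budget OK"
  // Refuse before any billable event fires; the gap feeds the fix hints.
  return { ok: false, gapSecs: Math.ceil(estRuntimeSecs - avail) };
}
```

With the changelog's example numbers (estimate ~28 min = 1680s against a 5-min timeout), this yields a ~23-min gap, matching the error message shown to users.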
- Fast failure > slow degradation. A user who sees a clear preflight error and fixes their input gets value 5 minutes later. A user whose run silently trucks along for 2 hours before timing out leaves permanently. This is worth the small cost of the "I know what I'm doing, just run it" escape-hatch case (which doesn't exist in our v1.2.x user base per the observed data).
- TIMED-OUT runs. Of the first 14 external runs on the Apify Store, 3 hit the default 1-hour timeout (21% abort rate). Root cause: users requesting 500+ leads with full enrichment couldn't complete in 3600s. Fixes:
- `.actor/actor.json` `defaultRunOptions.timeoutSecs` bumped from 3600 to 7200 (2 hours). Covers ~900 leads at default concurrency.
- Runtime estimator at actor startup. Computes expected runtime from `maxResults × searchQueries.length × feature flags`, logs it on startup, and emits a ⚠️ warning if the estimate exceeds the default timeout — telling the user exactly how much to raise it (and why).
- INPUT_SCHEMA `maxResults` description updated with timeout guidance so users see it before they launch a 1000-lead job.
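A minimal sketch of such an estimator, using assumed per-lead costs chosen to roughly match the numbers elsewhere in this changelog (≈3.5s/lead fully enriched, −1s/lead when `validateEmails` is off); the actor's real coefficients are not documented here:

```javascript
// Per-lead costs are assumptions for illustration, not the shipped values.
function estimateRuntimeSecs(input) {
  const perLeadSecs =
    2 +                                  // base place scrape + email extraction
    (input.validateEmails ? 1 : 0) +     // DNS/SMTP deliverability probe
    (input.extractWebSignals ? 0.5 : 0); // web-quality signal parsing
  return Math.ceil(input.maxResults * input.searchQueries.length * perLeadSecs);
}
```

For 500 leads on one query with both flags on, this gives 1750s (~29 min), in the same ballpark as the "~28 min" shown in the preflight example.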
- Post-run CTA log. At the end of every successful run (results > 0), the actor prints a clean footer with:
- Success summary (leads delivered, high-deliverability count, modern-sites count)
- Two actionable tips (filter by `deliverability: "high"`, use `sinceDatasetId` for the next run)
- ⭐ Rate-the-actor link — no popups, no emails, shown once after value was delivered
- README header CTA — concise bookmark / rate-the-actor line at the top.
- First-cohort signal (14 external runs, 3 timeouts) was strong enough to warrant a UX fix rather than just a doc update. New estimator means a 2000-lead user now sees "⚠️ estimated 3.5 hours, current timeout 2 hours, raise to 14400s" in the first 5 seconds of the run — not 2 hours into a hung job.
- Multi-vertical demo gallery. Three public datasets covering distinct industries and countries to demonstrate real-world hit rates across markets:
- 🗽 NYC Italian restaurants (`M9Bd8gMh4NglVKIbt`) — 64% email, 56% ownerName
- ☕ London coffee shops (`ROgK5EsNU6UtTSwFl`) — 70% email, 79% FB page IDs
- 🦷 Berlin dentists (`gI04MuKrfPF4D4Ui8`) — 85% email, 65% high-deliverability
- `marketing/showcase.md` — cross-vertical dataset gallery with metrics breakdown and "how to reproduce" inputs.
- Updated `marketing/blog-post.md` to reflect v1.2 features (deliverability grading, web signals, delta mode, Meta Ad Library).
- Regulated markets (e.g. German medical professionals under GDPR + Heilmittelwerbegesetz) show 1.5-2× higher email hygiene and deliverability grades than consumer-facing verticals. This cross-vertical data is now visible in the README landing.
- Legal & Compliance section in README. GDPR data classification table, Legitimate Interest Assessment template, jurisdictional references (US, EU, UK, Hungary). No other Google Maps scraper on the Apify Store ships documentation this detailed.
- Google redirect URL pollution. `normaliseWebsite()` now unwraps `https://www.google.com/url?q=<real>&...` wrapper URLs emitted by Google Maps, so downstream phases (email extraction, web signal analysis, email validation) operate on the real business domain. Previously, ~40% of results were scraping google.com instead of the real website — including false press@google.com primary emails.
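A hedged sketch of the unwrap step, assuming `normaliseWebsite()` delegates to a helper like this (the helper name and exact parameter handling are made up):

```javascript
// Unwrap Google Maps redirect wrappers so downstream phases see the real domain.
function unwrapGoogleRedirect(url) {
  try {
    const u = new URL(url);
    if (u.hostname.endsWith('google.com') && u.pathname === '/url') {
      const real = u.searchParams.get('q') || u.searchParams.get('url');
      if (real) return real; // searchParams already percent-decodes the value
    }
  } catch { /* not an absolute URL; fall through and return as-is */ }
  return url;
}
```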
- Email deliverability grading (`emailValidation` field). Every `primaryEmail` is graded via MX / SPF / DMARC DNS lookups plus a best-effort SMTP `RCPT TO` probe. Output: `{mxRecords, hasSpf, hasDmarc, smtpValid, isCatchAll, deliverability}` with grade "high" / "medium" / "low" / "unknown". Agencies and cold-outreach teams can now skip invalid or catch-all addresses before burning sender reputation.
- Web quality signals (`webSignals` + `webQuality` fields). Lightweight "Lighthouse without Lighthouse" — extracted from the HTML already fetched for email scraping, so zero extra cost. Outputs: `httpsOnly`, `mobileResponsive`, `pageSizeKb`, `hasFavicon`, `hasOpenGraph`, `hasStructuredData`. Perfect for web-dev agencies targeting outreach.
- Input flags: `validateEmails` (default true), `extractWebSignals` (default true).
- Node built-in `dns/promises` + `net` — no new dependencies.
- SMTP probe gracefully degrades to DNS-only grading when port 25 is blocked by cloud egress firewalls (as on Apify infra). DNS signals alone cover ~80% of the deliverability picture.
- Per-domain DNS cache — repeated probes on the same domain don't re-query.
- Delta mode (`sinceDatasetId` input). Pass a previous run's dataset ID and the actor skips every place already present (matched by Google Maps `placeId`). A workflow win for scheduled runs: users pay only for new leads.
- Meta Ad Library enrichment (`enrichMetaAds` input). For every business with a Facebook URL, the actor fetches the FB page, extracts the numeric page ID, and builds a targeted Meta Ad Library lookup URL (`view_all_page_id=<ID>` — ads from that specific page, not a noisy keyword search). Clicking through reveals whether the business is currently running Facebook/Instagram ads — a strong buying-intent signal.
- New output fields: `facebookPageId`, `metaAdLibraryUrl` (always filled — page-specific when the page ID is extractable, keyword fallback otherwise).
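A minimal sketch of the URL builder described above. The `view_all_page_id` parameter and keyword fallback come from this changelog; the base URL shape and function name are assumptions:

```javascript
// Page-specific Ad Library link when the numeric FB page ID was extracted,
// keyword search fallback otherwise (so the field is always filled).
function buildMetaAdLibraryUrl({ facebookPageId, businessName }) {
  const base = 'https://www.facebook.com/ads/library/';
  return facebookPageId
    ? `${base}?view_all_page_id=${facebookPageId}`
    : `${base}?q=${encodeURIComponent(businessName)}`;
}
```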
- `placeId` Set with lowercase normalization for robust matching across runs.
- Delta filtering happens BEFORE detail-page enqueue, so no `place-scraped` events fire for skipped duplicates — users pay nothing for already-known leads.
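The delta filtering described above could be sketched like this, with assumed function and field names:

```javascript
// Load known placeIds from the sinceDatasetId dataset once, lowercased for
// robust matching across runs.
function buildSeenSet(previousItems) {
  return new Set(previousItems.map((i) => String(i.placeId).toLowerCase()));
}

// Runs before detail-page enqueue, so skipped places never fire billable events.
function filterNewPlaces(candidates, seen) {
  return candidates.filter((p) => !seen.has(String(p.placeId).toLowerCase()));
}
```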
- `reviewKeywords` PUA (Private Use Area) leak. Material Icons ligatures in the Unicode range U+E000–U+F8FF were slipping through as "Sort", "All", etc. Added a browser-side + Node-side filter: `.replace(/[\uE000-\uF8FF]/g, '')`.
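The filter in action, as a one-liner wrapping the exact replace call quoted above:

```javascript
// U+E000–U+F8FF is the BMP Private Use Area, where Material Icons places
// its ligature glyphs; strip any such codepoints from keyword strings.
const stripPua = (s) => s.replace(/[\uE000-\uF8FF]/g, '');
```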
- `ownerName` extraction rate raised from 0% to 40%. Updated the NW regex pattern to accept Mc/Mac/Van prefixes and hyphenated compound names. Previously "McDonald" was rejected because of the internal capital D; now "Pam Weekes & Connie McDonald" (Levain Bakery) and "Jatee Kearsley" (Je T'aime Patisserie) extract correctly.
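An illustrative version of such a relaxed name pattern (not the actor's exact regex), showing how an optional Mc/Mac/Van prefix lets internal capitals through:

```javascript
// One name word: optional Mc/Mac/Van prefix, capitalized core, optional
// hyphenated second part. Two words in sequence make a candidate owner name.
const NAME_WORD = "(?:Mc|Mac|Van )?[A-Z][a-z]+(?:-[A-Z][a-z]+)?";
const OWNER_NAME = new RegExp(`\\b${NAME_WORD} ${NAME_WORD}\\b`);
```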
- Initial feature set: Google Maps scraping, email extraction from business websites (5-source with ranking), phone/WhatsApp, social media links, tech-stack detection (WordPress/Wix/Shopify/React/Analytics), lead scoring 0-100, hidden gem score, growth signal, budget tier inference, AI-ready outreach profile, suggested cold-outreach opener, Cloudflare email decoding, ROT13 deobfuscation, JSON-LD parsing, contact-page crawl in 10 languages, industry/cuisine classification, booking URL detection (OpenTable/Resy/Tock), website language detection.
- Minor (1.x.0) — new features
- Patch (1.x.y) — bug fixes / docs updates
- Apify auto-increments the patch on every `apify push` within the same `actor.json` version.