Stop wasting your budget on slow, resource-heavy browser-based scrapers. This is the fastest, most cost-effective, and data-rich Google Maps scraper on Apify, designed for high-scale lead generation and market research.
API inputs are now normalized defensively: string values for list fields
such as searchStringsArray, placeIds, startUrls,
additionalLanguages, and categoryFilterWords are converted into usable
lists instead of being interpreted character by character.
String booleans such as "false" and "true" are now parsed correctly, so
integrations that send form values as strings do not accidentally enable
expensive options.
Numeric knobs now default or clamp safely when malformed values are passed,
preventing bad JSON/API inputs from turning into platform-level run errors.
Runs that complete with zero pushed places now write a clearer
INPUT_NEEDS_ATTENTION summary explaining that no places were found and
suggesting broader terms, location checks, or looser filters.
Run summaries and Apify logs now include concrete diagnostic hints, for
example whether Google returned no raw places, results were outside the
resolved area, active filters removed everything, or email-only mode removed
places without discovered emails.
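As a rough illustration of the defensive normalization described above, the sketch below coerces list, boolean, and numeric inputs. The helper names (`as_list`, `as_bool`, `as_int`), the newline delimiter, and the accepted true/false spellings are assumptions made for this example, not the Actor's actual code.

```python
def as_list(value):
    """Coerce a list-like input into a list instead of iterating characters."""
    if value is None:
        return []
    if isinstance(value, str):
        # Assumption: one entry per line. Commas are not used as a delimiter
        # because Google Maps URLs can contain them.
        return [part.strip() for part in value.splitlines() if part.strip()]
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def as_bool(value, default=False):
    """Parse real booleans and common string spellings such as "false"."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in ("true", "1", "yes", "on"):
            return True
        if text in ("false", "0", "no", "off", ""):
            return False
    return default


def as_int(value, default, lo, hi):
    """Fall back to `default` on malformed input, otherwise clamp to [lo, hi]."""
    try:
        number = int(float(value))
    except (TypeError, ValueError, OverflowError):
        return default
    return max(lo, min(hi, number))
```

With these helpers, `as_bool("false")` stays `False` (a plain `bool("false")` would be `True`), and `as_int("abc", 50, 1, 1000)` falls back to `50` rather than raising.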
[1.5] - 2026-05-25
Lead generation limit accuracy
Email-only lead runs (skipPlacesWithoutEmail=true) now apply the
per-search result limit after contact enrichment has confirmed an email.
This prevents runs from finishing below maxCrawledPlacesPerSearch because
no-email candidates temporarily occupied limit slots.
If skipPlacesWithoutEmail=true is used while website contact enrichment is
off, the Actor now reports a clear inputWarning and ignores the email-only
filter instead of silently returning places without emails.
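The ordering change can be illustrated with a small sketch. Both function names and the `emails` key on each place dict are assumptions for this example; the point is only that the limit is counted after the email check rather than before it.

```python
def apply_limit_after_email(candidates, limit):
    """Keep at most `limit` places, counting only places that have an email."""
    kept = []
    for place in candidates:
        if len(kept) >= limit:
            break
        if not place.get("emails"):
            continue  # a no-email candidate never occupies a limit slot
        kept.append(place)
    return kept


def apply_limit_before_email(candidates, limit):
    """Earlier ordering, shown for contrast: slice first, filter afterwards."""
    return [place for place in candidates[:limit] if place.get("emails")]


candidates = [
    {"title": "A", "emails": []},
    {"title": "B", "emails": ["b@example.com"]},
    {"title": "C", "emails": ["c@example.com"]},
]
after = apply_limit_after_email(candidates, 2)    # B and C
before = apply_limit_before_email(candidates, 2)  # only B
```

In the second ordering, place A takes one of the two slots and is then dropped, so the run finishes below the requested limit.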
[1.5] - 2026-05-24
Search reliability and input guardrails
maxCrawledPlacesPerSearch is now enforced globally across concurrent
viewport tasks. Dense/subdivided runs no longer push more rows than the
user-requested per-search limit, including extra language passes.
Runs with missing targets now write status: INPUT_NEEDS_ATTENTION and a
clear inputWarning in the OUTPUT key instead of looking like a clean empty
scrape.
Search terms without a Location now get a specific warning explaining that
the city or region must be entered in locationQuery, geolocation fields,
or customGeolocation.
Geo-strict runs now expose search stats in OUTPUT, including out-of-area,
filtered, no-email, and limit-dropped counts. Strong seed-level geographic
mismatches turn the run summary into PARTIAL or INPUT_NEEDS_ATTENTION.
Automatic zoom selection is more robust for elongated OpenStreetMap
administrative bboxes, such as Brazilian coastal states/cities with offshore
islands in their bbox. This keeps Google Maps searches focused while
preserving the original bbox as the geo-strict guardrail.
customGeolocation Point and Polygon inputs now carry a bbox into the
search pipeline, so geo-strict filtering also protects custom areas.
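One way to enforce a single per-search limit across concurrent viewport tasks is a shared budget that every task must claim from before pushing rows. The sketch below is an assumption about how such a guard could look; `PlaceBudget`, `viewport_task`, and `run_search` are hypothetical names, and de-duplication is left out.

```python
import asyncio


class PlaceBudget:
    """Shared counter of rows that may still be pushed for one search."""

    def __init__(self, limit):
        self.remaining = limit

    def claim(self, wanted):
        # No await happens between the read and the write, so concurrent
        # asyncio tasks cannot interleave here and over-claim.
        granted = min(wanted, self.remaining)
        self.remaining -= granted
        return granted


async def viewport_task(budget, found):
    await asyncio.sleep(0)  # stand-in for the network round trip
    return found[: budget.claim(len(found))]


async def run_search(limit, batches):
    budget = PlaceBudget(limit)
    results = await asyncio.gather(*(viewport_task(budget, b) for b in batches))
    return [place for batch in results for place in batch]


rows = asyncio.run(run_search(5, [["a", "b", "c"], ["d", "e", "f"], ["g"]]))
```

Seven places are found across three tasks, but only five rows come back because the budget is shared rather than per task.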
[1.5] - 2026-05-23
Reviews - up to 1000 per place
maxReviewsPerPlace now supports up to 1000 reviews per place.
Reviews are fetched through Google's paginated review feed instead of the
old inline preview sample, so users can get hundreds of real review rows
when Google exposes them for the place.
Each review keeps the correct author/text/rating pairing, including
reviewId, author metadata, relative publish time, text, rating when
exposed, and attached photos when available.
Large review limits use more Google/proxy requests and can take longer;
set maxReviewsPerPlace to 0 for the fastest normal place scraping.
All notable changes to this Actor are documented here. Versions follow the
MAJOR.MINOR scheme used by Apify builds.
[1.5] — 2026-05-17
Five targeted fixes for silent data-quality and runtime issues surfaced
by production scrapes across the U.S., Indonesia, and South-East Asia.
All fixes are covered by a new 103-case regression suite that imports
the real src/ modules (not mocks).
Email extractor — false-positive purge
Bare-word at deobfuscation no longer creates fake emails.
Phrases like "shop online at www.aaa.com ", "available only at
savagex.com", or "Order at amazon.com" used to be rewritten as
online@www.aaa.com / only@savagex.com / Order@amazon.com and
scraped as real contacts. The deobfuscator now only triggers when the
local-part has email-like signals (internal ./-/_/+, a digit,
or a known role name like info/sales/team) AND isn't an English
filler / imperative verb (order, buy, visit, us, goods, …).
Non-business email domains rejected — registrar / CMS / CDN /
font-attribution domains that masquerade as contact addresses on
freshly-launched or platform-hosted sites: godaddy.com,
wordpress.com, wix.com, squarespace.com, hostinger.com,
latofonts.com, fontawesome.com, fonts.gstatic.com,
cloudinary.com, etc. Catches the filler@godaddy.com /
team@latofonts.com / blog@wordpress.com family of artefacts.
@www.<host> domains rejected — emails published at the literal
www. subdomain are virtually always Linktree-style mis-parses;
real mailboxes live at the apex.
Placeholder locals rejected — filler@, placeholder@,
noreply@, no-reply@, donotreply@ are now dropped regardless of
domain (they're never useful for outreach).
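The rejection rules above can be approximated in a few lines. This is a minimal sketch, not the Actor's extractor: the function names are hypothetical, and the word and domain sets contain only the examples named in this changelog (the real lists are longer).

```python
NON_BUSINESS_DOMAINS = {
    "godaddy.com", "wordpress.com", "wix.com", "squarespace.com",
    "hostinger.com", "latofonts.com", "fontawesome.com",
    "fonts.gstatic.com", "cloudinary.com",
}
PLACEHOLDER_LOCALS = {"filler", "placeholder", "noreply", "no-reply", "donotreply"}
ROLE_NAMES = {"info", "sales", "team"}
FILLER_WORDS = {"order", "buy", "visit", "us", "goods"}


def local_part_has_email_signal(word):
    """Decide whether 'word at host' may be treated as an obfuscated email."""
    w = word.lower()
    if w in FILLER_WORDS:
        return False
    if w in ROLE_NAMES or any(ch.isdigit() for ch in w):
        return True
    # An internal separator such as john.doe or first_last.
    return any(ch in "._-+" for ch in w[1:-1])


def is_outreach_email(address):
    """Apply the domain and placeholder rejections to a parsed address."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        return False
    if local in PLACEHOLDER_LOCALS:
        return False
    if domain.startswith("www.") or domain in NON_BUSINESS_DOMAINS:
        return False
    return True
```

Under these rules "shop online at www.aaa.com" no longer yields an address, because `online` carries no email-like signal, while `info at example.com` still does.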
Default bbox tolerance dropped from a fixed 0.5° (~55 km) — which
made geo-strict mode almost useless for city-sized bboxes — to an
adaptive rule: 10 % of the bbox span on each axis, clamped to
[0.005°, 0.05°] (~0.5–5 km). With the old default, a Waterbury-CT
query routinely leaked Hartford (30 km) and beyond; with the new one,
a tight West-Java bbox correctly rejects all 64 Singapore and 12
Malaysia false-positives from the previous run.
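The adaptive rule is simple enough to write out. In this sketch the `(south, west, north, east)` bbox order, the function names, and the example coordinates are assumptions chosen for illustration; antimeridian-crossing boxes are not handled.

```python
def adaptive_tolerance(span_deg, lo=0.005, hi=0.05):
    """10 % of the bbox span on one axis, clamped to [lo, hi] degrees."""
    return max(lo, min(hi, 0.10 * abs(span_deg)))


def in_bbox(lat, lng, bbox, tolerance=None):
    """Check a point against a bbox; `tolerance` overrides the adaptive rule."""
    south, west, north, east = bbox
    lat_tol = adaptive_tolerance(north - south) if tolerance is None else tolerance
    lng_tol = adaptive_tolerance(east - west) if tolerance is None else tolerance
    return (south - lat_tol <= lat <= north + lat_tol
            and west - lng_tol <= lng <= east + lng_tol)


# Illustrative city-sized bbox, roughly around Waterbury, CT.
city_bbox = (41.51, -73.09, 41.62, -72.97)
hartford = (41.7658, -72.6734)

leaks_with_fixed = in_bbox(*hartford, city_bbox, tolerance=0.5)
leaks_with_adaptive = in_bbox(*hartford, city_bbox)
```

For this box the adaptive tolerance is about 0.011° on each axis, so a point roughly 30 km away passes the old fixed 0.5° check but fails the new one.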
Pipeline saturation heuristic — no more cascades into empty space
Subdivision now requires at least 2 places that passed the bbox
check (kept + filtered + no-email-dropped) before splitting. The
prior heuristic used raw "unique placeIds returned by Google", which
caused the pipeline to recursively explode 80→320→1280 viewports in
thin-niche searches where every result was Google's regional fallback
(out-of-bbox). Real-world impact: a single "Cabinet Refinishing in
Evansville, IN" run dropped from 422 seconds @ 7.7 places/min to
the expected ≈60-second / 50+ places-per-minute range.
Concurrency — pre-subdivide seeds when under-utilised
The pipeline pool now seeds 5 viewport tasks instead of 1 when the
natural terms × seeds × langs cross-product is smaller than the
worker concurrency (typical: one search term + one seed location). All
workers start busy from t=0 instead of seven idling while the lone
seed serially fetches its 80 SSR results. Single-term runs that
previously throttled at ~25 places/min should now run at ~70–90.
HTTP — fast-fail on proxy-anchored errors in one-shot get()
BoringSSL BAD_DECRYPT / CONNECT tunnel failed errors are
anchored to a single bad upstream IP; burning 4 retries against the
same sticky session can't recover them. The sticky session.get
path already short-circuited these; the one-shot GMapsHTTP.get
now does the same. Saves ~20–60 s on degraded Apify residential
pool windows.
Tests
New regression suite under scripts/test_fixes.py (103 unit cases
importing the real src/ modules) and scripts/test_against_real_dataset.py
(end-to-end against a captured West-Java run with 552 places).
[1.4] — 2026-05-13
Per-place reviews + a sweep of data-quality fixes.
The Actor now extracts each place's top reviews via Google's
/maps/preview/place SSR XHR — the same call its own JavaScript fires
on first paint of a place card. Per review you get reviewId, text,
originalLanguage, relative + ISO-8601 publish dates, reviewer name +
avatar + profile URL + numeric ID, total review/photo counts, Local-
Guide badge, and attached photo URLs. ~5–10 reviews per place — that's
all Google ships inline; the rest are behind a browser-tokened endpoint
that requires JavaScript-synthesised session tokens we can't reach
HTTP-only.
Per-review star rating is also browser-tokened and not exposed —
only the aggregate totalScore for the place is available. The actor
hedges with two URL variants (original + viewport-rewritten) plus one
retry each to absorb Google's random "lite-response" returns, getting
to roughly 80 % per-run coverage. Disabled by default (set
maxReviewsPerPlace > 0 to enable). Requires Apify residential proxy.
Data-quality fixes
additionalInfo deduplication. Google occasionally emitted both
an enum-id and a free-text entry rendering to the same label inside a
section (e.g. Payments listed Credit cards twice). The parser now
dedupes per section; ~27 % of places in dense urban runs were
affected.
Compound-city addresses split. Place tuples in Turkey and some
other locales put organised-zone names in the city slot using a
slash separator (e.g. "Büsan OSB/Karatay"). We now keep the trailing
municipality as city and roll the rest into neighborhood.
Phone numbers normalised to E.164. A few records came back with
local-trunk format (e.g. "(0332) 221 52 52" instead of
"+90 332 221 52 52"). Phones are now promoted to E.164 for every
country with a known prefix.
Postal-code sanity flag. Records whose postal code's province
prefix doesn't match the parsed Turkish state get
postalCodeSuspect: true (Google's own data has the wrong digit
here — about 0.5 % of places).
Reverse-geocode fallback. Places Google returns with valid coords
but no address string at all (about 0.5 % of dense urban results) now
get backfilled from OpenStreetMap (Nominatim → Photon). Adds the
addressSource: "reverse_geocode" marker so consumers can tell.
Opt-out via reverseGeocodeMissingAddress=false.
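Two of the fixes above, the compound-city split and the phone promotion, can be sketched as follows. Both helpers are hypothetical. The single leading "0" trunk prefix is an assumption that holds for Turkey but not for every country, and the sketch emits the compact digits-only form rather than the spaced form shown in the example above.

```python
import re


def split_compound_city(raw):
    """Return (city, neighborhood) for values like 'Zone name/Municipality'."""
    if "/" not in raw:
        return raw.strip(), None
    head, _, tail = raw.rpartition("/")
    # The trailing municipality is the city; everything before it is rolled
    # into the neighborhood.
    return tail.strip(), head.strip()


def to_e164(raw, country_code, trunk_prefix="0"):
    """Promote a local-trunk number to '+<country code><national number>'."""
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)
    if raw.startswith("+"):
        return "+" + digits  # already international, only strip formatting
    if trunk_prefix and digits.startswith(trunk_prefix):
        digits = digits[len(trunk_prefix):]
    return "+" + country_code + digits
```

For example, `split_compound_city("Büsan OSB/Karatay")` yields `("Karatay", "Büsan OSB")`, and `to_e164("(0332) 221 52 52", "90")` yields `"+903322215252"`.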
Reliability fixes
Short-circuit retries on dead proxy IPs. Sticky-session GETs that
hit BoringSSL bad_decrypt / WRONG_VERSION_NUMBER /
CONNECT tunnel failed (595) no longer burn all 4 retry slots on a
stuck IP; they surface fast and the pipeline mints a fresh session_id.
Pipeline-level proxy failover. A RequestsError propagated out of
a viewport task now triggers a single retry with a brand-new
session_id, instead of silently dropping the viewport. Previously,
one stuck IP could lose up to ~80 places per failed task.
safe() no longer indexes into strings. A latent parser bug —
when Google's protobuf drifted and a str landed where a list was
expected, safe(x, i) would return a single character (turning a
reviewer name into the letter "p", for example). The helper now
treats strings as leaves.
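The string-as-leaf behaviour can be shown with a small stand-in for the helper. The real `safe()` signature is not given in this changelog, so the variadic path and the `default` keyword below are assumptions.

```python
def safe(obj, *path, default=None):
    """Walk nested lists/dicts by index; never index into a string."""
    current = obj
    for index in path:
        if isinstance(current, (str, bytes)):
            # A string where a list was expected is treated as a leaf
            # instead of returning one of its characters.
            return default
        try:
            current = current[index]
        except (IndexError, KeyError, TypeError):
            return default
    return current
```

With this guard, `safe("paul", 0)` returns `None` rather than the letter `"p"`, while `safe(["paul"], 0)` still returns `"paul"`.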
[1.3] — 2026-05-05
Drop residential-proxy traffic for non-Google requests
Up to v1.2, every HTTP request (including Nominatim/Photon geocoding and
arbitrary business-website fetches during enrichment) went through the Apify
residential proxy ($0.0008/MB). Most of those calls don't need it: free
public APIs have no anti-bot protection, and small-business websites rarely
block datacenter IPs.
Fix: the actor now creates two HTTP clients:
http — residential proxy (user-configured), used for Google search XHR
SSR fetches that genuinely need real-IP routing.
http_direct — no proxy, direct from the Apify worker. Used for
Nominatim/Photon geocoding and the website-enrichment fetches.
Net effect on a typical run with extractContactsFromWebsite=true:
60-90% of HTTP requests no longer use residential proxy. Estimated savings:
$0.10-0.15 per 1 000 places in op cost.
Edge case: if a business website blocks the Apify worker's datacenter IP
(rare), enrichment for that one site is silently skipped (website fetches
already use max_retries=1 to fail fast). Other places are unaffected.
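The routing decision between the two clients can be pictured as a host check. This sketch returns only the client name; `client_for` is a hypothetical helper, and matching on `google.com` alone is a simplifying assumption (country-specific Google domains are not covered).

```python
from urllib.parse import urlsplit


def client_for(url):
    """Return 'http' (residential proxy) or 'http_direct' (no proxy)."""
    host = (urlsplit(url).hostname or "").lower()
    if host == "google.com" or host.endswith(".google.com"):
        return "http"
    # Geocoding APIs and business websites go out directly.
    return "http_direct"
```

Keeping the choice in one place makes it straightforward to audit which traffic is billed per megabyte.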
[1.2] — 2026-05-05
Lower memory floor (128 MB) — cheaper runs
Memory footprint measured with psutil:
Light run (5 terms × subdivision × 415 places, no enrichment): peak 80 MB
Medium run (50 places + full website enrichment, concurrency=8): peak 109 MB
Both well under 128 MB. The previous minMemoryMbytes: 256 was unnecessarily
high — frugal users couldn't pick the cheapest tier. Updated:
minMemoryMbytes: 128 (was 256) — opt-in for small runs
maxMemoryMbytes: 4096 (unchanged) — for city-scale jobs
At 128 MB on Apify:
Compute cost ~50% lower vs 256 MB
Same throughput for ≤ 100-place runs
For city-scale (1000+ places) prefer 512 MB to stay safe
[1.1] — 2026-05-04
Critical fix: strict geographic match (drop places from wrong country)
A real production run searching karnataka, India for school /
high school / pre university etc. returned 120 places of which only 3
were actually in Karnataka. The rest:
80 places from Texas, USA (Arlington, Fort Worth, Dallas)
20 places from Cantabria, Spain
11 from Andhra Pradesh (neighbouring Indian state)
1 from South Korea, 1 from Cambodia, 4 from Tamil Nadu / Maharashtra
Two compounding bugs:
Subdivision math broke at low zoom. The previous formula gave
children a longitude offset of 360 / 2^z * 0.75 which at z=6 is 4.2°
— almost the full width of a typical state. Children's centers drifted
into the Arabian Sea and Bay of Bengal.
No bounding-box check. When Google's search XHR found nothing at the
drifted coordinates, it fell back to the residential-proxy IP's country
for results. Apify residential exits in Texas / Spain / Korea returned
schools in those regions instead of empty results.
Fixes:
Correct subdivision math: child longitude offset is now
360 / 2^(z+2) (= a clean quarter of the parent viewport), with
latitude scaled by cos(lat) for high-latitude correctness.
bbox capture + filter: Nominatim and Photon both expose the queried
region's bounding box. We now store it in Viewport.bbox, propagate it
to all subdivided children, and drop any place whose (lat, lng)
falls outside (with 0.5° tolerance for border cases).
New input geoStrictMatch: true (default ON). Set to false to
keep the v1.0 wider-area behavior.
Verified on the same karnataka, India query → only Karnataka places
returned, Texas/Spain/Korea results dropped.
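The difference between the two offset formulas is easy to check numerically. The sketch below is one reading of the fix, under the assumption that the latitude offset is the longitude offset multiplied by cos(lat); the function names are hypothetical.

```python
import math


def old_lng_offset(zoom):
    """Pre-fix formula: 360 / 2^z * 0.75."""
    return 360.0 / 2 ** zoom * 0.75


def new_lng_offset(zoom):
    """Post-fix formula: 360 / 2^(z+2)."""
    return 360.0 / 2 ** (zoom + 2)


def child_centers(lat, lng, zoom):
    """Centers of the four child viewports around a parent center."""
    lng_off = new_lng_offset(zoom)
    # Assumption: scale the latitude step by cos(lat) so children stay
    # inside the parent at higher latitudes.
    lat_off = lng_off * math.cos(math.radians(lat))
    return [(lat + dy * lat_off, lng + dx * lng_off)
            for dy in (1, -1) for dx in (-1, 1)]


children = child_centers(15.0, 76.0, 6)
```

At z=6 the old formula moves each child about 4.2° in longitude, while the new one moves it about 1.4°, so the children of a state-sized viewport stay over land instead of drifting out to sea.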
[1.0] — 2026-05-03
Production launch. First stable, public-ready release of the HTTP-only
Google Maps scraper. No browser, no Chromium — just curl_cffi with Chrome
TLS impersonation.
Coverage
Quad-tree viewport subdivision. When a viewport saturates (≥18 of the
first 20 results are new), it splits into 4 child viewports at zoom+1 and
recurses up to maxSubdivisionDepth (default 4 → up to 256 viewports per
seed). This is what lets the Actor break Google's hard ~120-results-per-area
limit and scrape entire metro areas.
Multi-zoom expansion (multiZoomDelta) — search each seed at
zoom-N..zoom+N for +30-70% extra unique places.
Multi-language passes (additionalLanguages) — re-search the same
area in additional hl= codes to catch translations and regional categories.
Geo composite resolver — countryCode / state / county / city /
postalCode joined into a single Nominatim query when locationQuery is
empty.
Direct inputs — startUrls (/maps/place/... URLs) and placeIds
(raw ChIJ… IDs) bypass search entirely.
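The v1.0 saturation rule and the viewport arithmetic behind "up to 256 viewports per seed" can be sketched as below. The function names are hypothetical, and the sketch reflects only the v1.0 rule as stated in this entry.

```python
def is_saturated(result_ids, seen_ids, window=20, threshold=18):
    """True when at least `threshold` of the first `window` results are new."""
    first = result_ids[:window]
    new_count = sum(1 for place_id in first if place_id not in seen_ids)
    return new_count >= threshold


def max_leaf_viewports(depth):
    """Each split produces 4 children, so depth d gives at most 4**d leaves."""
    return 4 ** depth


page = [f"place-{i}" for i in range(20)]
```

With the default `maxSubdivisionDepth` of 4, `max_leaf_viewports(4)` is 256, which matches the figure quoted above.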
Output (~46 fields per place)
Place identifiers (placeId, fid, cid, kgmid), structured address
(addressParts.{street, city, state, postalCode, neighborhood, countryCode}),
center + entrance coordinates, contacts (phone, phoneUnformatted,
website, websiteDisplay), ratings (totalScore, reviewsCount for
hotels), opening status (openingHoursToday, currentStatus,
nextOpensAt, permanentlyClosed, temporarilyClosed), descriptions
(subTitle, description, longDescription), categories, owner info
(ownerName, ownerId, claimThisBusinessUrl), placeTags (LGBTQ+
friendly, women-owned, …), full additionalInfo amenities tree, imagesCount
thumbnail, menu URL, plusCode, locatedIn, isAdvertisement, hotel
block (hotelStars, hotelPrice, hotelCheckInDate/hotelCheckOutDate,
hotelAmenities), plus run metadata (scrapedAt, language, rank,
searchPageUrl).
Built-in filters (post-fetch, free)
placeMinimumStars — two / twoAndHalf / … / fourAndHalf.
skipClosedPlaces — drop permanently / temporarily closed.
searchMatching — all / only_includes / only_exact (title vs term).
categoryFilterWords — keep only matching categories.
Optional add-on: website-contacts enrichment
When extractContactsFromWebsite is enabled, the Actor visits each place's
website and extracts emails (with deobfuscation of foo (at) bar (dot) com
style writing), additionalPhones (from tel: links, normalized to E.164,
deduped against the main phone), and 8 social-media URL fields (facebooks,
instagrams, linkedIns, twitters, youtubes, tiktoks, pinterests,
whatsapps). Domain-level cache means chain stores share one fetch.
Optional /contact page fallback when the homepage yields no email. Big
global chains (McDonald's, Starbucks, Hilton, …) are skipped by default.
Quality filters tuned against real-world false positives:
Reject CMS-glued phone numbers (e.g. 60957293003 — 11 digits without +
and not starting with NANP 1 is junk).
Reject Facebook XML namespace URL (/2008/fbml) and bare profile.php
placeholders; keep only profile.php?id=NNN and vanity URLs.
Reject Pinterest conversion-tracking pixel (ct.pinterest.com/v3) and
any social handle matching API-version pattern (v1, v2, …).
Reject .php / .html / .aspx "vanity URLs" — real social handles
never carry file extensions.
Performance: enrichment runs in parallel within one task via
asyncio.gather. Fetches use max_retries=1 (fail-fast) since retrying a
403/timeout from a third-party site rarely helps — better to skip and move
on. Real platform measurement: 20 places + full enrichment in ~21 s on
residential proxy.
Reliability & speed
Sticky residential proxy sessions per viewport — all paginated XHRs of
one logical search hit the same Apify residential exit IP.
AsyncSession reuse across the pagination chain — single TLS handshake
per task, HTTP/2 multiplexed.
Chrome TLS impersonation rotated per session (Chrome 120/123/124/131
profiles).
EU consent flow bypassed via pre-set CONSENT=YES+cb and SOCS=…
cookies on every Google request — no more consent.google.com redirects.
BlockedError retry-with-fresh-IP — when a sticky session does get
challenged, the pipeline mints a brand-new session_id (different proxy
exit) and tries once more before giving up.
Bulletproof geocoding — Nominatim with 6 s cap, falling back to Photon
(komoot) which uses the same OpenStreetMap data through more reliable
infra. Cached in KV store under _geocode_cache so repeat runs are instant.
Consent / captcha / 429 detection with intelligent backoff;
fast-fail on deterministic 4xx (no retry storms).
Resumable across Apify migrations — state checkpointed every 30 s and
on PERSIST_STATE event.
Bounded concurrency worker pool with dynamic enqueue of subdivided
child viewports.
Pay-per-event monetization (3 events)
| Event | When it fires | Suggested price |
| --- | --- | --- |
| apify-default-dataset-item | every place pushed (Apify auto-charges) | $0.0010 ($1.00 / 1 000 results) |
| place-with-emails | website enrichment yielded ≥ 1 email | $0.0015 ($1.50 / 1 000) |
| place-with-socials | website enrichment yielded ≥ 1 social URL | $0.0005 ($0.50 / 1 000) |
src/billing.py detects whether PPE is actually active for the current run
(via Configuration.actor_pricing_info) so local runs and free-tier runs
skip charging entirely — no log noise, no overhead.
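To see how the three events add up, here is a small worked estimate using the suggested prices from this section. The run counts are invented for illustration, and `estimate_charge` is a hypothetical helper, not part of src/billing.py.

```python
from decimal import Decimal

# Suggested prices per event, as listed in this section.
PRICES = {
    "apify-default-dataset-item": Decimal("0.0010"),
    "place-with-emails": Decimal("0.0015"),
    "place-with-socials": Decimal("0.0005"),
}


def estimate_charge(places, with_emails, with_socials):
    """Total charge in USD for one run, given how often each event fired."""
    return (PRICES["apify-default-dataset-item"] * places
            + PRICES["place-with-emails"] * with_emails
            + PRICES["place-with-socials"] * with_socials)


# Hypothetical run: 1 000 places, 400 with an email, 600 with a social URL.
total = estimate_charge(1000, 400, 600)
```

For that hypothetical run the total is $1.00 + $0.60 + $0.30 = $1.90. `Decimal` is used so that sub-cent prices do not pick up floating-point rounding noise.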
Apify Store metadata
Input schema with grouped sections + select-list filters.