Stop wasting your budget on slow, resource-heavy browser-based scrapers. This is the fastest, most cost-effective, and data-rich Google Maps scraper on Apify, designed for high-scale lead generation and market research.
All notable changes to this Actor are documented here. Public builds use
Apify build revisions such as 1.6.0 and 1.5.65.
[1.6.46] - 2026-06-15
Strict search reliability
Fixed task completion bookkeeping after the strict subdivision efficiency
changes so non-splitting search tasks are still marked completed for
diagnostics and migration/resume safety.
[1.6.45] - 2026-06-15
Strict viewport splitting efficiency
Made automatic subdivision require saturation inside the resolved search
area relative to the configured maxPlacesPerViewport.
With the default 80 places per viewport, a task now needs about 60
local/target-area candidates before it creates child viewports. A small
number of local places plus a long nearby-city or duplicate tail no longer
triggers four extra searches.
This reduces raw Google Maps requests in strict multi-location runs while
preserving deep subdivision for genuinely dense target areas.
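The saturation rule above can be sketched roughly as follows. The 0.75 ratio is only inferred from "about 60 of 80" and the function name is hypothetical; the Actor's real threshold logic is not shown in this changelog.

```python
import math

# Assumption: 0.75 is inferred from "about 60 local candidates at 80 places
# per viewport". The Actor's actual ratio is not documented here.
SATURATION_RATIO = 0.75


def should_subdivide(local_candidates: int,
                     max_places_per_viewport: int = 80,
                     ratio: float = SATURATION_RATIO) -> bool:
    """Return True only when enough in-area candidates were seen."""
    threshold = math.ceil(max_places_per_viewport * ratio)
    return local_candidates >= threshold


# A few local places plus a long out-of-area tail does not split.
print(should_subdivide(local_candidates=12))
# A genuinely dense target area still splits.
print(should_subdivide(local_candidates=60))
```

Only candidates inside the resolved search area are counted, so duplicates and nearby-city rows never reach the threshold on their own.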
[1.6.44] - 2026-06-15
Multi-location strict search efficiency
Added per-search-area deduplication for places that geoStrictMatch=true
drops as outside the resolved area.
In multi-location runs, the same nearby place can be outside one city but
inside another, so this deduplication is scoped to the specific location /
bbox instead of the whole run.
This prevents repeated outside-area rows from inflating strict-mode
subdivision signals while preserving valid results for neighboring
locations.
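A minimal sketch of deduplication scoped to one search area, assuming each location/bbox has a stable string key. The class and method names are hypothetical and not taken from the Actor's source.

```python
from collections import defaultdict


class OutsideAreaDedup:
    """Tracks outside-area place IDs separately for each search area."""

    def __init__(self) -> None:
        # area_key -> set of place IDs already counted as outside that area
        self._seen = defaultdict(set)

    def first_time_outside(self, area_key: str, place_id: str) -> bool:
        """Return True the first time a place is dropped for this area."""
        seen = self._seen[area_key]
        if place_id in seen:
            return False
        seen.add(place_id)
        return True


dedup = OutsideAreaDedup()
dedup.first_time_outside("city-a", "place-1")  # counted once for city-a
dedup.first_time_outside("city-a", "place-1")  # repeat, not counted again
dedup.first_time_outside("city-b", "place-1")  # independent of city-a
```

Because the key includes the area, a place rejected for one city can still be evaluated and accepted for a neighboring one.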
[1.6.43] - 2026-06-15
Strict search area efficiency
Fixed strict-area subdivision so nearby/out-of-area Google Maps results no
longer make the Actor split into more child viewports.
The Actor now treats a viewport as saturated only when there is enough
result supply inside the resolved search area, reducing wasted strict-mode
requests on multi-location runs where Google broadens results to nearby
cities.
Strict filtering behavior is unchanged: places outside the resolved search
area are still dropped when geoStrictMatch=true.
[1.6.42] - 2026-06-15
Review extraction speed
Shortened the per-place review wait for small review limits so a single slow
Google review/place endpoint cannot keep users waiting for up to 45 seconds
when they request only a few reviews per place.
Review extraction now abandons a bad residential review session immediately
after a hard timeout/proxy failure and retries the remaining places with a
fresh residential session.
The retry still preserves review coverage behavior: places that did not get
reviews in the bad session are retried before being skipped.
[1.6.41] - 2026-06-15
Review extraction reliability
Added a per-place timeout guard for Google review extraction so one slow or
broken residential proxy exit cannot hold an entire search task open for
many minutes.
When review requests hit repeated timeout/network errors, the Actor now
stops using that review session early and retries the missing places with a
fresh residential proxy session.
This keeps normal review coverage behavior, but reduces worst-case task time
when Google Maps place/review endpoints stall behind a bad proxy exit.
[1.6.38] - 2026-06-12
Website enrichment quality
Treated SmugMug-hosted pages as hosted platform pages and filtered platform
operational inboxes such as help@smugmug.com so they are not saved as the
local business email.
Made website-enrichment domain cache return independent contact copies so
per-place cleanup, such as removing duplicate phone numbers, cannot mutate
the cached result reused by another place on the same website.
[1.6.37] - 2026-06-12
Location quality and cost efficiency
Focused automatic viewport subdivision on the resolved search area. When a
city/suburb has an exact OpenStreetMap polygon or bbox, child viewports whose
centers are outside that area are no longer searched first. This reduces
nearby-city results and wasted website enrichment even when
geoStrictMatch=false.
This specifically helps area-adjacent searches where a rectangular map split
could previously put a child viewport into a nearby area and attract
outside-area results.
[1.6.36] - 2026-06-12
Input warnings
Made legacy placeMinimumStars warnings fire for any non-empty saved-task
or API value, even if the value is no longer part of the public input schema.
The value is still ignored so old post-fetch rating filters cannot remove
already-fetched places.
[1.6.35] - 2026-06-12
Cost efficiency
Removed public input controls for post-fetch category, exact-title, and
minimum-rating filters. These filters could discard places only after Google
Maps data had already been fetched, which made some runs expensive while
producing little or no saved output.
Existing saved tasks/API calls that still send categoryFilterWords,
searchMatching, or placeMinimumStars remain accepted, but those values are
ignored with a clear input warning. Filter the finished dataset by
categories, title, searchString, or totalScore instead.
skipClosedPlaces and geoStrictMatch remain available because they are
explicit quality controls users may still want for active-business and strict
area exports.
[1.6.34] - 2026-06-11
Diagnostics
Clarified the repeated reverse-geocoder warning text so it accurately says
"Reverse geocoding" rather than implying the helper is only used by address
backfill.
[1.6.33] - 2026-06-11
Reliability
Reduced noisy "Failed to perform" logs from optional reverse-address
backfill. When external reverse geocoders are unstable, the Actor now uses
quiet retries, limits reverse-geocode concurrency, and disables the failing
fallback for the rest of the run after repeated failures.
Added address-backfill counters to the search stats so these best-effort
address lookups are easier to distinguish from the main Google Maps scrape.
Google Maps places are still saved even when reverse-address backfill is
skipped.
[1.6.32] - 2026-06-11
Location quality
Fixed location-context detection so short state codes are matched as words,
not as substrings inside category terms. Queries such as "HVAC companies"
with "Denver, CO" and "Med spas" with "Philadelphia, PA" now keep their
location context instead of being sent to Google Maps as broad locationless
searches.
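The difference between substring and whole-word matching can be illustrated with a short sketch. The state list is a small illustrative subset and the function name is hypothetical.

```python
import re

# Illustrative subset only; a real table would list every state code.
US_STATE_CODES = {"CO", "PA", "CA", "FL", "UT"}


def find_state_code(text: str):
    """Return the first two-letter state code that appears as a whole word."""
    for token in re.findall(r"\b[A-Za-z]{2}\b", text):
        if token.upper() in US_STATE_CODES:
            return token.upper()
    return None


# "companies" contains "co" and "spas" contains "pa", but neither is a
# standalone two-letter word, so no state code is detected.
print(find_state_code("HVAC companies"))
print(find_state_code("Med spas"))
print(find_state_code("Denver, CO"))
```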
[1.6.31] - 2026-06-11
Location quality
Narrowed bare US state-abbreviation normalization so global queries such as
Delhi IN, Bogota CO, or Pilar AR are not forced into Indiana,
Colorado, or Arkansas. Comma-separated US shortcuts and known pasted US
city/state pairs such as Irvine CA are still normalized safely.
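One way to express the narrowed rule, as a sketch under assumptions: comma-separated shortcuts always expand, while bare "City ST" input expands only for an explicit list of known pairs. The tables and function name below are hypothetical, and the output format follows the "City, State, United States" shape mentioned in 1.6.30.

```python
import re

# Illustrative tables; the Actor's real lists are not shown in this changelog.
US_STATES = {"CA": "California", "CO": "Colorado", "FL": "Florida", "UT": "Utah"}
KNOWN_BARE_PAIRS = {("irvine", "CA"), ("miami", "FL"), ("salt lake city", "UT")}


def normalize_location(line: str) -> str:
    text = " ".join(line.split())
    # Comma-separated shortcut, for example "Denver, CO".
    m = re.fullmatch(r"(.+?),\s*([A-Za-z]{2})", text)
    if m and m.group(2).upper() in US_STATES:
        return f"{m.group(1)}, {US_STATES[m.group(2).upper()]}, United States"
    # Bare shortcut, for example "Irvine CA": expand only known pairs.
    m = re.fullmatch(r"(.+?)\s+([A-Z]{2})", text)
    if m and (m.group(1).lower(), m.group(2)) in KNOWN_BARE_PAIRS:
        return f"{m.group(1)}, {US_STATES[m.group(2)]}, United States"
    return text


print(normalize_location("Irvine CA"))
print(normalize_location("Delhi IN"))    # left unchanged
print(normalize_location("Denver, CO"))
```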
[1.6.30] - 2026-06-11
Location quality
Ignored obvious Location(s) table headers such as "City State",
"City, State", and "City State Zip" before geocoding, so copied CSV headers
no longer create unintended searches.
Expanded common US city/state shortcuts such as "Irvine CA", "Miami FL", and
"Salt Lake City UT" to "City, State, United States" before geocoding and
caching. This prevents Photon fallback from resolving the state abbreviation
as the wrong country/region while keeping the user's original search intent.
[1.6.29] - 2026-06-10
Location quality
Improved geoStrictMatch=false exports under per-search/result limits. When
both in-area and out-of-area Google results are available, the Actor now
gives verified in-area rows priority before known insideSearchArea=false
rows consume the remaining output budget.
This does not change strict-mode behavior and does not hide out-of-area rows
when there is enough room to save them. It only prevents nearby/out-of-area
Google results from crowding out better in-area matches when the configured
max results limit is reached.
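The prioritization can be sketched with a stable sort. This assumes rows are dicts with an optional insideSearchArea key and that rows with an unknown value are not demoted; both are assumptions, and the function name is hypothetical.

```python
def select_rows(rows, limit):
    """Keep up to `limit` rows, pushing known out-of-area rows to the back."""
    # sorted() is stable, so the original Google order is preserved within
    # each group. Only rows explicitly marked insideSearchArea=False are
    # demoted; rows without the field stay in the first group (assumption).
    prioritized = sorted(rows, key=lambda r: r.get("insideSearchArea") is False)
    return prioritized[:limit]


rows = [
    {"id": 1, "insideSearchArea": False},
    {"id": 2, "insideSearchArea": True},
    {"id": 3},
    {"id": 4, "insideSearchArea": True},
]
print([r["id"] for r in select_rows(rows, 2)])
print([r["id"] for r in select_rows(rows, 4)])
```

When the limit is large enough, every row is still returned; only the order in which the budget is consumed changes.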
[1.6.28] - 2026-06-10
Diagnostics
Centralized the first diagnostic rules in src/diagnostics.py so input,
geocoder, and strict-search quality messages are triggered by reusable,
tested rule conditions instead of one-off log text in the main pipeline.
Added concise input warnings when a search term appears to include the same
location that is already provided in Location(s), or when a city/country
suffix is typed into searchStringsArray while no Location(s) value is set.
The warning suggests the exact keyword/location split instead of printing a
generic help paragraph.
Added a geocoding warning when OpenStreetMap/Photon returns multiple same-name
area candidates and the Actor has to choose one. The log now names the chosen
area and asks users to add province/state/country only for that ambiguous case.
Reduced noisy Nominatim rate-limit retries. Location lookup now falls back to
Photon after one rate-limited Nominatim response, and reverse geocoding uses a
quiet single Nominatim attempt before Photon fallback.
[1.6.27] - 2026-06-10
Cost efficiency
Reduced wasted strict-area pagination in focused map viewports: when
geoStrictMatch=true and a focused/child viewport returns a full page of
places with no in-area matches, the Actor stops paginating that viewport
earlier instead of continuing through clearly out-of-area tail results.
Deduplicated contact-page URLs that differ only by a #fragment anchor
before website enrichment fetches them. This avoids fetching the same
contact page more than once while keeping the same extracted contacts.
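Fragment-only duplicates can be collapsed with the standard library, as in this minimal sketch. The function name is hypothetical; urldefrag is part of urllib.parse.

```python
from urllib.parse import urldefrag


def dedupe_by_fragment(urls):
    """Drop URLs that differ only by their #fragment, keeping first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        base, _fragment = urldefrag(url)
        if base not in seen:
            seen.add(base)
            unique.append(base)
    return unique


print(dedupe_by_fragment([
    "https://example.com/contact",
    "https://example.com/contact#form",
    "https://example.com/about#team",
]))
```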
Location accuracy and diagnostics
Accepted copied Google Maps search URLs that encode the map viewport as
@lat,lng,meters (for example @-11.3016789,-41.8679311,7094m) in addition
to the older @lat,lng,zoom format. These URLs now run as visible-map-area
searches instead of being rejected as unsupported.
When geoStrictMatch=true and the primary geocoder resolves a location only
to a point, the Actor now tries the Photon/OpenStreetMap fallback for an area
boundary before deciding the strict area cannot be trusted.
Added a clearer run diagnosis when strict matching succeeds but most Google
results are outside the resolved area, so users can tell when a Location is
too broad/ambiguous or Google has broadened the search after local results.
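For the two viewport encodings mentioned in this release, a parsing sketch might look like the following. The regular expression and the returned dict shape are assumptions for illustration, not the Actor's parser.

```python
import re

# Matches "@lat,lng,13z" (zoom) and "@lat,lng,7094m" (meters).
VIEWPORT_RE = re.compile(
    r"@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(\d+(?:\.\d+)?)([zm])"
)


def parse_viewport(url: str):
    """Return lat/lng plus either a zoom level or a size in meters."""
    match = VIEWPORT_RE.search(url)
    if match is None:
        return None
    lat, lng, size, unit = match.groups()
    kind = "zoom" if unit == "z" else "meters"
    return {"lat": float(lat), "lng": float(lng), kind: float(size)}


print(parse_viewport("/maps/search/coffee+shop/@30.2711286,-97.7436995,13z"))
print(parse_viewport("/maps/search/cafe/@-11.3016789,-41.8679311,7094m"))
```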
[1.6.26] - 2026-06-10
Billing and cost efficiency
Switched pricing-event detection to the official Apify
Actor.get_charging_manager().get_pricing_info() API. During the pricing
transition, the Actor now reads the active run pricing from Apify instead of
relying on internal SDK configuration details.
Added a budget-aware gate before expensive website enrichment and review
fetching. When maxTotalChargeUsd is already exhausted or only a few default
dataset rows can still be charged, the Actor trims the current batch before
contact/review work so it does not spend proxy/network time on rows that
cannot be saved and charged.
[1.6.25] - 2026-06-10
Location UI clarity
Moved geoStrictMatch (Keep only places inside the search area) directly
below Location(s) in the public input form. This makes it clear that strict
area filtering applies to the whole selected search area, not only to the
structured geolocation fields.
Clarified the geoStrictMatch help text: it works with Location(s),
structured geolocation fields, and customGeolocation; for multi-location
runs, each result is checked against the specific location area that produced
it.
[1.6.24] - 2026-06-10
Location input simplification
Replaced the two visible location inputs with one public locationQueries
field titled Location(s). Users now enter one location per line; a single
line is the normal one-location case, and multiple lines run the same search
terms across multiple areas.
Kept the older locationQuery field as a backward-compatible hidden
API/saved-task alias, so existing integrations keep working.
When locationQueries is provided, structured geolocation fields such as
countryCode, state, city, and postalCode are ignored with a clear
warning instead of being silently added as another location.
[1.6.23] - 2026-06-10
Multiple locations
Added locationQueries for running the same search terms across several
independent locations in one run. Each entry is geocoded separately, so users
should add one complete area per line instead of combining two cities in one
locationQuery string.
maxCrawledPlacesPerSearch now applies per search term per location for
normal keyword searches. Existing single-location runs keep the same behavior.
Multi-location runs keep one shared deduplication set, so the same Google
Maps place is still saved only once when it appears in overlapping areas.
If several target countries are detected, proxy country auto-selection is
left unset instead of forcing the whole run through the first country's proxy.
[1.6.21] - 2026-06-10
Lead filtering and billing fairness
Removed the public input fields websiteFilter and skipPlacesWithoutEmail.
The Actor now saves scraped places regardless of whether they have a website
or whether website enrichment found an email. This prevents runs from doing
paid Google Maps/search/enrichment work and then discarding rows before they
can be delivered.
Existing saved tasks or API calls that still send websiteFilter or
skipPlacesWithoutEmail=true remain accepted, but those values are ignored
with an inputWarning. Filter the finished dataset by website or emails
after the run when you need a narrower export.
[1.6.18] - 2026-06-05
Website contact enrichment quality
Email domain matching is less brittle for real small-business sites. The
Actor now keeps related-domain emails such as punctuation-free domain
variants (wash-box.es → info@washbox.es), same business names across
TLDs (alvato.com → info@alvato.es), localized webmail domains
(yahoo.com.ar), and strong company abbreviations
(lacasadelasherramientas.com.ar → info@lcdh.com.ar).
Added contactsFilterEmailsByWebsiteDomain input option. It is enabled by
default for cleaner leads. Users can turn it off to keep every public email
found on the website after the basic false-positive filters.
Clarified the input description for email filtering: enabled mode prioritizes
website-related contact emails, while disabled mode saves every valid-looking
public email found on the visited website/contact pages, including
parent-company, agency, vendor, or platform domains.
OUTPUT.searchStats and run diagnostics now include website-enrichment
counters: website candidates, rows with email/social/extra phones, cache
hits, unique website fetches, proxy fallback recoveries, and raw emails
ignored as unrelated to the website.
Verified against the recent no-email audit sample: the local enrichment
recovered 14 additional email-bearing websites that were previously missed
or filtered too strictly.
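Two of the related-domain cases above (punctuation-free variants and the same name across TLDs) can be approximated with a deliberately naive sketch. It treats the first non-"www" label as the business name, which is an assumption that breaks on subdomains, and it does not cover webmail domains or company abbreviations. The function names are hypothetical.

```python
def domain_key(domain: str) -> str:
    """Naive business-name key: first label, without 'www' or punctuation."""
    labels = domain.lower().strip(".").split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    if not labels:
        return ""
    return labels[0].replace("-", "").replace("_", "")


def email_matches_website(email: str, website_domain: str) -> bool:
    """True when the email domain and website share the same naive key."""
    if "@" not in email:
        return False
    email_domain = email.rsplit("@", 1)[1]
    key = domain_key(email_domain)
    return bool(key) and key == domain_key(website_domain)


print(email_matches_website("info@washbox.es", "wash-box.es"))
print(email_matches_website("info@alvato.es", "alvato.com"))
print(email_matches_website("sales@agency.com", "alvato.com"))
```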
[1.6.17] - 2026-06-04
Strict search area rollback
Rolled back the 1.6.16 strict-mode behavior that dropped Google Maps
results only because the search response did not include coordinates. This
was too strict for valid service-area/local results and could turn a real
query into a zero-row run.
geoStrictMatch=true still drops places with coordinates outside the
resolved search area. Rows without coordinates are kept instead of being
discarded as unknownAreaPlacesDropped; insideSearchArea is added only
when coordinates are available.
[1.6.16] - 2026-06-04
Website contact enrichment
Homepage email extraction now decodes additional static in-source hiding
patterns such as String.fromCharCode(...), atob(...), Base64
data-email attributes, data-user + data-domain attributes, simple JS
string concatenation, and explicit JS reversed email snippets without
browser rendering.
When the homepage has no email, contact enrichment can now check up to five
high-confidence contact/about/location pages in parallel instead of only
one. This includes localized branch/location pages such as /sucursales,
press/media pages such as /prensa, venue booking/function pages, and
Nordic customer-service pages such as /kundservice.
If a deep business URL fails to open, enrichment now tries the same site's
root and first path-prefix page before giving up. This recovers contacts
from Google Maps website links that point at stale branch/detail pages while
the main site still works.
Contact-page selection now skips binary/static files such as PDFs and ranks
press/media pages above generic event pages, reducing wasted contact-page
slots.
Reversed email artefacts, placeholder addresses such as tunombre@email.com,
third-party vendor emails, and repeated directory/platform operational
addresses are filtered more carefully, so enrichment does not inflate counts
with contacts that do not belong to the business.
Business-website enrichment now has a targeted insecure-TLS fallback for
small sites with hostname-mismatched HTTPS certificates. This fallback is
used only for public business websites, not for Google Maps requests.
Shared platforms such as Facebook, Instagram, Google Sites, and directories
now use URL-scoped enrichment cache instead of host-scoped cache, preventing
contacts from one business page from being copied to another business on the
same platform.
contactsProxyFallback is now enabled by default. Direct website fetching
still happens first; residential proxy is used only to retry failed
business-website/contact-page requests. Users can turn it off to minimize
proxy bandwidth when coverage is less important.
Direct placeIds and exact Google Maps place URLs now run the same website
contact enrichment path as normal search results before the row is pushed.
This fixes exact-place runs that previously returned the Google Maps place
data without emails, additionalPhones, or social links.
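Two of the static hiding patterns listed at the top of this section, String.fromCharCode(...) and atob(...), can be decoded without a browser. The sketch below is an illustration under simple assumptions (literal numeric arguments, a quoted Base64 literal); the regular expressions and function names are hypothetical.

```python
import base64
import re

CHAR_CODE_RE = re.compile(r"String\.fromCharCode\(([\d,\s]+)\)")
ATOB_RE = re.compile(r"atob\(\s*['\"]([A-Za-z0-9+/=]+)['\"]\s*\)")


def decode_char_codes(source: str):
    """Decode every String.fromCharCode(...) call with literal numbers."""
    decoded = []
    for match in CHAR_CODE_RE.finditer(source):
        try:
            codes = [int(p) for p in match.group(1).split(",") if p.strip()]
            decoded.append("".join(chr(code) for code in codes))
        except (ValueError, OverflowError):
            continue  # skip code points that chr() cannot represent
    return decoded


def decode_atob(source: str):
    """Decode every atob('...') call whose argument is valid Base64 text."""
    decoded = []
    for match in ATOB_RE.finditer(source):
        try:
            raw = base64.b64decode(match.group(1), validate=True)
            decoded.append(raw.decode("utf-8"))
        except ValueError:
            continue  # invalid Base64 or non-UTF-8 payload
    return decoded


codes = ",".join(str(ord(ch)) for ch in "info@example.com")
print(decode_char_codes(f"var e = String.fromCharCode({codes});"))
```

The decoded strings would still need to pass the usual email validation and false-positive filters before being saved.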
Strict search area diagnostics
Note: the strict-mode no-coordinate drop introduced in this build was rolled
back in 1.6.17 because Google can omit coordinates from search cards for
valid local/service-area results.
Strict searches now stop paginating a viewport early when a later Google
results page contains parsed places but none inside the strict search area.
This reduces time and proxy usage on searches where Google starts returning
nearby-city tail results after local results are exhausted.
[1.6.15] - 2026-06-03
Search URL limits
maxCrawledPlacesPerSearch now applies independently to each accepted
Google Maps search URL from startUrls, even when multiple copied URLs have
the same query text but different map viewports.
Normal keyword searches still use the existing per-search-term limit.
[1.6.14] - 2026-06-03
Documentation and logging
Updated the public input help, README, and changelog to describe the new
Google Maps search URL mode and its requirement for @lat,lng,zoom.
Cleaned the new search-URL acceptance log line so it renders as plain ASCII
in all consoles.
[1.6.13] - 2026-06-03
Search URL viewport quality
Google Maps search URLs now use a viewport-sized bbox for
insideSearchArea / geoStrictMatch decisions. This better matches the
visible map area copied from Google Maps and avoids treating normal nearby
results as outside the search URL area.
[1.6.12] - 2026-06-03
Google Maps search URLs
startUrls now accepts copied Google Maps search result URLs that include
both free-text search content and an explicit @lat,lng,zoom viewport, for
example /maps/search/coffee+shop/@30.2711286,-97.7436995,13z.
Search URLs are handled as search jobs in the copied map viewport. Exact
place URLs and query_place_id URLs still use direct place mode, so the
Actor does not guess one place from a search page.
OUTPUT now reports searchStartUrlsRequested,
searchStartUrlsAccepted, and unsupportedStartUrls.
[1.6.11] - 2026-06-03
Spending-limit graceful stop
Pay-per-event runs now inspect Apify SDK ChargeResult responses from
dataset writes and add-on charges. When the run's maxTotalChargeUsd limit
is reached, the Actor stops accepting new work, skips the remaining queued
tasks, writes OUTPUT, and reports status: PARTIAL instead of continuing
until the platform aborts the run.
OUTPUT now includes stoppedByChargeLimit and chargeLimitEvent so users
can distinguish a spending-cap stop from Google/proxy failures or input
problems.
[1.6.10] - 2026-06-03
Billing notice-period fallback
Until the upcoming Pricing-tab events become active on 2026-06-17, contact
enrichment charges the currently active legacy email/social events instead
of probing the future place-with-contact-details event.
After the notice period, the Actor switches to the new
place-with-contact-details and review-scraped events automatically.
[1.6.9] - 2026-06-03
Billing transition safety
Manual add-on charging no longer depends on an SDK pricing-info guard that
may be unavailable at runtime.
During the pricing-change notice period, contact enrichment falls back to the
legacy place-with-emails / place-with-socials events if the upcoming
place-with-contact-details event is not active yet.
[1.6.8] - 2026-06-03
Pay-per-event billing
Website email and social-profile add-ons now use one combined
place-with-contact-details event instead of separate place-with-emails
and place-with-socials events.
Review extraction now has its own review-scraped event, charged by the
number of reviews actually attached to dataset rows.
The base apify-default-dataset-item event remains the primary per-place
charge and is still auto-charged by Apify when a place is pushed.
[1.6.7] - 2026-06-02
Proxy retry latency
Google Maps sticky search sessions now fail fast on long curl timeout errors
and consent/captcha blocks, allowing the pipeline to switch to a fresh proxy
session instead of spending up to four full request timeouts on the same bad
residential exit IP.
[1.6.6] - 2026-06-02
Strict suburb boundaries
Small geocoded areas such as suburbs and neighborhoods now use the exact
OpenStreetMap boundary polygon when Nominatim provides one. This makes
geoStrictMatch=true stricter than bbox-only filtering for places like
Seddon, where nearby Footscray / Yarraville coordinates can sit inside the
suburb bbox but outside the real suburb boundary.
Broad regions still use bbox filtering to avoid pulling huge country/state
polygons into high-volume runs.
[1.6.5] - 2026-06-02
Strict geometry
customGeolocation Polygon / MultiPolygon inputs now use exact
point-in-polygon checks for final insideSearchArea / geoStrictMatch
decisions. Their bbox is still used to frame Google Maps searches, but no
longer acts as the final custom-area boundary.
Strict bbox checks now use the resolved bbox without the previous adaptive
tolerance. This prevents small suburbs such as Seddon from accepting nearby
Yarraville / Footscray / Essendon results just because they are close to the
bbox edge.
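To show why a point can be inside a bbox yet outside the polygon, here is a ray-casting sketch for a GeoJSON-style Polygon (an outer ring followed by optional holes, each a list of [lng, lat] pairs). This is a generic textbook technique, not necessarily the Actor's implementation, and it ignores edge cases such as points exactly on a boundary.

```python
def point_in_ring(lng, lat, ring):
    """Ray casting: count edge crossings of a ray going east from the point."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # The edge straddles the point's latitude.
        if (yi > lat) != (yj > lat):
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lng < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lng, lat, polygon):
    """polygon: [outer_ring, hole_1, ...] as in GeoJSON Polygon coordinates."""
    outer, *holes = polygon
    if not point_in_ring(lng, lat, outer):
        return False
    return not any(point_in_ring(lng, lat, hole) for hole in holes)


# L-shaped area: its bbox is the full 2x2 square, but the top-right
# quadrant is not part of the polygon.
l_shape = [[[0, 0], [2, 0], [2, 1], [1, 1], [1, 2], [0, 2], [0, 0]]]
print(point_in_polygon(0.5, 0.5, l_shape))
print(point_in_polygon(1.5, 1.5, l_shape))
```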
[1.6.2] - 2026-06-02
📍 Strict area safety
geoStrictMatch=true now refuses to run search terms when the provided
location resolves only to a point and has no area boundary / bbox. This
prevents users from being charged for rows that cannot be checked against a
strict search area.
Point-only geocodes still receive location context in the Google Maps search
query, improving non-strict searches for ambiguous locations.
[1.6.0] - 2026-06-02
📍 Location targeting
Search queries now include the resolved location context for normal
location-based searches, not only for very broad regions. For example,
"dentist in Austin, Texas, United States" is sent to Google Maps as a
location-aware query instead of relying on the map viewport alone.
This reduces nearby / out-of-area Google Maps results before bbox filtering
has to mark or drop them. insideSearchArea and geoStrictMatch remain the
final quality guardrails for strict city, neighborhood, and polygon exports.
[1.5.65] - 2026-06-01
📍 Geographic filtering
Added a visible geoStrictMatch input switch. It is off by default, so the
Actor keeps all Google Maps results returned for the query. When enabled, it
drops places whose coordinates are outside the resolved location or custom
GeoJSON bounding box.
Dataset rows now include insideSearchArea when a bbox and place coordinates
are available. OUTPUT.searchStats also separates outside-area candidates
returned by Google from outside-area rows dropped by strict mode.
[1.5.64] - 2026-05-31
Search stability refinement
Stopped recursive subdivision for child viewports that only contain places
removed by user filters. Filtered-only pages can still trigger one bounded
seed-level probe, but deeper children now need real accepted/no-email supply
before they split again. This prevents withoutWebsite searches with reviews
from spending the timeout on "kept 0" branches.
[1.5.63] - 2026-05-31
Diagnostics
Empty filtered runs now explain the full drop breakdown in OUTPUT,
including both out-of-area removals and user filters. For example, a run can
now say that 170 candidates were outside the resolved area and 30 were
removed by websiteFilter=withoutWebsite, instead of mentioning only the
filter count.
[1.5.62] - 2026-05-31
Search stability refinement
Refined the 1.5.61 subdivision guard to allow one bounded seed-level
probe when a broad viewport is mostly out-of-area but still shows a few
local filtered candidates. This keeps coverage for strict filtered searches
without letting child viewports recurse unless they show stronger local
supply.
Search tasks are now de-duplicated when enqueued, so pre-subdivided seed
children are not scheduled again if the parent viewport also becomes
saturated.
[1.5.61] - 2026-05-31
Search stability
Tightened viewport subdivision so mostly out-of-area / filter-rejected
result pages no longer spawn large child queues. This prevents strict
filtered searches from running until the platform timeout.
Added a soft timeout guard: near the Apify hard timeout the Actor stops
taking new search work, saves partial results, and writes OUTPUT with a
PARTIAL status instead of being killed without a run summary.
[1.5.60] - 2026-05-30
Input guidance
The Apify input form now explains how to get exact Google Maps URLs:
open the place in Google Maps, click Share, then Copy link.
placeIds help text now clearly separates raw Google Place IDs from
/g/... short IDs, fids, CIDs, and full URLs, and tells users to paste
those identifiers as full Google Maps URLs instead.
[1.5.59] - 2026-05-30
Documentation and release history
Changelog headings now use Apify build-style patch revisions instead of
mixed 1.5 / date labels.
The current release is marked as shipped; previously shipped fixes are
grouped under the build where they became available.
[1.5.58] - 2026-05-30
Exact Place ID resolution
query_place_id and q=place_id:... URLs that do not include a search
query are normalized to query=place&query_place_id=... before fetching.
This matches the route Google Maps itself uses to resolve an exact place.
Direct /maps/place/... pages can now replay Google's embedded
/maps/preview/place XHR when the SSR page no longer carries a parseable
place payload. This fixes exact place URLs that previously ended with
"No place payload found in state".
[1.5.57] - 2026-05-30
Input correctness and exact direct targets
Invalid placeIds are now rejected before a network request is made. The
field accepts raw Google Place IDs such as ChIJ...; short /g/... IDs,
fids, CIDs, and arbitrary strings are ignored with a clear warning.
Direct Google Maps URLs now also recognize exact query_place_id,
q=place_id:..., cid=..., and maps.app.goo.gl targets. Free-text Maps
search URLs are still ignored so direct mode never guesses a place.
Raw Place ID inputs now use Google's embedded /maps/preview/place endpoint
from the exact query_place_id Maps page, then accept the result only when
the returned placeId matches the input.
startUrls.userData objects are preserved in dataset rows as
inputUserData, allowing API integrations to map results back to their own
records.
Unsupported startUrls now produce a warning instead of silently
disappearing from the direct-target list.
Stability and diagnostics
Google scraping now fails fast when Apify Proxy initialization fails, instead
of falling back to direct worker IP requests.
Deduplication now reserves seen_place_ids only after bbox and user filters
pass. Out-of-area or filtered candidates no longer block the same place from
being accepted later from a valid viewport.
Per-term limit and email-only drops now release unpushed reservations, so
places that never reached the dataset do not poison future dedupe.
categoryFilterWords now matches whole category words or phrases; for
example, "bar" matches "Bar & grill" but not "Barber shop".
searchStats now includes contact and review counters, making best-effort
website enrichment and review extraction visible in the final summary.
Conditional PPE charge failures are logged once per event instead of being
hidden at debug level.
OUTPUT / state KV schemas now document the direct-target counters and
resumability fields the Actor actually writes.
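The whole-word category matching described in this release can be sketched with lookarounds. The function name is hypothetical and case-insensitive matching is an assumption.

```python
import re


def category_matches(filter_word: str, category: str) -> bool:
    """True when filter_word appears in category as a whole word or phrase."""
    word = filter_word.strip()
    if not word:
        return False
    # Lookarounds require a non-word character (or string edge) on both sides.
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return re.search(pattern, category, flags=re.IGNORECASE) is not None


print(category_matches("bar", "Bar & grill"))
print(category_matches("bar", "Barber shop"))
print(category_matches("coffee shop", "Coffee shop"))
```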
[1.5.52] - 2026-05-30
Broad-area search quality
Broad state/country-style searches now include the resolved location text in
the Google Maps query while keeping the original searchString in output.
This prevents sparse regions such as Alaska, USA from returning only
out-of-area fallback results at very low zoom.
Nederland / Netherlands geocoding now uses the European mainland bbox
instead of Nominatim's full Kingdom of the Netherlands bbox, which includes
Caribbean islands and previously caused huge, slow country-scale subdivision.
Run-summary accuracy
Proxy session IDs now match Apify Proxy's exact allowed pattern
(^[\w._~]+$). Negative longitudes previously left a "-" in the session
ID, causing fresh U.S. viewport tasks to fail before making a Google request.
Resurrected runs now keep a truthful final summary when all tasks were
already completed before resurrection: OUTPUT.status stays successful,
totalPlaces is reconciled with dataset item count, and duration is measured
from the original run start instead of the short resurrection segment.
Review extraction now uses the same residential proxy session for all review
limits, including small maxReviewsPerPlace values. Datacenter review
requests were cheaper but unreliable for some locations/locales.
Review extraction now retries places that received no review rows once with
a fresh residential session, because Google's review feed can be empty for a
single proxy exit even when another exit returns the reviews correctly.
Direct Google Maps URL / Place ID targets now count as resolved only when a
place was actually parsed and pushed, so failed direct targets no longer make
a mixed run look healthier than it is.
Direct Google Maps place URLs now use an exact fallback only when the input
URL or Place ID contains a stable Google identifier (placeId, fid, or
cid). Ambiguous title/coordinate-only URLs are not guessed; unresolved
targets are reported as failed direct targets.
Resurrected summaries now reconcile totalPlaces in both directions against
the real dataset item count, and uniquePlaceIds counts pushed places rather
than raw candidates seen before filters.
Unexpected worker or pipeline failures now surface in searchStats /
OUTPUT.status as PARTIAL or FAILED instead of being hidden behind a
platform-level SUCCEEDED run.
Failed search viewport tasks no longer produce a misleading "Google returned
0 raw places" diagnostic; the summary now points at failed coverage instead.
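For the session ID fix at the top of this section, a sanitizer could look like the sketch below. It restricts IDs to ASCII letters, digits, "_", "." and "~", which also covers the non-ASCII case fixed in 1.5.29. The function name and the 50-character cap are assumptions.

```python
import re

# Anything outside ASCII letters, digits, "_", "." and "~" is replaced.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_.~]")


def make_session_id(*parts, max_len: int = 50) -> str:
    """Join parts and replace every character the proxy pattern rejects."""
    raw = "_".join(str(part) for part in parts)
    return _DISALLOWED.sub("_", raw)[:max_len]


# The minus sign of a negative longitude and the umlaut are both replaced.
print(make_session_id("cafe", 30.27, -97.74))
print(make_session_id("Bäckerei", 52.52, 13.4))
```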
[1.5.29] - 2026-05-29
Input robustness and empty-run diagnostics
Search viewport tasks now try several fresh proxy sessions before giving up
on proxy-level failures such as "CONNECT tunnel failed, response 407". This
reduces partial coverage on runs where a proxy country has a few bad exits.
Proxy session IDs are now forced to ASCII before being sent to Apify Proxy.
Non-English search terms such as German umlauts could previously leak into
the session ID and trigger 407 Proxy Authentication errors.
Runs with failed search viewport tasks now report PARTIAL instead of plain
SUCCEEDED, making incomplete coverage visible in the run summary.
Commas in business-name-plus-address search terms such as
"Restaurant, 7050 Arosa" no longer trigger the misleading
"comma-separated categories" warning.
API inputs are now normalized defensively: string values for list fields
such as searchStringsArray, placeIds, startUrls,
additionalLanguages, and categoryFilterWords are converted into usable
lists instead of being interpreted character by character.
String booleans such as "false" and "true" are now parsed correctly, so
integrations that send form values as strings do not accidentally enable
expensive options.
Numeric knobs now default or clamp safely when malformed values are passed,
preventing bad JSON/API inputs from turning into platform-level run errors.
Runs that complete with zero pushed places now write a clearer
INPUT_NEEDS_ATTENTION summary explaining that no places were found and
suggesting broader terms, location checks, or looser filters.
Run summaries and Apify logs now include concrete diagnostic hints, for
example whether Google returned no raw places, results were outside the
resolved area, active filters removed everything, or email-only mode removed
places without discovered emails.
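The defensive normalization described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the helper names `to_list`, `to_bool`, and `to_int` are hypothetical, and splitting string inputs on newlines (rather than commas) is an assumption chosen so that terms like "Restaurant, 7050 Arosa" stay intact.

```python
import json
from typing import Any, List


def to_list(value: Any) -> List[str]:
    """Coerce a list-type input field into a list of non-empty strings."""
    if value is None:
        return []
    if isinstance(value, str):
        text = value.strip()
        if not text:
            return []
        # A JSON array that arrived as a string, e.g. '["a", "b"]'.
        if text.startswith("["):
            try:
                parsed = json.loads(text)
            except json.JSONDecodeError:
                parsed = None
            if isinstance(parsed, list):
                return [str(v).strip() for v in parsed if str(v).strip()]
        # Assumption: one entry per line; commas are kept inside an entry.
        return [line.strip() for line in text.splitlines() if line.strip()]
    if isinstance(value, (list, tuple)):
        return [str(v).strip() for v in value if str(v).strip()]
    return [str(value)]


def to_bool(value: Any, default: bool = False) -> bool:
    """Parse real booleans and string booleans such as "false" / "true"."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"true", "1", "yes", "on"}:
            return True
        if text in {"false", "0", "no", "off", ""}:
            return False
        return default
    if isinstance(value, (int, float)):
        return bool(value)
    return default


def to_int(value: Any, default: int, lo: int, hi: int) -> int:
    """Return an integer clamped to [lo, hi], or the default if malformed."""
    try:
        number = int(float(value))
    except (TypeError, ValueError, OverflowError):
        return default
    return max(lo, min(hi, number))
```

Without a guard like `to_list`, iterating over the string "pizza" yields five one-letter search terms, which is the character-by-character failure mode mentioned above.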
[1.5.24] - 2026-05-25
Lead generation limit accuracy
Email-only lead runs (skipPlacesWithoutEmail=true) now apply the
per-search result limit after contact enrichment has confirmed an email.
This prevents runs from finishing below maxCrawledPlacesPerSearch because
no-email candidates temporarily occupied limit slots.
If skipPlacesWithoutEmail=true is used while website contact enrichment is
off, the Actor now reports a clear inputWarning and ignores the email-only
filter instead of silently returning places without emails.
[1.5.22] - 2026-05-24
Search reliability and input guardrails
maxCrawledPlacesPerSearch is now enforced globally across concurrent
viewport tasks. Dense/subdivided runs no longer push more rows than the
user-requested per-search limit, including extra language passes.
Runs with missing targets now write status: INPUT_NEEDS_ATTENTION and a
clear inputWarning in the OUTPUT key instead of looking like a clean empty
scrape.
Search terms without a Location now get a specific warning explaining that
the city or region must be entered in locationQuery, geolocation fields,
or customGeolocation.
Geo-strict runs now expose search stats in OUTPUT, including out-of-area,
filtered, no-email, and limit-dropped counts. Strong seed-level geographic
mismatches turn the run summary into PARTIAL or INPUT_NEEDS_ATTENTION.
Automatic zoom selection is more robust for elongated OpenStreetMap
administrative bboxes, such as Brazilian coastal states/cities with offshore
islands in their bbox. This keeps Google Maps searches focused while
preserving the original bbox as the geo-strict guardrail.
customGeolocation Point and Polygon inputs now carry a bbox into the
search pipeline, so geo-strict filtering also protects custom areas.
[1.5.20] - 2026-05-23
Reviews - up to 1000 per place
maxReviewsPerPlace now supports up to 1000 reviews per place.
Reviews are fetched through Google's paginated review feed instead of the
old inline preview sample, so users can get hundreds of real review rows
when Google exposes them for the place.
Each review keeps the correct author/text/rating pairing, including
reviewId, author metadata, relative publish time, text, rating when
exposed, and attached photos when available.
Large review limits use more Google/proxy requests and can take longer;
set maxReviewsPerPlace to 0 for the fastest normal place scraping.
[1.5.10] - 2026-05-17
Five targeted fixes for silent data-quality and runtime issues surfaced
by production scrapes across the U.S., Indonesia, and South-East Asia.
All fixes are covered by a new 103-case regression suite that imports
the real src/ modules (not mocks).
Email extractor — false-positive purge
Bare-word at deobfuscation no longer creates fake emails.
Phrases like "shop online at www.aaa.com ", "available only at
savagex.com", or "Order at amazon.com" used to be rewritten as
online@www.aaa.com / only@savagex.com / Order@amazon.com and
scraped as real contacts. The deobfuscator now only triggers when the
local-part has email-like signals (internal ./-/_/+, a digit,
or a known role name like info/sales/team) AND isn't an English
filler / imperative verb (order, buy, visit, us, goods, …).
Non-business email domains rejected — registrar / CMS / CDN /
font-attribution domains that masquerade as contact addresses on
freshly-launched or platform-hosted sites: godaddy.com,
wordpress.com, wix.com, squarespace.com, hostinger.com,
latofonts.com, fontawesome.com, fonts.gstatic.com,
cloudinary.com, etc. Catches the filler@godaddy.com /
team@latofonts.com / blog@wordpress.com family of artefacts.
@www.<host> domains rejected — emails published at the literal
www. subdomain are virtually always Linktree-style mis-parses;
real mailboxes live at the apex.
Placeholder locals rejected — filler@, placeholder@,
noreply@, no-reply@, donotreply@ are now dropped regardless of
domain (they're never useful for outreach).
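As a rough sketch of the three rejection rules above (non-business domains, `@www.` hosts, placeholder locals): the function name `is_plausible_contact` is hypothetical, the domain set contains only the examples named in this changelog (the real list is longer, per the "etc."), and rejecting subdomains of a listed domain is an assumption.

```python
# Examples taken from the changelog entry; not the complete production list.
NON_BUSINESS_DOMAINS = {
    "godaddy.com", "wordpress.com", "wix.com", "squarespace.com",
    "hostinger.com", "latofonts.com", "fontawesome.com",
    "fonts.gstatic.com", "cloudinary.com",
}
PLACEHOLDER_LOCALS = {"filler", "placeholder", "noreply", "no-reply", "donotreply"}


def is_plausible_contact(email: str) -> bool:
    """Return False for addresses matching any of the rejection rules."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        return False  # not shaped like local@domain
    if local in PLACEHOLDER_LOCALS:
        return False  # never useful for outreach, regardless of domain
    if domain.startswith("www."):
        return False  # literal www. subdomain: almost always a mis-parse
    for blocked in NON_BUSINESS_DOMAINS:
        # Assumption: subdomains of a blocked domain are rejected as well.
        if domain == blocked or domain.endswith("." + blocked):
            return False
    return True
```

For example, `team@latofonts.com` and `online@www.aaa.com` are rejected, while `info@example.com` passes.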
Default bbox tolerance dropped from a fixed 0.5° (~55 km) — which
made geo-strict mode almost useless for city-sized bboxes — to an
adaptive rule: 10 % of the bbox span on each axis, clamped to
[0.005°, 0.05°] (~0.5–5 km). With the old default, a Waterbury-CT
query routinely leaked Hartford (30 km) and beyond; with the new one,
a tight West-Java bbox correctly rejects all 64 Singapore and 12
Malaysia false-positives from the previous run.
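The adaptive rule reduces to a clamp per axis. The sketch below assumes a `(south, west, north, east)` bbox tuple in degrees and a hypothetical helper name `adaptive_tolerance`; it ignores bboxes that cross the antimeridian.

```python
from typing import Tuple

# Assumed layout: (south, west, north, east) in degrees.
BBox = Tuple[float, float, float, float]


def adaptive_tolerance(
    bbox: BBox,
    fraction: float = 0.10,   # 10 % of the bbox span on each axis
    lo: float = 0.005,        # lower clamp in degrees (~0.5 km)
    hi: float = 0.05,         # upper clamp in degrees (~5 km)
) -> Tuple[float, float]:
    """Return (lat_tolerance, lng_tolerance) in degrees for a bbox."""
    south, west, north, east = bbox
    lat_tol = min(hi, max(lo, (north - south) * fraction))
    lng_tol = min(hi, max(lo, (east - west) * fraction))
    return lat_tol, lng_tol
```

A bbox spanning 0.2° × 0.3° gets a tolerance of about 0.02° × 0.03°; anything wider than 0.5° on an axis is capped at 0.05°, and anything narrower than 0.05° is floored at 0.005°.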
Pipeline saturation heuristic — no more cascades into empty space
Subdivision now requires at least 2 places that passed the bbox
check (kept + filtered + no-email-dropped) before splitting. The
prior heuristic used raw "unique placeIds returned by Google", which
caused the pipeline to recursively explode 80→320→1280 viewports in
thin-niche searches where every result was Google's regional fallback
(out-of-bbox). Real-world impact: a single "Cabinet Refinishing in
Evansville, IN" run dropped from 422 seconds @ 7.7 places/min to
the expected ≈60-second / 50+ places-per-minute range.
Concurrency — pre-subdivide seeds when under-utilised
The pipeline pool now seeds 5 viewport tasks instead of 1 when the
natural terms × seeds × langs cross-product is smaller than the
worker concurrency (typical: one search term + one seed location). All
workers start busy from t=0 instead of seven idling while the lone
seed serially fetches its 80 SSR results. Single-term runs that
previously throttled at ~25 places/min should now run at ~70–90.
HTTP — fast-fail on proxy-anchored errors in one-shot get()
BoringSSL BAD_DECRYPT / CONNECT tunnel failed errors are
anchored to a single bad upstream IP; burning 4 retries against the
same sticky session can't recover them. The sticky session.get
path already short-circuited these; the one-shot GMapsHTTP.get
now does the same. Saves ~20–60 s on degraded Apify residential
pool windows.
Tests
New regression suite under scripts/test_fixes.py (103 unit cases
importing the real src/ modules) and scripts/test_against_real_dataset.py
(end-to-end against a captured West-Java run with 552 places).
[1.4.0] - 2026-05-13
Per-place reviews + a sweep of data-quality fixes.
The Actor now extracts each place's top reviews via Google's
/maps/preview/place SSR XHR — the same call its own JavaScript fires
on first paint of a place card. Per review you get reviewId, text,
originalLanguage, relative + ISO-8601 publish dates, reviewer name +
avatar + profile URL + numeric ID, total review/photo counts, Local-
Guide badge, and attached photo URLs. ~5–10 reviews per place — that's
all Google ships inline; the rest are behind a browser-tokened endpoint
that requires JavaScript-synthesised session tokens we can't reach
HTTP-only.
Per-review star rating is also browser-tokened and not exposed —
only the aggregate totalScore for the place is available. The actor
hedges with two URL variants (original + viewport-rewritten) plus one
retry each to absorb Google's random "lite-response" returns, getting
to roughly 80 % per-run coverage. Disabled by default (set
maxReviewsPerPlace > 0 to enable). Requires Apify residential proxy.
Data-quality fixes
additionalInfo deduplication. Google occasionally emitted both
an enum-id and a free-text entry rendering to the same label inside a
section (e.g. Payments listed Credit cards twice). The parser now
dedupes per section; ~27 % of places in dense urban runs were
affected.
Compound-city addresses split. Place tuples in Turkey and some
other locales put organised-zone names in the city slot using a
slash separator (e.g. "Büsan OSB/Karatay"). We now keep the trailing
municipality as city and roll the rest into neighborhood.
Phone numbers normalised to E.164. A few records came back with
local-trunk format (e.g. "(0332) 221 52 52" instead of
"+90 332 221 52 52"). Phones are now promoted to E.164 for every
country with a known prefix.
Postal-code sanity flag. Records whose postal code's province
prefix doesn't match the parsed Turkish state get
postalCodeSuspect: true (Google's own data has the wrong digit
here — about 0.5 % of places).
Reverse-geocode fallback. Places Google returns with valid coords
but no address string at all (about 0.5 % of dense urban results) now
get backfilled from OpenStreetMap (Nominatim → Photon). Adds the
addressSource: "reverse_geocode" marker so consumers can tell.
Opt-out via reverseGeocodeMissingAddress=false.
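The compound-city split from the list above can be illustrated with a short sketch. The function name `split_compound_city` is hypothetical, and how the leading part is merged with an already-populated neighborhood field is not specified in this changelog, so the sketch simply returns both pieces.

```python
from typing import Optional, Tuple


def split_compound_city(raw_city: str) -> Tuple[str, Optional[str]]:
    """Split "Zone/Municipality" into (city, neighborhood).

    The trailing segment is kept as the city; everything before it is
    returned as the neighborhood candidate. Values without a slash are
    returned unchanged with no neighborhood.
    """
    parts = [part.strip() for part in raw_city.split("/") if part.strip()]
    if len(parts) < 2:
        return raw_city.strip(), None
    return parts[-1], "/".join(parts[:-1])
```

With the example from the entry, `split_compound_city("Büsan OSB/Karatay")` yields the city "Karatay" and the neighborhood "Büsan OSB".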
Reliability fixes
Short-circuit retries on dead proxy IPs. Sticky-session GETs that
hit BoringSSL bad_decrypt / WRONG_VERSION_NUMBER /
CONNECT tunnel failed (595) no longer burn all 4 retry slots on a
stuck IP; they surface fast and the pipeline mints a fresh session_id.
Pipeline-level proxy failover. A RequestsError propagated out of
a viewport task now triggers a single retry with a brand-new
session_id, instead of silently dropping the viewport. Previously,
one stuck IP could lose up to ~80 places per failed task.
safe() no longer indexes into strings. A latent parser bug —
when Google's protobuf drifted and a str landed where a list was
expected, safe(x, i) would return a single character (turning a
reviewer name into the letter "p", for example). The helper now
treats strings as leaves.
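To show why treating strings as leaves matters, here is a minimal sketch of a nested-index helper. The signature is an assumption (the changelog only shows `safe(x, i)`); the real helper in src/ may differ.

```python
from typing import Any


def safe(data: Any, *path: int, default: Any = None) -> Any:
    """Walk nested lists/tuples by index, returning default on any miss.

    Strings are treated as leaves: indexing into one returns the default
    instead of a single character.
    """
    current = data
    for index in path:
        if not isinstance(current, (list, tuple)):
            return default  # str, dict, None, numbers: not indexable here
        if not -len(current) <= index < len(current):
            return default  # out of range
        current = current[index]
    return current
```

With a plain `x[i]` lookup, a payload where `"pizza"` replaced an expected list would return `"p"`; here `safe("pizza", 0)` returns `None`, so the drift shows up as a missing field rather than a corrupted one.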
[1.3.0] - 2026-05-05
Drop residential-proxy traffic for non-Google requests
Up to v1.2 every HTTP request, including Nominatim/Photon geocoding and
arbitrary business-website fetches during enrichment, went through Apify
residential proxy ($0.0008/MB). Most of those calls don't need it: free public
APIs don't use anti-bot protection, and small-business websites rarely block
datacenter IPs.
Fix: the actor now creates two HTTP clients:
http — residential proxy (user-configured), used for Google search XHR
SSR fetches that genuinely need real-IP routing.
http_direct — no proxy, direct from the Apify worker. Used for
Nominatim/Photon geocoding and the website-enrichment fetches.
Net effect on a typical run with extractContactsFromWebsite=true:
60-90% of HTTP requests no longer use residential proxy. Estimated savings:
$0.10-0.15 per 1 000 places in operating cost.
Edge case: if a business website blocks the Apify worker's datacenter IP
(rare), the enrichment for that one site is skipped silently (website fetches
already use max_retries=1 to fail fast). Other places are unaffected.
[1.2.0] - 2026-05-05
Lower memory floor (128 MB) — cheaper runs
Memory footprint measured with psutil:
Light run (5 terms × subdivision × 415 places, no enrichment): peak 80 MB
Medium run (50 places + full website enrichment, concurrency=8): peak 109 MB
Both well under 128 MB. The previous minMemoryMbytes: 256 was unnecessarily
high — frugal users couldn't pick the cheapest tier. Updated:
minMemoryMbytes: 128 (was 256) — opt-in for small runs
maxMemoryMbytes: 4096 (unchanged) — for city-scale jobs
At 128 MB on Apify:
Compute cost ~50% lower vs 256 MB
Same throughput for ≤ 100-place runs
For city-scale (1000+ places) prefer 512 MB to stay safe
[1.1.0] - 2026-05-04
Critical fix: strict geographic match (drop places from wrong country)
A real production run searching karnataka, India for school /
high school / pre university etc. returned 120 places of which only 3
were actually in Karnataka. The rest:
80 places from Texas, USA (Arlington, Fort Worth, Dallas)
20 places from Cantabria, Spain
11 from Andhra Pradesh (neighbouring Indian state)
1 from South Korea, 1 from Cambodia, 4 from Tamil Nadu / Maharashtra
Two compounding bugs:
Subdivision math broke at low zoom. The previous formula gave
children a longitude offset of 360 / 2^z * 0.75 which at z=6 is 4.2°
— almost the full width of a typical state. Children's centers drifted
into the Arabian Sea and Bay of Bengal.
No bounding-box check. When Google's search XHR found nothing at the
drifted coordinates, it fell back to the residential-proxy IP's country
for results. Apify residential exits in Texas / Spain / Korea returned
schools in those regions instead of empty results.
Fixes:
Correct subdivision math: child longitude offset is now
360 / 2^(z+2) (= a clean quarter of the parent viewport), with
latitude scaled by cos(lat) for high-latitude correctness.
bbox capture + filter: Nominatim and Photon both expose the queried
region's bounding box. We now store it in Viewport.bbox, propagate it
to all subdivided children, and drop any place whose (lat, lng)
falls outside (with 0.5° tolerance for border cases).
New input geoStrictMatch: true (default ON). Set to false to
keep the v1.0 wider-area behavior.
Verified on the same karnataka, India query → only Karnataka places
returned, Texas/Spain/Korea results dropped.
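The two fixes can be sketched as follows. This is an illustration under stated assumptions, not the Actor's code: the function names are hypothetical, the bbox layout `(south, west, north, east)` is assumed, and "latitude scaled by cos(lat)" is read here as multiplying the longitude offset by cos(lat) to get the latitude offset.

```python
import math
from typing import List, Tuple

# Assumed layout: (south, west, north, east) in degrees.
BBox = Tuple[float, float, float, float]


def child_centers(lat: float, lng: float, zoom: int) -> List[Tuple[float, float]]:
    """Return the four child viewport centers around (lat, lng)."""
    lng_offset = 360.0 / 2 ** (zoom + 2)
    # Assumption: the latitude offset is the longitude offset times cos(lat).
    lat_offset = lng_offset * math.cos(math.radians(lat))
    return [
        (lat + dy * lat_offset, lng + dx * lng_offset)
        for dy in (1, -1)
        for dx in (-1, 1)
    ]


def in_bbox(lat: float, lng: float, bbox: BBox, tolerance: float = 0.5) -> bool:
    """True when the point lies inside the bbox, widened by the tolerance."""
    south, west, north, east = bbox
    return (
        south - tolerance <= lat <= north + tolerance
        and west - tolerance <= lng <= east + tolerance
    )
```

At z=6 the new offset is 360 / 2^8 ≈ 1.41°, versus ≈ 4.2° with the old formula. With an approximate Karnataka bbox of `(11.5, 74.0, 18.5, 78.6)`, a point in Bengaluru (12.97, 77.59) passes `in_bbox`, while one in Arlington, Texas (32.7, -97.1) is dropped.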
[1.0.0] - 2026-05-03
Production launch. First stable, public-ready release of the HTTP-only
Google Maps scraper. No browser, no Chromium — just curl_cffi with Chrome
TLS impersonation.
Coverage
Quad-tree viewport subdivision. When a viewport saturates (≥18 of the
first 20 results are new), it splits into 4 child viewports at zoom+1 and
recurses up to maxSubdivisionDepth (default 4 → up to 256 viewports per
seed). This is what lets the Actor break Google's hard ~120-results-per-area
limit and scrape entire metro areas.
Multi-zoom expansion (multiZoomDelta) — search each seed at
zoom-N..zoom+N for +30-70% extra unique places.
Multi-language passes (additionalLanguages) — re-search the same
area in additional hl= codes to catch translations and regional categories.
Geo composite resolver — countryCode / state / county / city /
postalCode joined into a single Nominatim query when locationQuery is
empty.
Direct inputs — startUrls (/maps/place/... URLs) and placeIds
(raw ChIJ… IDs) bypass search entirely.
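A minimal sketch of the v1.0.0 saturation rule and the resulting viewport bound, with hypothetical function names. It covers only the rule as stated in this release; later releases changed the heuristic.

```python
from typing import Iterable, List, Set


def is_saturated(
    result_ids: List[str],
    seen_ids: Set[str],
    window: int = 20,      # only the first 20 results are inspected
    threshold: int = 18,   # at least 18 of them must be new
) -> bool:
    """Return True when a viewport should split into four children."""
    first_results: Iterable[str] = result_ids[:window]
    new_count = sum(1 for place_id in first_results if place_id not in seen_ids)
    return new_count >= threshold


def max_leaf_viewports(max_depth: int) -> int:
    """Upper bound on leaf viewports per seed: four children per level."""
    return 4 ** max_depth
```

With the default maxSubdivisionDepth of 4, `max_leaf_viewports(4)` gives 256, matching the figure quoted above.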
Output (~46 fields per place)
Place identifiers (placeId, fid, cid, kgmid), structured address
(addressParts.{street, city, state, postalCode, neighborhood, countryCode}),
center + entrance coordinates, contacts (phone, phoneUnformatted,
website, websiteDisplay), ratings (totalScore, reviewsCount for
hotels), opening status (openingHoursToday, currentStatus,
nextOpensAt, permanentlyClosed, temporarilyClosed), descriptions
(subTitle, description, longDescription), categories, owner info
(ownerName, ownerId, claimThisBusinessUrl), placeTags (LGBTQ+
friendly, women-owned, …), full additionalInfo amenities tree, imagesCount
thumbnail, menu URL, plusCode, locatedIn, isAdvertisement, hotel
block (hotelStars, hotelPrice, hotelCheckInDate/hotelCheckOutDate,
hotelAmenities), plus run metadata (scrapedAt, language, rank,
searchPageUrl).
Built-in filters (post-fetch, free)
placeMinimumStars — two / twoAndHalf / … / fourAndHalf.
skipClosedPlaces — drop permanently / temporarily closed.
searchMatching — all / only_includes / only_exact (title vs term).
categoryFilterWords — keep only matching categories.
Optional add-on: website-contacts enrichment
When extractContactsFromWebsite is enabled, the Actor visits each place's
website and extracts emails (with deobfuscation of foo (at) bar (dot) com
style writing), additionalPhones (from tel: links, normalized to E.164,
deduped against the main phone), and 8 social-media URL fields (facebooks,
instagrams, linkedIns, twitters, youtubes, tiktoks, pinterests,
whatsapps). Domain-level cache means chain stores share one fetch.
Optional /contact page fallback when the homepage yields no email. Big
global chains (McDonald's, Starbucks, Hilton, …) are skipped by default.
Quality filters tuned against real-world false positives:
Reject CMS-glued phone numbers (e.g. 60957293003 — 11 digits without +
and not starting with NANP 1 is junk).
Reject Facebook XML namespace URL (/2008/fbml) and bare profile.php
placeholders; keep only profile.php?id=NNN and vanity URLs.
Reject Pinterest conversion-tracking pixel (ct.pinterest.com/v3) and
any social handle matching API-version pattern (v1, v2, …).
Reject .php / .html / .aspx "vanity URLs" — real social handles
never carry file extensions.
Performance: enrichment runs in parallel within one task via
asyncio.gather. Fetches use max_retries=1 (fail-fast) since retrying a
403/timeout from a third-party site rarely helps — better to skip and move
on. Real platform measurement: 20 places + full enrichment in ~21 s on
residential proxy.
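The parallel, fail-fast pattern described above might look roughly like this. The fetcher is injected so the sketch runs without network access; `enrich_one`, `enrich_all`, and the `Fetcher` type are hypothetical names, and the real Actor's fetch and parsing logic is not shown.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

# Hypothetical fetcher type: takes a URL, returns the page body.
Fetcher = Callable[[str], Awaitable[str]]


async def enrich_one(url: str, fetch: Fetcher) -> Optional[str]:
    """Fetch one website once; on any failure, skip it instead of retrying."""
    try:
        return await fetch(url)
    except Exception:
        return None  # fail fast: a 403 or timeout rarely recovers on retry


async def enrich_all(urls: List[str], fetch: Fetcher) -> Dict[str, Optional[str]]:
    """Run all website fetches concurrently and map URL -> body (or None)."""
    results = await asyncio.gather(*(enrich_one(url, fetch) for url in urls))
    return dict(zip(urls, results))
```

Because each task handles its own exceptions, one blocked site cannot cancel the others, and `asyncio.gather` returns results in input order.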
Reliability & speed
Sticky residential proxy sessions per viewport — all paginated XHRs of
one logical search hit the same Apify residential exit IP.
AsyncSession reuse across the pagination chain — single TLS handshake
per task, HTTP/2 multiplexed.
Chrome TLS impersonation rotated per session (Chrome 120/123/124/131
profiles).
EU consent flow bypassed via pre-set CONSENT=YES+cb and SOCS=…
cookies on every Google request — no more consent.google.com redirects.
BlockedError retry-with-fresh-IP — when a sticky session does get
challenged, the pipeline mints a brand-new session_id (different proxy
exit) and tries once more before giving up.
Bulletproof geocoding — Nominatim with 6 s cap, falling back to Photon
(komoot) which uses the same OpenStreetMap data through more reliable
infra. Cached in KV store under _geocode_cache so repeat runs are instant.
Consent / captcha / 429 detection with intelligent backoff;
fast-fail on deterministic 4xx (no retry storms).
Resumable across Apify migrations — state checkpointed every 30 s and
on PERSIST_STATE event.
Bounded concurrency worker pool with dynamic enqueue of subdivided
child viewports.
Pay-per-event monetization (3 events)
| Event | When it fires | Suggested price |
| --- | --- | --- |
| apify-default-dataset-item | every place pushed (Apify auto-charges) | $0.0010 ($1.00 / 1 000 results) |
| place-with-contact-details | website enrichment yielded ≥ 1 email or social URL | $0.0015 ($1.50 / 1 000 places) |
| review-scraped | one review was attached to a dataset row | $0.00025 ($0.25 / 1 000 reviews) |
src/billing.py detects the active PPE event set through Apify's official
Actor.get_charging_manager().get_pricing_info() API so transition-period
runs use the correct current pricing events.
Apify Store metadata
Input schema with grouped sections + select-list filters.