All notable changes to this Actor are documented here. Versions follow the
MAJOR.MINOR scheme used by Apify builds.
Bug fix
- Removed ?clean=true&format=... query params from output schema URL templates.
Apify Console parsed the templated URL, extracted clean=true as a string-typed
property, then failed dataset preview validation that expected clean to be a
boolean. Templates now point at the bare /items and /records/OUTPUT endpoints
and let the user pick format from the export menu.
Store metadata polish
- Memory hints in actor.json: minMemoryMbytes=256, defaultMemoryMbytes=1024,
maxMemoryMbytes=4096.
- CHANGELOG.md referenced from actor.json so it shows up on the Actor page.
- Explicit dockerContextDir: "../" and readme: "../README.md".
Apify Store metadata
- Added Key-value store schema (OUTPUT + GMAPS_SCRAPER_STATE collections).
- Added Actor Output schema with templated download links to dataset (JSON + CSV)
and the OUTPUT key-value record.
Run summary
- Run summary written to the OUTPUT key on completion: totalPlaces,
uniquePlaceIds, searchTermsProcessed, languagesUsed, seedViewports,
completedTasks, directPlaceUrlsResolved, datasetId, datasetUrl,
startedAt, finishedAt, durationSeconds, filtersApplied.
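As a rough illustration of how a few of these summary fields could be derived, here is a minimal sketch. It covers only a subset of the keys, and it assumes that each scraped row carries a placeId key and that uniquePlaceIds is a count; neither detail is confirmed by this changelog. The helper name build_run_summary is hypothetical.

```python
from datetime import datetime, timezone


def build_run_summary(places, started_at, finished_at, dataset_id):
    """Assemble a subset of the run summary from scraped rows.

    Assumptions (not confirmed by the changelog): rows are dicts with a
    "placeId" key, and uniquePlaceIds is reported as a count.
    """
    unique_ids = {p["placeId"] for p in places if p.get("placeId")}
    return {
        "totalPlaces": len(places),
        "uniquePlaceIds": len(unique_ids),
        "datasetId": dataset_id,
        "startedAt": started_at.isoformat(),
        "finishedAt": finished_at.isoformat(),
        "durationSeconds": (finished_at - started_at).total_seconds(),
    }


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 1, 0, 1, 30, tzinfo=timezone.utc)
    rows = [{"placeId": "a"}, {"placeId": "a"}, {"placeId": "b"}]
    print(build_run_summary(rows, start, end, "example-dataset"))
```

In an Actor, a dict like this would typically be stored under the OUTPUT key once the run finishes.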
Competitor-grade input filters + 13 new output fields
New input options:
- categoryFilterWords: keep only places whose categories contain these words.
- searchMatching: all / only_includes / only_exact title-vs-search match.
- placeMinimumStars: drop places below a star threshold (two…fourAndHalf).
- websiteFilter: allPlaces / withWebsite / withoutWebsite.
- skipClosedPlaces: drop permanently/temporarily closed places.
- placeIds: accept a list of Place IDs as direct scrape targets.
- countryCode / state / county / city / postalCode: composite geo fields
(used when locationQuery is empty).
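The filter options above could be applied to each scraped row roughly as follows. This is a sketch under assumptions, not the Actor's actual code: the row keys website, rating and categories are assumed, unrated places are assumed to fail a star threshold, and the threshold table lists only the two enum values named in this changelog.

```python
# Only the two endpoints named in the changelog are listed; the
# intermediate enum values are not spelled out there, so they are omitted.
MIN_STARS = {"two": 2.0, "fourAndHalf": 4.5}


def keep_place(place, options):
    """Return True if a place row passes the configured filters.

    Assumed row keys: "website", "rating", "categories",
    "permanentlyClosed", "temporarilyClosed".
    """
    if options.get("skipClosedPlaces") and (
        place.get("permanentlyClosed") or place.get("temporarilyClosed")
    ):
        return False

    website_filter = options.get("websiteFilter", "allPlaces")
    has_website = bool(place.get("website"))
    if website_filter == "withWebsite" and not has_website:
        return False
    if website_filter == "withoutWebsite" and has_website:
        return False

    min_stars = MIN_STARS.get(options.get("placeMinimumStars"))
    if min_stars is not None:
        rating = place.get("rating")
        # Assumption: a place without a rating does not meet any threshold.
        if rating is None or rating < min_stars:
            return False

    words = [w.lower() for w in options.get("categoryFilterWords") or []]
    if words:
        categories = " ".join(place.get("categories") or []).lower()
        if not any(w in categories for w in words):
            return False

    return True
```

Evaluating the cheap boolean checks first keeps the common rejection paths short; the order does not affect the result.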
New output fields:
- scrapedAt: UTC ISO-8601 timestamp.
- language: echo of hl= used.
- rank: 1-based position in the viewport's results.
- searchPageUrl: the SSR URL that seeded this viewport.
- permanentlyClosed, temporarilyClosed: derived from Google's status string.
- isAdvertisement: true for sponsored results.
- price: price level (e.g. $$).
- menu: first menu / online-order URL (Slice, Seamless, …).
- plusCode: Google Plus Code when available.
- locatedIn: parent venue (e.g. mall) when the place is inside another.
- inputPlaceId / inputStartUrl: source of the row when it came from
explicit input.
Reliability:
- Fast-fail on deterministic 4xx (no retry storms).
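One way to read "fast-fail on deterministic 4xx" is a retry loop that gives up immediately on client errors that will not change on retry. The sketch below assumes 408 and 429 are still treated as transient; the function names, the retry budget and the delay schedule are illustrative and not taken from the Actor.

```python
import time

# Assumption: these client errors are transient and worth retrying.
RETRYABLE_4XX = {408, 429}


def is_deterministic_failure(status):
    """True for 4xx responses that a retry is not expected to fix."""
    return 400 <= status < 500 and status not in RETRYABLE_4XX


def fetch_with_retries(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, fast-failing on deterministic 4xx.

    fetch is any zero-argument callable returning (status, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if 200 <= status < 300:
            return body
        if is_deterministic_failure(status):
            # Retrying would only repeat the same error, so stop now.
            raise RuntimeError(f"non-retryable HTTP {status}")
        if attempt < max_attempts:
            # Exponential backoff between attempts: 1x, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"gave up after {max_attempts} attempts (last HTTP {status})")
```

Passing sleep as a parameter makes the backoff easy to disable in tests.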
Output dataset schema
- 49 fields documented with type + description.
- 3 dataset views: Overview, Lead generation, Hotels.
Comprehensive parser rewrite — 30+ fields per place
Newly extracted fields: subTitle, description, longDescription, full
addressParts (street/city/state/postalCode/neighborhood), formattedLocality,
entranceLocation, full additionalInfo amenities tree, placeTags
(LGBTQ+ friendly, Latino-owned, women-owned…), hotel-specific block
(hotelStars, hotelPrice, hotelCheckInDate, hotelCheckOutDate,
hotelAmenities), ownerName, ownerId, currentStatus, nextOpensAt.
Performance:
- AsyncSession reuse across the pagination chain — single TLS handshake per task.
- Sticky proxy session per viewport — same residential IP for all paginated XHRs.
Quad-tree subdivision unlocks unlimited results per area
- Recursive viewport splitting when a viewport saturates (≥18 of 20 first-page
results new). Each level multiplies viewports by 4 (depth=4 → 256 viewports).
- Multi-language passes: re-search the same areas in additional hl codes.
- Multi-zoom expansion: search each seed at zoom-N..zoom+N (multiZoomDelta).
- State persistence: checkpoints every 30 s + on Apify's PERSIST_STATE event.
Migrations resume without re-pushing duplicates.
- Worker pool with bounded concurrency.
- Sticky session ID per task.
- Consent-page / captcha / 429 detection with intelligent backoff.
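The recursive splitting described in the first bullet can be sketched as below. The 18-of-20 saturation threshold and the depth limit come from the text; the Viewport bounding-box representation, the function names and the first_page callable are hypothetical, and the sketch ignores viewports that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    south: float
    west: float
    north: float
    east: float
    depth: int = 0


def split(vp):
    """Split a viewport into four equal quadrants, one level deeper."""
    mid_lat = (vp.south + vp.north) / 2
    mid_lng = (vp.west + vp.east) / 2
    d = vp.depth + 1
    return [
        Viewport(vp.south, vp.west, mid_lat, mid_lng, d),  # south-west
        Viewport(vp.south, mid_lng, mid_lat, vp.east, d),  # south-east
        Viewport(mid_lat, vp.west, vp.north, mid_lng, d),  # north-west
        Viewport(mid_lat, mid_lng, vp.north, vp.east, d),  # north-east
    ]


def is_saturated(first_page_ids, seen_ids, threshold=18):
    """A viewport saturates when most first-page results are new."""
    new = sum(1 for pid in first_page_ids if pid not in seen_ids)
    return new >= threshold


def expand(seed, first_page, max_depth=4):
    """Walk the quad-tree and return the viewports that were not split.

    first_page(vp) is a hypothetical callable returning the place IDs
    found on the first results page of a viewport.
    """
    seen, leaves, stack = set(), [], [seed]
    while stack:
        vp = stack.pop()
        ids = first_page(vp)
        saturated = is_saturated(ids, seen)
        seen.update(ids)
        if saturated and vp.depth < max_depth:
            stack.extend(split(vp))
        else:
            leaves.append(vp)
    return leaves
```

If every viewport saturates, a single seed with max_depth=4 ends in 4^4 = 256 leaf viewports, matching the figure quoted above.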
Two-phase SSR → XHR flow
- Step 1: fetch SSR /maps/search/{q}/@… to extract Google's freshly-minted XHR URL.
- Step 2: fetch /search?tbm=map&pb=… to receive the JSON results page.
- Pagination via !8i{offset} parameter.
- Connection-reusing pipeline.
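Pagination via the !8i{offset} field could look like the sketch below, which rewrites the offset in an XHR URL already extracted in step 1. It assumes the URL contains a literal (not percent-encoded) !8i field followed by digits and that pages hold 20 results; the helper names are hypothetical.

```python
import re

# Matches the pagination field, e.g. "!8i0" or "!8i40".
_OFFSET_FIELD = re.compile(r"!8i\d+")


def with_offset(xhr_url, offset):
    """Return xhr_url with its !8i pagination field set to offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    new_url, count = _OFFSET_FIELD.subn(f"!8i{offset}", xhr_url, count=1)
    if count == 0:
        raise ValueError("no !8i field found in URL")
    return new_url


def page_urls(xhr_url, pages, page_size=20):
    """Build the URLs for the first `pages` result pages.

    page_size=20 is an assumption based on the 20-result first page
    mentioned elsewhere in this changelog.
    """
    return [with_offset(xhr_url, i * page_size) for i in range(pages)]
```

Raising when the field is missing avoids silently re-fetching the first page for every offset.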
Initial release
- HTTP-only Google Maps scraper using curl_cffi with Chrome TLS impersonation.
- Apify residential proxy support via Actor.create_proxy_configuration.
- Quad-tree-ready architecture (single viewport in v0.1).
- Single search XHR → JSON parsing → Actor.push_data.