Initial v0.1: private actor, not yet published to the Apify Store.
Two-mode Google Maps scraper:
- Shallow mode (searchStringsArray): resolves free-form queries to candidate places, returning the Place ID and summary fields for each match.
- Deep mode (placeIds[]): given Place IDs (or CIDs, FIDs, short URLs, or full Maps URLs, all of which are normalized), extracts reviews and Photos-tab images.
- PlaywrightCrawler scaffold with a residential-proxy default, a sessionPool with rotation, fingerprint randomization, and CONSENT cookie pre-seeding for EU/UK egress IPs.
- Identifier normalization (src/maps/identifiers.ts) for Place ID, CID, FID hex pair, short URL (goo.gl/maps, maps.app.goo.gl), and full /maps/place/... URL; every form resolves to a canonical Place ID before the crawl. CID↔FID conversion is pure arithmetic. Short URLs are resolved by following redirects under a strict cross-origin allowlist, which defends against SSRF via redirect (a redirect to an arbitrary host could otherwise leak proxy session headers).
- Per-record recordType discriminator ('place' | 'review' | 'place_photos' | 'run_summary'), so callers can route by type without re-deriving it from the record's shape.
- Generic review field set (when present): rating, publishedAt, ownerResponse, detectedLanguage, reviewerProfileUrl, reviewerReviewCount, helpfulVotes, originalText, originalLanguage.
- Photo URL canonicalization: each photo emits { url, originalUrl }. url keeps Google's existing size suffix; originalUrl carries the =s0 form, which requests the original size.
- Host allowlist for reviewUrl and every reviewImageUrls[] entry: the scheme must be https, and the host must end in google.com, googleusercontent.com, or ggpht.com. Disallowed URLs are dropped with a per-record audit note, and the record's status is downgraded to 'partial'.
- Per-record status enum: 'ok' | 'place_not_found' | 'consent_wall' | 'rate_limited' | 'partial' | 'parse_error'.
- Run-summary record at the end of the run (also mirrored via log.info) with aggregate counts and consecutiveBlocksAtEnd, for detecting session-wide regressions.
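The CID↔FID arithmetic mentioned under identifier normalization can be sketched as below. This assumes the common layout in which an FID is written "0x<hex>:0x<hex>" and its second half is the CID in hexadecimal. fidToCid and cidToFidHalf are hypothetical names, not the actual exports of src/maps/identifiers.ts, and the reverse direction here produces only that second half.

```typescript
// Hypothetical helpers illustrating CID <-> FID arithmetic (not the actor's real API).
// Assumption: FID = "0x<hex>:0x<hex>", and the second half is the CID in hex.
const FID_PATTERN = /^0x([0-9a-f]{1,16}):0x([0-9a-f]{1,16})$/i;
const MAX_U64 = BigInt("0xffffffffffffffff");

/** FID hex pair -> CID as an unsigned 64-bit decimal string. */
function fidToCid(fid: string): string {
  const match = FID_PATTERN.exec(fid.trim());
  if (match === null) throw new Error(`not an FID hex pair: ${fid}`);
  // BigInt is required: CIDs routinely exceed Number.MAX_SAFE_INTEGER.
  return BigInt(`0x${match[2]}`).toString(10);
}

/** Decimal CID -> the 0x-prefixed hex half that appears in an FID. */
function cidToFidHalf(cid: string): string {
  if (!/^\d{1,20}$/.test(cid)) throw new Error(`not a decimal CID: ${cid}`);
  const value = BigInt(cid);
  if (value > MAX_U64) throw new Error(`CID exceeds 64 bits: ${cid}`);
  return `0x${value.toString(16)}`;
}
```

Using BigInt rather than Number keeps the conversion exact across the full unsigned 64-bit range.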
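The redirect-following short-URL resolution with a strict allowlist can be sketched as a loop that validates every hop before requesting it. The host list, hop limit, and the injected fetcher type are assumptions for illustration; resolveShortUrl and checkHop are hypothetical names.

```typescript
// Sketch of SSRF-safe short-URL resolution. Assumption: this allowlist is illustrative only.
const REDIRECT_ALLOWLIST: ReadonlySet<string> = new Set([
  "goo.gl", "maps.app.goo.gl", "google.com", "www.google.com", "maps.google.com",
]);

type Hop = { status: number; location: string | null };
// Injected so the loop never follows redirects implicitly (and is testable offline).
type Fetcher = (url: string) => Promise<Hop>;

function checkHop(target: URL): void {
  if (target.protocol !== "https:") throw new Error(`non-https hop: ${target.href}`);
  if (!REDIRECT_ALLOWLIST.has(target.hostname)) {
    throw new Error(`host not allowlisted: ${target.hostname}`);
  }
}

async function resolveShortUrl(start: string, fetchHop: Fetcher, maxHops = 5): Promise<string> {
  let current = new URL(start);
  checkHop(current);
  for (let hop = 0; hop < maxHops; hop++) {
    const { status, location } = await fetchHop(current.href);
    // Not a redirect: current is the final URL.
    if (status < 300 || status >= 400 || location === null) return current.href;
    // Resolve relative Location headers, then validate BEFORE any request is sent there.
    const next = new URL(location, current);
    checkHop(next);
    current = next;
  }
  throw new Error(`too many redirects from ${start}`);
}
```

Against the live network, the fetcher could wrap fetch with redirect: 'manual' and read the Location header. Check that your runtime actually exposes it, since browsers return an opaque redirect in that mode. In this sketch, fetchHop (and therefore any proxy session headers) is only ever called with URLs that passed checkHop.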
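The recordType discriminator and status enum lend themselves to a TypeScript discriminated union, which lets callers route records with an exhaustive switch. Only recordType, the status values, the review field names, photos' { url, originalUrl }, and consecutiveBlocksAtEnd come from this changelog. The remaining fields (such as placeId), the field types, and the route function are assumptions.

```typescript
// Hypothetical record shapes; field types and placeId are assumptions.
type Status = "ok" | "place_not_found" | "consent_wall" | "rate_limited" | "partial" | "parse_error";

interface PlaceRecord { recordType: "place"; status: Status; placeId: string }
interface ReviewRecord {
  recordType: "review"; status: Status; placeId: string;
  rating?: number; originalText?: string; detectedLanguage?: string; // fields present only when available
}
interface PlacePhotosRecord {
  recordType: "place_photos"; status: Status; placeId: string;
  photos: { url: string; originalUrl: string }[];
}
interface RunSummaryRecord { recordType: "run_summary"; consecutiveBlocksAtEnd: number }

type DatasetRecord = PlaceRecord | ReviewRecord | PlacePhotosRecord | RunSummaryRecord;

function route(record: DatasetRecord): string {
  switch (record.recordType) {
    case "place": return `place ${record.placeId}`;
    case "review": return `review of ${record.placeId} (${record.rating ?? "no rating"})`;
    case "place_photos": return `${record.photos.length} photos for ${record.placeId}`;
    case "run_summary": return `run summary, ${record.consecutiveBlocksAtEnd} trailing blocks`;
    default: {
      // Compile-time exhaustiveness check: adding a new recordType breaks the build here.
      const unreachable: never = record;
      throw new Error(`unknown recordType: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Inside each case the compiler narrows record to one variant, so accessing record.photos or record.rating needs no shape sniffing.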
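Photo URL canonicalization can be sketched as a path rewrite. This assumes Google's size suffix is the trailing "=<options>" on the last path segment (e.g. "=w408-h306-k-no"), and that appending =s0 is acceptable when no suffix is present. Both assumptions need verification, and canonicalizePhotoUrl is a hypothetical name.

```typescript
// Assumption: the size suffix is a trailing "=<letters/digits/_/->" run on the path.
const SIZE_SUFFIX = /=[\w-]+$/;

function canonicalizePhotoUrl(url: string): { url: string; originalUrl: string } {
  const parsed = new URL(url); // work on the pathname so query strings are never touched
  parsed.pathname = SIZE_SUFFIX.test(parsed.pathname)
    ? parsed.pathname.replace(SIZE_SUFFIX, "=s0") // swap existing size options for "original size"
    : `${parsed.pathname}=s0`;                     // assumption: no suffix, so append one
  // url is returned unchanged, preserving Google's existing size suffix.
  return { url, originalUrl: parsed.href };
}
```

Applying the function to an already-canonical =s0 URL returns the same URL, so re-processing stored records is harmless.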
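The host allowlist for reviewUrl and reviewImageUrls[] can be sketched as below. One design point is worth making explicit: a bare endsWith("google.com") would also accept a host like "evilgoogle.com", so this sketch matches on a label boundary instead. Whether the actor does the same is an assumption. The sketch only models the 'ok' to 'partial' downgrade, and applyHostAllowlist, isAllowedUrl, and the audit-note wording are hypothetical.

```typescript
const ALLOWED_HOST_SUFFIXES = ["google.com", "googleusercontent.com", "ggpht.com"];

function isAllowedUrl(raw: string): boolean {
  let parsed: URL;
  try { parsed = new URL(raw); } catch { return false; } // unparseable URLs are disallowed
  if (parsed.protocol !== "https:") return false;
  const host = parsed.hostname; // already lowercased by the URL parser
  // Label-boundary match: "lh3.googleusercontent.com" passes, "evilgoogle.com" does not.
  return ALLOWED_HOST_SUFFIXES.some((s) => host === s || host.endsWith(`.${s}`));
}

interface FilteredReviewUrls {
  status: "ok" | "partial";
  reviewUrl: string | null;
  reviewImageUrls: string[];
  auditNotes: string[];
}

function applyHostAllowlist(reviewUrl: string | null, imageUrls: string[]): FilteredReviewUrls {
  const auditNotes: string[] = [];
  let keptReviewUrl = reviewUrl;
  if (reviewUrl !== null && !isAllowedUrl(reviewUrl)) {
    auditNotes.push(`dropped reviewUrl: ${reviewUrl}`);
    keptReviewUrl = null;
  }
  const kept = imageUrls.filter((u) => {
    const ok = isAllowedUrl(u);
    if (!ok) auditNotes.push(`dropped reviewImageUrls entry: ${u}`);
    return ok;
  });
  // Any dropped URL downgrades the record to 'partial'.
  return { status: auditNotes.length > 0 ? "partial" : "ok", reviewUrl: keptReviewUrl, reviewImageUrls: kept, auditNotes };
}
```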
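The run-summary aggregation can be sketched as a count per status plus a backwards scan for consecutiveBlocksAtEnd. This assumes that 'consent_wall' and 'rate_limited' are the statuses that count as blocks. summarizeRun is a hypothetical name.

```typescript
type Status = "ok" | "place_not_found" | "consent_wall" | "rate_limited" | "partial" | "parse_error";

// Assumption: these statuses are the ones treated as "blocks".
const BLOCK_STATUSES: ReadonlySet<Status> = new Set<Status>(["consent_wall", "rate_limited"]);

interface RunSummary { counts: Record<Status, number>; consecutiveBlocksAtEnd: number }

function summarizeRun(statuses: readonly Status[]): RunSummary {
  const counts: Record<Status, number> = {
    ok: 0, place_not_found: 0, consent_wall: 0, rate_limited: 0, partial: 0, parse_error: 0,
  };
  for (const s of statuses) counts[s] += 1;
  // Walk backwards from the last record; stop at the first non-block status.
  let consecutiveBlocksAtEnd = 0;
  for (const s of [...statuses].reverse()) {
    if (!BLOCK_STATUSES.has(s)) break;
    consecutiveBlocksAtEnd += 1;
  }
  return { counts, consecutiveBlocksAtEnd };
}
```

A large trailing run suggests that blocking set in partway through and never recovered. That pattern points to a session-wide problem rather than scattered per-place failures, which aggregate counts alone would not reveal.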
Design notes:
- Playwright over HTTP for v0.1. Maps' shallow search and the Photos tab require a JS-rendered DOM. The reviews path could go through the /maps/preview/listentitiesreviews HTTP endpoint, but maintaining a decoder for Maps' pb protobuf format is non-trivial. HTTP reviews are a v0.2 performance optimization, not a v0.1 commitment.
- maxReviews counts reviews after filtering (i.e. after onlyWithPhotos is applied). The caller's intuition is "give me 50 reviews", so the actor scrolls past additional raw reviews when needed, bounded by maxReviewsScanned (default: maxReviews × 10).
- hl=en (or any other language code) sets only Maps' UI locale; it does NOT filter reviews by language. Reviews in all languages are returned, annotated with detectedLanguage / originalLanguage.
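The interaction between maxReviews (counted post-filter) and maxReviewsScanned (counted pre-filter) can be sketched as a bounded loop. An iterable stands in for scrolling the reviews pane, which is an assumption for illustration. collectReviews and RawReview are hypothetical.

```typescript
// Hypothetical minimal review shape; only what the onlyWithPhotos filter needs.
interface RawReview { id: string; imageUrls: string[] }

function collectReviews(
  raw: Iterable<RawReview>, // stands in for scrolling the reviews pane
  maxReviews: number,
  onlyWithPhotos: boolean,
  maxReviewsScanned: number = maxReviews * 10, // default bound from the design note
): { kept: RawReview[]; scanned: number } {
  const kept: RawReview[] = [];
  let scanned = 0;
  for (const review of raw) {
    // Stop when the caller's post-filter quota is met OR the raw-scan budget is spent.
    if (kept.length >= maxReviews || scanned >= maxReviewsScanned) break;
    scanned += 1;
    if (onlyWithPhotos && review.imageUrls.length === 0) continue;
    kept.push(review);
  }
  return { kept, scanned };
}
```

If only one review in three has photos, asking for 5 photo reviews scans 13 raw reviews. With a sparse filter, the scan budget rather than maxReviews ends the loop, so a run can return fewer than maxReviews records.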
Out of scope for v0.1:
- Public Apify Store listing (legal, GDPR, and abuse-resistance review pending; a v1.0 discussion).
- DB writes / persistence (the caller's responsibility).
- Routes, directions, Street View, popular-times, menus, posts, Q&A.
- Bulk geographic enumeration ("all restaurants in city X").
- Photo download (URLs only).
- Reviewer cross-place graph.