Stateful Google Maps Scraper
High-performance Google Maps scraper with caching and incremental updates. Avoid re-scraping unchanged places, optimize speed automatically, and stream validated results — built for fast, cost-efficient recurring scraping.
Pricing: Pay per event
Developer: Hayder Al-Khalissi
Stateful Google Maps Scraper – Scrape Google Maps at Scale (OpenClaw & n8n Integration)
Scrape Google Maps for business listings, local SEO data, and place details. This Google Maps scraper runs on Apify and is built for recurring data collection, monitoring, and automation—so you get fresh, structured data without re-scraping everything every time.
Unlike one-shot scrapers, this actor remembers what it has already scraped. Run it daily or weekly: it only outputs new and changed places, so you save time, cost, and avoid duplicate work. Ideal for local business intelligence, competitor monitoring, lead generation, and Google Maps data extraction at scale.
Why Use This Google Maps Scraper?
- Stateful & efficient — Tracks seen places across runs; incremental mode outputs only new/updated results.
- Production-ready — Playwright, consent handling, proxy support, and adaptive rate limiting.
- Flexible input — Search by query + location, or scrape by Place IDs or place URLs.
- Rich output — Name, address, rating, reviews count, category, price, phone, website (with optional place-detail extraction).
- Fully documented options — Timeouts, limits, export format, state, and more (see Input options below).
When to Use This Actor
Use this Google Maps data scraper when you need to:
- Monitor business listings — Track new openings, closures, and rating changes over time.
- Build local SEO datasets — Extract names, addresses, categories, and contact data for many areas.
- Run scheduled scrapes — Daily or weekly runs that only process new or updated places.
- Scrape by Place ID or URL — Feed specific Google Maps place IDs or URLs for targeted extraction.
- Reduce cost — Incremental mode means you are only charged for new/updated places, not duplicates.
For a single one-off scrape, almost any crawler will do; this actor is built for recurring Google Maps scraping and change detection.
Key Features
Stateful Scraping & Efficiency
| Feature | Description |
|---|---|
| Smart state | Stores seen places in Apify Key-Value Store; skip duplicates across runs. |
| Incremental mode | Output only new and updated places; never charged for skipped duplicates. |
| Change detection | Compare key fields (rating, phone, website, etc.) to detect updates. |
| Caching | Optional in-run cache to avoid re-processing unchanged data. |
| Streaming | Optional real-time streaming of results as they are collected. |
Scraping & Reliability
| Feature | Description |
|---|---|
| Playwright (Chromium) | Full JavaScript rendering for Google Maps search and place pages. |
| Consent handling | Automatically accepts consent dialogs (EN, DE, FR, ES, IT). |
| Proxy support | Use Apify Proxy to reduce blocks and consent redirects (useProxy: true). |
| Navigation timeout | Configurable navigationTimeout (seconds); increase if pages load slowly. |
| Page detection | Detects consent/captcha/blocked pages and logs clear warnings. |
| Rate limiting | Modes: adaptive, conservative, moderate, aggressive. |
Data You Get
- From search results: title, address, rating, reviews count, category, price, place URL, place ID.
- Optional place details: Set `includePlaceDetails: true` to visit each place page and get phone, website, coordinates, categories, thumbnail (slower, more complete).
- Optional reviews: Set `includeReviews: true` and `includePlaceDetails: true` to get review text, rating, author, and date for each place (extracted from place detail pages). A reviews dataset view gives one row per review. Reviewer name and review images are extracted when present; see OUTPUT.md for known quirks (e.g. occasional missing name, truncated text).
- Optional images/contacts: Flags `includeImages`, `includeContacts` for future enrichment.
How It Works
1. Build search URLs from `searchQueries` + `location`, or use `placeIds`/`placeUrls` for direct scraping.
2. Open Google Maps in Playwright, handle consent, then scroll the results feed until no new places load or “end of list” is reached.
3. Extract place data from result cards (and optionally from each place’s detail page if `includePlaceDetails` is on).
4. Validate, normalize, and push to the dataset; state is updated for incremental runs.
5. Stop when `maxPlaces`/`maxResults` is reached.
Fresh mode (runMode: "fresh"): Full scan; output all places; update state.
Incremental mode (runMode: "incremental"): Skip already-seen places; output only new/updated; charge only for written items.
Run 1 (fresh): Scrape → store state → output all
Run 2 (incremental): Scrape → compare to state → output new/updated only
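A minimal sketch of that comparison logic (illustrative only; the actor's real state lives in the Apify Key-Value Store, not an in-memory Map):

```javascript
// Illustrative sketch of incremental change detection. Field names mirror
// the documented updateFields defaults, but this is not the actor's code.
const UPDATE_FIELDS = ["title", "address", "rating", "reviewsCount", "phone", "website"];

function classifyPlace(place, state) {
  const prev = state.get(place.placeId);
  if (!prev) {
    state.set(place.placeId, place);
    return "new"; // first time seen: always written
  }
  if (UPDATE_FIELDS.some((field) => prev[field] !== place[field])) {
    state.set(place.placeId, place);
    return "updated"; // key field changed: written in incremental mode
  }
  return "skipped"; // unchanged duplicate: not written, not charged
}

const state = new Map();
console.log(classifyPlace({ placeId: "a1", rating: 4.5 }, state)); // "new"
console.log(classifyPlace({ placeId: "a1", rating: 4.5 }, state)); // "skipped"
console.log(classifyPlace({ placeId: "a1", rating: 4.6 }, state)); // "updated"
```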
📋 Input Options
Full schema: .actor/input_schema.json. Below is a complete reference of all options, grouped for clarity.
Search & Location
| Option | Type | Default | Description |
|---|---|---|---|
| searchQueries | array of strings | ["restaurant"] | Search terms (e.g. ["restaurant", "cafe"]). |
| location | string | — | Location string (e.g. "New York, NY"). |
| categories | array of strings | — | Google Maps categories to filter results. |
| customGeolocation | object | — | Custom area: Polygon, MultiPolygon, or Point with coordinates. |
| placeIds | array of strings | — | Google Maps Place IDs to scrape directly (no search). |
| placeUrls | array of strings | — | Full Google Maps place URLs to scrape directly. |
Limits
| Option | Type | Default | Description |
|---|---|---|---|
| maxPlaces | integer | 100 | Stop after this many valid places (1–100000). |
| maxResults | integer | 100 | Alias for maxPlaces; if both set, the lower value is used. |
| maxCrawledPlacesPerSearch | integer | — | Cap places crawled per search query (optional). |
Data & Enrichment
| Option | Type | Default | Description |
|---|---|---|---|
| includePlaceDetails | boolean | false | Visit each place’s detail page for phone, website, coordinates, categories. Required for review extraction when includeReviews is true. Slower, more complete. |
| includeReviews | boolean | false | Extract reviews for each place. Requires includePlaceDetails: true. Increases collection of personal data (review text, reviewer names, etc.). |
| includeImages | boolean | false | Extract images for each place. |
| includeContacts | boolean | false | Enrich with contact information (e.g. from website). Increases collection of personal data. |
| includeEnrichment | boolean | false | Enable additional data enrichment. |
| includeReviewerNames | boolean | true | If false, reviewer name is omitted or anonymized in output (reviews and reviews-flat views). Use for privacy-friendly runs. |
| minimalLogging | boolean | false | When true, info-level logs avoid place titles, full addresses, and full URLs (only counts and non-PII identifiers). Full details only in debug or with verbose logging. |
Performance & Reliability
| Option | Type | Default | Description |
|---|---|---|---|
| navigationTimeout | integer | 90 | Max seconds to wait for a Google Maps page to load (10–300). Increase if you see “Navigation timed out” errors. |
| useProxy | boolean | false | Use Apify Proxy for requests; recommended to reduce blocks and consent pages. |
| maxConcurrency | integer | 10 | Max concurrent requests (1–50). |
| rateLimit | string | "adaptive" | Rate limiting: adaptive, conservative, moderate, or aggressive. |
| enableCaching | boolean | true | Enable in-run caching to avoid re-scraping unchanged places. |
| streamResults | boolean | false | Stream results to the dataset in real time as they are collected. |
State & Incremental Mode
| Option | Type | Default | Description |
|---|---|---|---|
| runMode | string | "fresh" | "fresh" = full scan, output all; "incremental" = output only new/updated. |
| stateKey | string | "places-state" | KVS key where the actor stores seen places between runs. |
| resetState | boolean | false | If true, clear stored state before running (fresh baseline). |
| dedupeStrategy | string | "placeId" | How to identify places: placeId or hash of key fields. |
| detectUpdates | boolean | true | In incremental mode, emit places whose key fields changed as “updated”. |
| updateFields | array of strings | see below | Fields used to detect updates (default: title, address, rating, reviewsCount, phone, website, categoryName, openingHours). |
| incrementalUpdate | boolean | false | Only process places that have changed since last run. |
Export & Output
| Option | Type | Default | Description |
|---|---|---|---|
| exportFormat | string | "json" | Output format: json, csv, or excel. |
| compressOutput | boolean | false | Compress output files. |
| saveSearchResults | boolean | false | Save all places extracted from the search list (before the maxPlaces limit) to dataset search-results. Use this to keep list-view data (e.g. 12 places) for later—e.g. feed placeUrls from that dataset into a follow-up run to get details for all of them. |
n8n Integration
First-class support for ingesting n8n workflow events (e.g. from HTTP Request/Webhook nodes) with flexible field mapping and automatic attribution.
| Option | Type | Default | Description |
|---|---|---|---|
| n8nEvent | object | — | Raw n8n payload. When set (or when input contains workflow/execution ids), event is normalized and attribution is stored. |
| n8nFieldMapping | object | — | Map payload keys → canonical fields (e.g. {"query": "searchQueries", "city": "location", "limit": "maxPlaces"}). Default aliases: query→searchQueries, city→location, limit→maxPlaces. |
| n8nPromptMaxBytes | integer | 10240 | Max bytes of prompt string to include in SHA-256 hash (10 KB). |
| n8nLateEventThresholdSeconds | integer | 300 | Events older than this (seconds) are marked as late. |
| n8nPromptField | string | "prompt" | Payload key used for prompt hashing. |
Features: Accepts n8n events from HTTP Request/Webhook; maps custom field names to canonical input; automatic eventId when missing; prompt hashing (SHA-256) with size limit; late event detection and marking; full attribution (workflow → node → run). Attribution is written to the run (KVS: n8n-attribution.json, scraping-stats, summary.json) and to each dataset item as n8n: { eventId, isLateEvent, attribution }.
Example (n8n webhook body as input):
```json
{
  "query": "restaurants",
  "city": "Berlin",
  "limit": 20,
  "workflowId": "abc123",
  "executionId": "run-xyz",
  "nodeName": "HTTP Request"
}
```
With n8nFieldMapping for custom keys:
```json
{
  "n8nEvent": { "search_term": "cafes", "location_name": "Munich", "workflowId": "w1" },
  "n8nFieldMapping": { "search_term": "searchQueries", "location_name": "location" }
}
```
Using the Apify node in n8n
This actor works with the Apify integration for n8n: install the Apify community node (@apify/n8n-nodes-apify), add your Apify credentials, then use Run Actor to start this scraper.
1. Workflow shape
- Trigger: e.g. Webhook, Schedule, or Manual.
- Apify node: Resource → Actor, Operation → Run Actor.
- Actor: Select Stateful Google Maps Scraper (or your task).
- Custom input: JSON passed as the Actor input (see below).
- Optionally: Get Dataset Items (Dataset ID = `defaultDatasetId` from the Run Actor output) to process results in n8n.
2. Custom input in “Run Actor”
You can pass either canonical input or n8n-style input (with optional attribution).
Canonical input (same as Apify Console):
```json
{
  "searchQueries": ["restaurant"],
  "location": "New York, NY",
  "maxPlaces": 20,
  "useProxy": true
}
```
n8n-style input (e.g. from a Webhook/HTTP Request body): use short names; the actor maps them and can record attribution if you add workflow/execution ids:
```json
{
  "query": "restaurants",
  "city": "Berlin",
  "limit": 20,
  "workflowId": "{{ $workflow.id }}",
  "executionId": "{{ $execution.id }}",
  "nodeName": "Run Actor"
}
```
In n8n expressions, {{ $workflow.id }} and {{ $execution.id }} come from the execution context; use them so runs are attributed to the right workflow and run. The actor maps query → searchQueries, city → location, limit → maxPlaces and stores full attribution in the run and on each dataset item.
3. Chaining from a Webhook
If the trigger is a Webhook that sends { "query": "cafes", "city": "Paris", "limit": 10 }:
- In Run Actor, set Custom input to the webhook body (e.g. `{{ $json }}` or the merged object).
- The actor receives that payload, applies default aliases and optional attribution, then runs the scrape. Results are in the run’s default dataset; use Get Dataset Items in the next node with the run’s `defaultDatasetId` to read them.
For more on the Apify node (credentials, Run Actor, Get Dataset Items, triggers), see n8n integration.
OpenClaw Integration
Optional outbound integration: send scraped places to an OpenClaw gateway so an agent can use the data. Uses bounded concurrency (configurable) so the scraper is not blocked by each send.
| Option | Type | Default | Description |
|---|---|---|---|
| openClaw.enabled | boolean | false | Enable sending places to OpenClaw. |
| openClaw.gatewayUrl | string | — | OpenClaw gateway base URL (e.g. https://gateway.example.com:18789). |
| openClaw.token | string | — | Bearer token for gateway auth. Falls back to OPENCLAW_GATEWAY_TOKEN env. |
| openClaw.agentId | string | "main" | Target agent id (x-openclaw-agent-id). |
| openClaw.concurrency | integer | 5 | Max concurrent outbound requests (1–20). |
| openClaw.sendMode | string | "perPlace" | perPlace: one request per place; batch: group places into single requests. |
| openClaw.batchSize | integer | 10 | Places per batch when sendMode is batch. |
| openClaw.api | string | "responses" | responses = OpenResponses (/v1/responses); chatCompletions = OpenAI-compatible (/v1/chat/completions). |
Sends are non-blocking: each place is enqueued and sent under the concurrency limit while scraping continues. At the end of the run, the client drains the queue and waits for in-flight requests. Failed sends are logged and do not stop the scraper.
Example:
```json
{
  "searchQueries": ["cafe"],
  "location": "Berlin",
  "maxPlaces": 20,
  "openClaw": {
    "enabled": true,
    "gatewayUrl": "https://gateway.example.com:18789",
    "agentId": "main",
    "concurrency": 5,
    "sendMode": "perPlace",
    "api": "responses"
  }
}
```
Use the OPENCLAW_GATEWAY_TOKEN environment variable (e.g. in Apify secrets) instead of putting the token in input when possible.
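The bounded, non-blocking send behavior can be sketched with a small queue (a simplified illustration, not the actor's actual client):

```javascript
// Simplified sketch of the OpenClaw client's send behavior: at most `limit`
// requests in flight, enqueue() never blocks, failures are logged but not
// fatal, and drain() waits for everything at the end of the run.
class SendQueue {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;
    this.pending = [];
    this.inFlight = new Set();
  }

  enqueue(task) {
    this.pending.push(task);
    this._pump();
  }

  _pump() {
    while (this.active < this.limit && this.pending.length > 0) {
      const task = this.pending.shift();
      this.active += 1;
      const p = Promise.resolve()
        .then(task)
        .catch((err) => console.warn("send failed:", err.message)) // logged, never fatal
        .finally(() => {
          this.active -= 1;
          this.inFlight.delete(p);
          this._pump();
        });
      this.inFlight.add(p);
    }
  }

  async drain() {
    while (this.pending.length > 0 || this.inFlight.size > 0) {
      await Promise.all([...this.inFlight]);
    }
  }
}

// Usage: enqueue one "send" per place while scraping continues, then drain.
const queue = new SendQueue(5);
let sent = 0;
for (let i = 0; i < 20; i++) {
  queue.enqueue(async () => {
    await new Promise((resolve) => setTimeout(resolve, 5)); // stand-in for an HTTP POST
    sent += 1;
  });
}
queue.drain().then(() => console.log(`sent ${sent} places`)); // sent 20 places
```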
Usage recommendations
- Use Apify Proxy — Set `useProxy: true` to reduce consent pages, blocks, and bot detection, especially when running on Apify or from data-center IPs.
- Navigation timeout — If you see “Navigation timed out” errors, increase `navigationTimeout` beyond the default of 90 (e.g. 120 seconds or higher).
- When using reviews — If you enable `includeReviews: true` (and thus `includePlaceDetails: true`), runs do more work per place (load the detail page, then extract reviews). For reliable review extraction on Apify:
  - Set `navigationTimeout` to at least 90 (or 120 for slow networks).
  - Give the run more memory (e.g. 2 GB) in the run settings so the browser has enough headroom; low memory can cause timeouts or empty review extraction.
- Incremental runs — Use `runMode: "incremental"` for recurring scrapes so you only pay for new or updated places.
Example Inputs
Basic search (with proxy and timeout):
```json
{
  "searchQueries": ["restaurant", "cafe"],
  "location": "New York, NY",
  "maxResults": 50,
  "useProxy": true,
  "navigationTimeout": 120
}
```
Incremental run (only new/updated places):
```json
{
  "searchQueries": ["plumber"],
  "location": "London, UK",
  "maxPlaces": 200,
  "runMode": "incremental",
  "useProxy": true
}
```
Scrape by Place IDs or URLs:
```json
{
  "placeUrls": ["https://www.google.com/maps/place/My+Business/..."],
  "includePlaceDetails": true,
  "navigationTimeout": 90
}
```
Custom geolocation (polygon):
```json
{
  "searchQueries": ["restaurant"],
  "customGeolocation": {
    "type": "Polygon",
    "coordinates": [[
      [-0.322813, 51.597165],
      [-0.31499, 51.388023],
      [0.060493, 51.389199],
      [0.051936, 51.60036],
      [-0.322813, 51.597165]
    ]]
  }
}
```
Output
Results are stored in the default dataset. Each item can include:
- Core: `title`, `address`, `street`, `city`, `state`, `postalCode`, `countryCode`, `url`, `placeId`
- Ratings: `rating`, `reviewsCount`
- Business: `categoryName`, `categories`, `price`, `phone`, `website`, `location` (lat/lng when using place details)
- Metadata: `searchUrl`, `scrapedAt`, `metadata`, `changeType` (e.g. `"new"`, `"updated"`)
Example item:
```json
{
  "title": "La Grande Boucherie",
  "address": "145 W 53rd St",
  "rating": 4.6,
  "reviewsCount": 9315,
  "url": "https://www.google.com/maps/place/...",
  "placeId": "ChIJ58OtM8pZwokRbd6DT6gcVys",
  "categoryName": "French",
  "price": "$50–100",
  "phone": null,
  "street": "145 W 53rd St",
  "city": "New York",
  "state": "NY",
  "countryCode": "US",
  "scrapedAt": "2026-02-06T07:01:07.146Z",
  "changeType": "new"
}
```
Dataset views (when applicable): main dataset, reviews, reviews-flat, images, contacts, changes (incremental). Stats and performance metrics are in the key-value store.
- reviews — One row per review with full fields (placeTitle, placeUrl, reviewerName, reviewText, stars, publishAt, likesCount, reviewUrl, etc.).
- reviews-flat — Same data in a simple, Compass-like shape for CSV/n8n/Make: `title`, `url`, `stars`, `name`, `reviewUrl`, `text`. Use this for drop-in flat files; use reviews if you need extra fields. See OUTPUT.md#reviews-view for field mapping from reviews to a custom shape in one transformation step.
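The reviews to reviews-flat mapping is a single transformation; here is a sketch using the documented field names (the helper itself is illustrative, e.g. for an n8n Code node):

```javascript
// Sketch of the reviews -> reviews-flat mapping using the documented field
// names from the dataset views. The helper and sample data are illustrative.
function toFlatReview(review) {
  return {
    title: review.placeTitle,
    url: review.placeUrl,
    stars: review.stars,
    name: review.reviewerName ?? null, // reviewer name can be missing (see OUTPUT.md)
    reviewUrl: review.reviewUrl,
    text: review.reviewText,
  };
}

console.log(
  toFlatReview({
    placeTitle: "La Grande Boucherie",
    placeUrl: "https://www.google.com/maps/place/...",
    stars: 5,
    reviewerName: "Jane D.",
    reviewUrl: "https://www.google.com/maps/reviews/...",
    reviewText: "Great food.",
  })
);
```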
Where to find them on Apify: The default dataset holds places (with optional nested reviews). Other outputs are separate datasets in the same run. Open the run → Storage → Datasets and pick the dataset you need:
| Dataset | When | Contents |
|---|---|---|
| default | always | Places (one row per place) |
| reviews | includeReviews | One row per review (full fields) |
| reviews-flat | includeReviews | One row per review (title, url, stars, name, reviewUrl, text) |
| place-details | includePlaceDetails | Flattened place-detail rows |
| list-vs-detail-mismatch | includePlaceDetails | Places where list-view data differed from the detail page |
| changes | incremental run | New/updated places |
The run log also prints where to find reviews and list-vs-detail-mismatch.
Phone numbers are best-effort: they come from the place detail page. If a detail page fails to load (timeout or navigation interrupted), or the business doesn’t show a number, phone can be missing. Use navigationTimeout ≥ 90 and sufficient memory to improve reliability.
Run Modes in Detail
- Fresh — New scan; all scraped places are written; state is updated for future incremental runs.
- Incremental — Compares to stored state; only new and (if `detectUpdates: true`) updated places are written; you are only charged for those. Use `stateKey` to isolate state per “project”. Set `resetState: true` once to clear state and start over.
Data & privacy
The actor collects data that may qualify as personal data under GDPR and similar laws, including:
- Place/business data: addresses, phone numbers, websites, business names.
- Review data (when `includeReviews: true`): reviewer names, review text, review URLs, reviewer photos.
You should have a legitimate purpose for processing this data and comply with GDPR, Google’s Terms of Service, and any other applicable law. Limit retention and access (e.g. via Apify dataset retention settings and in downstream systems). Enabling reviews or contacts increases the amount of personal data collected; use includeReviewerNames: false if you want to omit or anonymize reviewer names in the output.
Storage and output: Datasets, key-value state, and cache may contain personal data. Retention and deletion are your responsibility (Apify retention settings and any downstream storage). If the actor saves debug HTML (e.g. on errors), those files may contain PII and should be disabled or kept short-lived in production.
Compliance: This actor does not implement consent or a legal basis for processing. You are responsible for ensuring your use (and any scraping of personal data) has a valid legal basis and complies with Google’s ToS and applicable law (e.g. GDPR).
Proxy & Performance Tips
- Enable proxy (`useProxy: true`) in production to reduce blocks and consent redirects.
- Increase `navigationTimeout` (e.g. 120–180 seconds) if you see "Navigation timed out" on slow networks or heavy pages.
- Memory: See the Memory requirements subsection below.
Memory requirements
The actor uses Playwright (Chromium) for Google Maps. Recommended: 2048 MB (2 GB) minimum; 4096 MB (4 GB) for large scrapes or many search URLs. URLs are processed one at a time to limit memory use. If you see "Memory is critically overloaded", increase memory in Apify Console → Actor → Run options.
Development
- Node.js 20+ required.
- Setup: `npm install`, then `npx playwright install chromium`.
- Local test: `npm run test:local [input-file.json]`
- Scripts: `npm start` (Apify), `npm test`, `npm run lint`
See FEATURES.md for architecture and technical details.
License
Apache-2.0
