Stateful Google Maps Scraper
High-performance Google Maps scraper with caching and incremental updates. Avoid re-scraping unchanged places, optimize speed automatically, and stream validated results — built for fast, cost-efficient recurring scraping.

Pricing: Pay per event
Developer: Hayder Al-Khalissi (Maintained by Community)

Stateful Google Maps Scraper – Scrape Google Maps at Scale (OpenClaw & n8n Integration)

Scrape Google Maps for business listings, local SEO data, and place details. This Google Maps scraper runs on Apify and is built for recurring data collection, monitoring, and automation—so you get fresh, structured data without re-scraping everything every time.

Unlike one-shot scrapers, this actor remembers what it has already scraped. Run it daily or weekly: it only outputs new and changed places, so you save time, cost, and avoid duplicate work. Ideal for local business intelligence, competitor monitoring, lead generation, and Google Maps data extraction at scale.


Why Use This Google Maps Scraper?

  • Stateful & efficient — Tracks seen places across runs; incremental mode outputs only new/updated results.
  • Production-ready — Playwright, consent handling, proxy support, and adaptive rate limiting.
  • Flexible input — Search by query + location, or scrape by Place IDs or place URLs.
  • Rich output — Name, address, rating, reviews count, category, price, phone, website (with optional place-detail extraction).
  • Fully documented options — Timeouts, limits, export format, state, and more (see Input options below).

When to Use This Actor

Use this Google Maps data scraper when you need to:

  • Monitor business listings — Track new openings, closures, and rating changes over time.
  • Build local SEO datasets — Extract names, addresses, categories, and contact data for many areas.
  • Run scheduled scrapes — Daily or weekly runs that only process new or updated places.
  • Scrape by Place ID or URL — Feed specific Google Maps place IDs or URLs for targeted extraction.
  • Reduce cost — Incremental mode means you are only charged for new/updated places, not duplicates.

For a single one-off scrape, any crawler will do; this actor is built for recurring Google Maps scraping and change detection.


Key Features

Stateful Scraping & Efficiency

| Feature | Description |
| --- | --- |
| Smart state | Stores seen places in the Apify Key-Value Store; skips duplicates across runs. |
| Incremental mode | Outputs only new and updated places; you are never charged for skipped duplicates. |
| Change detection | Compares key fields (rating, phone, website, etc.) to detect updates. |
| Caching | Optional in-run cache to avoid re-processing unchanged data. |
| Streaming | Optional real-time streaming of results as they are collected. |

Scraping & Reliability

| Feature | Description |
| --- | --- |
| Playwright (Chromium) | Full JavaScript rendering for Google Maps search and place pages. |
| Consent handling | Automatically accepts consent dialogs (EN, DE, FR, ES, IT). |
| Proxy support | Use Apify Proxy to reduce blocks and consent redirects (useProxy: true). |
| Navigation timeout | Configurable navigationTimeout (seconds); increase if pages load slowly. |
| Page detection | Detects consent/captcha/blocked pages and logs clear warnings. |
| Rate limiting | Modes: adaptive, conservative, moderate, aggressive. |

Data You Get

  • From search results: title, address, rating, reviews count, category, price, place URL, place ID.
  • Optional place details: Set includePlaceDetails: true to visit each place page and get phone, website, coordinates, categories, thumbnail (slower, more complete).
  • Optional reviews: Set includeReviews: true and includePlaceDetails: true to get review text, rating, author, and date for each place (extracted from place detail pages). A reviews dataset view gives one row per review. Reviewer name and review images are extracted when present; see OUTPUT.md for known quirks (e.g. occasional missing name, truncated text).
  • Optional images/contacts: Flags includeImages, includeContacts for future enrichment.

How It Works

  1. Build search URLs from searchQueries + location, or use placeIds / placeUrls for direct scraping.
  2. Open Google Maps in Playwright, handle consent, then scroll the results feed until no new places load or “end of list” is reached.
  3. Extract place data from result cards (and optionally from each place’s detail page if includePlaceDetails is on).
  4. Validate, normalize, and push to the dataset; state is updated for incremental runs.
  5. Stop when maxPlaces / maxResults is reached.

Fresh mode (runMode: "fresh"): Full scan; output all places; update state.
Incremental mode (runMode: "incremental"): Skip already-seen places; output only new/updated; charge only for written items.

Run 1 (fresh): Scrape → store state → output all
Run 2 (incremental): Scrape → compare to state → output new/updated only
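Conceptually, the per-place decision in incremental mode looks like this. This is a simplified illustration of the idea, not the actor's actual source; the field list mirrors the default updateFields described below:

```javascript
// Illustrative sketch of the new / updated / skipped decision per place.
// `state` maps placeId -> the snapshot stored by the previous run.
const UPDATE_FIELDS = ['title', 'address', 'rating', 'reviewsCount', 'phone', 'website'];

function classifyPlace(place, state, detectUpdates = true) {
  const prev = state[place.placeId];
  if (!prev) return 'new';
  if (detectUpdates && UPDATE_FIELDS.some((f) => prev[f] !== place[f])) return 'updated';
  return 'skipped';
}

// Incremental mode writes only new/updated places and refreshes the state.
function runIncremental(places, state) {
  const output = [];
  for (const place of places) {
    const changeType = classifyPlace(place, state);
    if (changeType !== 'skipped') output.push({ ...place, changeType });
    state[place.placeId] = place; // the stored snapshot is updated either way
  }
  return output;
}
```

Skipped places never reach the dataset, which is why recurring incremental runs cost less than fresh runs of the same scope.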

📋 Input Options

Full schema: .actor/input_schema.json. Below is a complete reference of all options, grouped for clarity.

Search & Location

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| searchQueries | array of strings | ["restaurant"] | Search terms (e.g. ["restaurant", "cafe"]). |
| location | string | — | Location string (e.g. "New York, NY"). |
| categories | array of strings | — | Google Maps categories to filter results. |
| customGeolocation | object | — | Custom area: Polygon, MultiPolygon, or Point with coordinates. |
| placeIds | array of strings | — | Google Maps Place IDs to scrape directly (no search). |
| placeUrls | array of strings | — | Full Google Maps place URLs to scrape directly. |

Limits

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| maxPlaces | integer | 100 | Stop after this many valid places (1–100000). |
| maxResults | integer | 100 | Alias for maxPlaces; if both are set, the lower value is used. |
| maxCrawledPlacesPerSearch | integer | — | Cap on places crawled per search query (optional). |

Data & Enrichment

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| includePlaceDetails | boolean | false | Visit each place’s detail page for phone, website, coordinates, categories. Required for review extraction when includeReviews is true. Slower, more complete. |
| includeReviews | boolean | false | Extract reviews for each place. Requires includePlaceDetails: true. Increases collection of personal data (review text, reviewer names, etc.). |
| includeImages | boolean | false | Extract images for each place. |
| includeContacts | boolean | false | Enrich with contact information (e.g. from website). Increases collection of personal data. |
| includeEnrichment | boolean | false | Enable additional data enrichment. |
| includeReviewerNames | boolean | true | If false, reviewer name is omitted or anonymized in output (reviews and reviews-flat views). Use for privacy-friendly runs. |
| minimalLogging | boolean | false | When true, info-level logs avoid place titles, full addresses, and full URLs (only counts and non-PII identifiers). Full details appear only in debug or verbose logging. |

Performance & Reliability

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| navigationTimeout | integer | 90 | Max seconds to wait for a Google Maps page to load (10–300). Increase if you see “Navigation timed out” errors. |
| useProxy | boolean | false | Use Apify Proxy for requests; recommended to reduce blocks and consent pages. |
| maxConcurrency | integer | 10 | Max concurrent requests (1–50). |
| rateLimit | string | "adaptive" | Rate limiting mode: adaptive, conservative, moderate, or aggressive. |
| enableCaching | boolean | true | Enable in-run caching to avoid re-scraping unchanged places. |
| streamResults | boolean | false | Stream results to the dataset in real time as they are collected. |
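The adaptive mode can be pictured as a delay controller that backs off when pages look blocked and recovers on success. This is an illustrative sketch of the concept, not the actor's internal implementation:

```javascript
// Illustrative delay controller: double the delay on block signals,
// shrink it again on successes, clamped to [minDelayMs, maxDelayMs].
class AdaptiveRateLimiter {
  constructor({ minDelayMs = 500, maxDelayMs = 30000 } = {}) {
    this.minDelayMs = minDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.delayMs = minDelayMs;
  }

  // Call after each page load; returns the delay to wait before the next one.
  record(outcome) {
    if (outcome === 'blocked') {
      this.delayMs = Math.min(this.delayMs * 2, this.maxDelayMs);
    } else {
      this.delayMs = Math.max(this.delayMs * 0.8, this.minDelayMs);
    }
    return this.delayMs;
  }
}
```

The conservative/moderate/aggressive modes correspond to fixed pacing rather than this feedback loop.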

State & Incremental Mode

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| runMode | string | "fresh" | "fresh" = full scan, output all; "incremental" = output only new/updated. |
| stateKey | string | "places-state" | Key-value store key where the actor stores seen places between runs. |
| resetState | boolean | false | If true, clear stored state before running (fresh baseline). |
| dedupeStrategy | string | "placeId" | How to identify places: placeId or hash of key fields. |
| detectUpdates | boolean | true | In incremental mode, emit places whose key fields changed as “updated”. |
| updateFields | array of strings | see description | Fields used to detect updates (default: title, address, rating, reviewsCount, phone, website, categoryName, openingHours). |
| incrementalUpdate | boolean | false | Only process places that have changed since the last run. |

Export & Output

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| exportFormat | string | "json" | Output format: json, csv, or excel. |
| compressOutput | boolean | false | Compress output files. |
| saveSearchResults | boolean | false | Save all places extracted from the search list (before the maxPlaces limit) to the search-results dataset. Use this to keep list-view data (e.g. 12 places) for later, e.g. feed placeUrls from that dataset into a follow-up run to get details for all of them. |

n8n Integration

First-class support for ingesting n8n workflow events (e.g. from HTTP Request/Webhook nodes) with flexible field mapping and automatic attribution.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| n8nEvent | object | — | Raw n8n payload. When set (or when input contains workflow/execution ids), the event is normalized and attribution is stored. |
| n8nFieldMapping | object | — | Map payload keys → canonical fields (e.g. {"query": "searchQueries", "city": "location", "limit": "maxPlaces"}). Default aliases: query → searchQueries, city → location, limit → maxPlaces. |
| n8nPromptMaxBytes | integer | 10240 | Max bytes of the prompt string to include in the SHA-256 hash (10 KB). |
| n8nLateEventThresholdSeconds | integer | 300 | Events older than this (seconds) are marked as late. |
| n8nPromptField | string | "prompt" | Payload key used for prompt hashing. |

Features:

  • Accepts n8n events from HTTP Request/Webhook nodes.
  • Maps custom field names to canonical input.
  • Generates an eventId automatically when missing.
  • Hashes the prompt (SHA-256) with a size limit.
  • Detects and marks late events.
  • Records full attribution (workflow → node → run).

Attribution is written to the run (KVS: n8n-attribution.json, scraping-stats, summary.json) and to each dataset item as n8n: { eventId, isLateEvent, attribution }.
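The alias mapping itself is a small transformation. The following sketch shows the idea using the documented default aliases; it is illustrative, not the actor's source:

```javascript
// Sketch of normalizing an n8n payload into canonical actor input.
// Default aliases are those documented above; a custom mapping extends them.
const DEFAULT_ALIASES = { query: 'searchQueries', city: 'location', limit: 'maxPlaces' };

function normalizeN8nEvent(event, fieldMapping = {}) {
  const aliases = { ...DEFAULT_ALIASES, ...fieldMapping };
  const input = {};
  for (const [key, value] of Object.entries(event)) {
    const canonical = aliases[key] ?? key; // unknown keys pass through unchanged
    // searchQueries is an array in the canonical schema; wrap single strings.
    input[canonical] =
      canonical === 'searchQueries' && !Array.isArray(value) ? [value] : value;
  }
  return input;
}
```

So a webhook body like { "query": "restaurants", "city": "Berlin", "limit": 20 } becomes { "searchQueries": ["restaurants"], "location": "Berlin", "maxPlaces": 20 }.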

Example (n8n webhook body as input):

```json
{
  "query": "restaurants",
  "city": "Berlin",
  "limit": 20,
  "workflowId": "abc123",
  "executionId": "run-xyz",
  "nodeName": "HTTP Request"
}
```

With n8nFieldMapping for custom keys:

```json
{
  "n8nEvent": { "search_term": "cafes", "location_name": "Munich", "workflowId": "w1" },
  "n8nFieldMapping": { "search_term": "searchQueries", "location_name": "location" }
}
```

Using the Apify node in n8n

This actor works with the Apify integration for n8n: install the Apify community node (@apify/n8n-nodes-apify), add your Apify credentials, then use Run Actor to start this scraper.

1. Workflow shape

  • Trigger: e.g. Webhook, Schedule, or Manual.
  • Apify node: Resource → Actor, Operation → Run Actor.
  • Actor: Select Stateful Google Maps Scraper (or your task).
  • Custom input: JSON passed as the Actor input (see below).
  • Optionally: Get Dataset Items (Dataset ID = defaultDatasetId from Run Actor output) to process results in n8n.

2. Custom input in “Run Actor”

You can pass either canonical input or n8n-style input (with optional attribution).

Canonical input (same as Apify Console):

```json
{
  "searchQueries": ["restaurant"],
  "location": "New York, NY",
  "maxPlaces": 20,
  "useProxy": true
}
```

n8n-style input (e.g. from a Webhook/HTTP Request body): use short names; the actor maps them and can record attribution if you add workflow/execution ids:

```json
{
  "query": "restaurants",
  "city": "Berlin",
  "limit": 20,
  "workflowId": "{{ $workflow.id }}",
  "executionId": "{{ $execution.id }}",
  "nodeName": "Run Actor"
}
```

In n8n expressions, {{ $workflow.id }} and {{ $execution.id }} come from the execution context; use them so runs are attributed to the right workflow and run. The actor maps query → searchQueries, city → location, limit → maxPlaces and stores full attribution in the run and on each dataset item.

3. Chaining from a Webhook

If the trigger is a Webhook that sends { "query": "cafes", "city": "Paris", "limit": 10 }:

  • In Run Actor, set Custom input to the webhook body (e.g. {{ $json }} or the merged object).
  • The actor receives that payload, applies default aliases and optional attribution, then runs the scrape. Results are in the run’s default dataset; use Get Dataset Items in the next node with the run’s defaultDatasetId to read them.

For more on the Apify node (credentials, Run Actor, Get Dataset Items, triggers), see n8n integration.

OpenClaw Integration

Optional outbound integration: send scraped places to an OpenClaw gateway so an agent can use the data. Uses bounded concurrency (configurable) so the scraper is not blocked by each send.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| openClaw.enabled | boolean | false | Enable sending places to OpenClaw. |
| openClaw.gatewayUrl | string | — | OpenClaw gateway base URL (e.g. https://gateway.example.com:18789). |
| openClaw.token | string | — | Bearer token for gateway auth. Falls back to the OPENCLAW_GATEWAY_TOKEN env variable. |
| openClaw.agentId | string | "main" | Target agent id (x-openclaw-agent-id header). |
| openClaw.concurrency | integer | 5 | Max concurrent outbound requests (1–20). |
| openClaw.sendMode | string | "perPlace" | perPlace: one request per place; batch: group places into single requests. |
| openClaw.batchSize | integer | 10 | Places per batch when sendMode is batch. |
| openClaw.api | string | "responses" | responses = OpenResponses (/v1/responses); chatCompletions = OpenAI-compatible (/v1/chat/completions). |

Sends are non-blocking: each place is enqueued and sent under the concurrency limit while scraping continues. At the end of the run, the client drains the queue and waits for in-flight requests. Failed sends are logged and do not stop the scraper.
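A bounded, non-blocking sender of this kind can be sketched as a small semaphore-style queue. This is an illustrative model of the behavior described above (enqueue never blocks, at most N requests in flight, drain at the end), not the actor's actual client:

```javascript
// Illustrative bounded-concurrency sender.
class BoundedSender {
  constructor(sendFn, limit = 5) {
    this.sendFn = sendFn;
    this.limit = limit;
    this.active = 0;
    this.queue = [];
    this.pending = [];
  }

  // enqueue() returns immediately; the send runs later under the cap.
  enqueue(place) {
    const task = () => this.sendFn(place).catch(() => {
      // A failed send is logged and skipped; it never stops the scraper.
    });
    this.pending.push(new Promise((resolve) => {
      this.queue.push(() => task().finally(resolve));
    }));
    this.pump();
  }

  // Start queued sends until the concurrency limit is reached.
  pump() {
    while (this.active < this.limit && this.queue.length > 0) {
      this.active += 1;
      const run = this.queue.shift();
      run().finally(() => {
        this.active -= 1;
        this.pump();
      });
    }
  }

  // At the end of the run: wait for queued and in-flight sends to finish.
  async drain() {
    await Promise.all(this.pending);
  }
}
```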

Example:

```json
{
  "searchQueries": ["cafe"],
  "location": "Berlin",
  "maxPlaces": 20,
  "openClaw": {
    "enabled": true,
    "gatewayUrl": "https://gateway.example.com:18789",
    "agentId": "main",
    "concurrency": 5,
    "sendMode": "perPlace",
    "api": "responses"
  }
}
```

Use the OPENCLAW_GATEWAY_TOKEN environment variable (e.g. in Apify secrets) instead of putting the token in input when possible.


Usage recommendations

  • Use Apify Proxy — Set useProxy: true to reduce consent pages, blocks, and bot detection, especially when running on Apify or from data-center IPs.
  • Navigation timeout — If you see “Navigation timed out” errors, increase navigationTimeout above the default of 90 (e.g. 120 seconds or higher).
  • When using reviews — If you enable includeReviews: true (and thus includePlaceDetails: true), runs do more work per place (load detail page, then extract reviews). For reliable review extraction on Apify:
    • Set navigationTimeout to at least 90 (or 120 for slow networks).
    • Give the run more memory (e.g. 2 GB) in the run settings so the browser has enough headroom; low memory can cause timeouts or empty review extraction.
  • Incremental runs — Use runMode: "incremental" for recurring scrapes so you only pay for new or updated places.

Example Inputs

Basic search (with proxy and timeout):

```json
{
  "searchQueries": ["restaurant", "cafe"],
  "location": "New York, NY",
  "maxResults": 50,
  "useProxy": true,
  "navigationTimeout": 120
}
```

Incremental run (only new/updated places):

```json
{
  "searchQueries": ["plumber"],
  "location": "London, UK",
  "maxPlaces": 200,
  "runMode": "incremental",
  "useProxy": true
}
```

Scrape by Place IDs or URLs:

```json
{
  "placeUrls": [
    "https://www.google.com/maps/place/My+Business/..."
  ],
  "includePlaceDetails": true,
  "navigationTimeout": 90
}
```

Custom geolocation (polygon):

```json
{
  "searchQueries": ["restaurant"],
  "customGeolocation": {
    "type": "Polygon",
    "coordinates": [[
      [-0.322813, 51.597165],
      [-0.31499, 51.388023],
      [0.060493, 51.389199],
      [0.051936, 51.60036],
      [-0.322813, 51.597165]
    ]]
  }
}
```
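If you ever need to check on your own side whether a coordinate falls inside such a polygon (e.g. when post-filtering merged datasets), a standard ray-casting test works directly on this GeoJSON-style [lng, lat] format. An illustrative sketch, unrelated to the actor's internal filtering:

```javascript
// Ray-casting point-in-polygon test for a ring of [lng, lat] pairs.
function pointInPolygon([lng, lat], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // Count edges that a horizontal ray from the point crosses.
    const crosses =
      (yi > lat) !== (yj > lat) &&
      lng < ((xj - xi) * (lat - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}
```

Note the GeoJSON convention: coordinates are [longitude, latitude], not [latitude, longitude].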

Output

Results are stored in the default dataset. Each item can include:

  • Core: title, address, street, city, state, postalCode, countryCode, url, placeId
  • Ratings: rating, reviewsCount
  • Business: categoryName, categories, price, phone, website, location (lat/lng when using place details)
  • Metadata: searchUrl, scrapedAt, metadata, changeType (e.g. "new", "updated")

Example item:

```json
{
  "title": "La Grande Boucherie",
  "address": "145 W 53rd St",
  "rating": 4.6,
  "reviewsCount": 9315,
  "url": "https://www.google.com/maps/place/...",
  "placeId": "ChIJ58OtM8pZwokRbd6DT6gcVys",
  "categoryName": "French",
  "price": "$50–100",
  "phone": null,
  "street": "145 W 53rd St",
  "city": "New York",
  "state": "NY",
  "countryCode": "US",
  "scrapedAt": "2026-02-06T07:01:07.146Z",
  "changeType": "new"
}
```

Dataset views (when applicable): main dataset, reviews, reviews-flat, images, contacts, changes (incremental). Stats and performance metrics are in the key-value store.

  • reviews — One row per review with full fields (placeTitle, placeUrl, reviewerName, reviewText, stars, publishAt, likesCount, reviewUrl, etc.).
  • reviews-flat — Same data in a simple, Compass-like shape for CSV/n8n/Make: title, url, stars, name, reviewUrl, text. Use this for drop-in flat files; use reviews if you need extra fields. See OUTPUT.md#reviews-view for field mapping from reviews to a custom shape in one transformation step.
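Mapping a full reviews row to a flat shape is a single transformation step; for example (field names taken from the views described above, output matching the reviews-flat columns):

```javascript
// One-step mapping from a full `reviews` row to a flat, CSV-friendly shape.
function toFlatReview(row) {
  return {
    title: row.placeTitle,
    url: row.placeUrl,
    stars: row.stars,
    name: row.reviewerName ?? null, // reviewer name can be missing; see OUTPUT.md
    reviewUrl: row.reviewUrl,
    text: row.reviewText,
  };
}
```

The same pattern works in an n8n Function node or any post-processing script when you need a custom shape instead of reviews-flat.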

Where to find them on Apify: The default dataset holds places (with optional nested reviews). Other outputs are separate datasets in the same run. Open the run → Storage → Datasets and pick the dataset you need:

| Dataset | When | Contents |
| --- | --- | --- |
| default | always | Places (one row per place) |
| reviews | includeReviews | One row per review (full fields) |
| reviews-flat | includeReviews | One row per review (title, url, stars, name, reviewUrl, text) |
| place-details | includePlaceDetails | Flattened place-detail rows |
| list-vs-detail-mismatch | includePlaceDetails | Places where list-view data differed from the detail page |
| changes | incremental run | New/updated places |

The run log also prints where to find reviews and list-vs-detail-mismatch.

Phone numbers are best-effort: they come from the place detail page. If a detail page fails to load (timeout or navigation interrupted), or the business doesn’t show a number, phone can be missing. Use navigationTimeout ≥ 90 and sufficient memory to improve reliability.


Run Modes in Detail

  • Fresh — New scan; all scraped places are written; state is updated for future incremental runs.
  • Incremental — Compares to stored state; only new and (if detectUpdates: true) updated places are written; you are only charged for those. Use stateKey to isolate state per “project”. Set resetState: true once to clear state and start over.

Data & privacy

The actor collects data that may qualify as personal data under GDPR and similar laws, including:

  • Place/business data: addresses, phone numbers, websites, business names.
  • Review data (when includeReviews: true): reviewer names, review text, review URLs, reviewer photos.

You should have a legitimate purpose for processing this data and comply with GDPR, Google’s Terms of Service, and any other applicable law. Limit retention and access (e.g. via Apify dataset retention settings and in downstream systems). Enabling reviews or contacts increases the amount of personal data collected; use includeReviewerNames: false if you want to omit or anonymize reviewer names in the output.

Storage and output: Datasets, key-value state, and cache may contain personal data. Retention and deletion are your responsibility (Apify retention settings and any downstream storage). If the actor saves debug HTML (e.g. on errors), those files may contain PII and should be disabled or kept short-lived in production.

Compliance: This actor does not implement consent or a legal basis for processing. You are responsible for ensuring your use (and any scraping of personal data) has a valid legal basis and complies with Google’s ToS and applicable law (e.g. GDPR).


Proxy & Performance Tips

  • Enable proxy (useProxy: true) in production to reduce blocks and consent redirects.
  • Increase navigationTimeout (e.g. 120–180 seconds) if you see "Navigation timed out" on slow networks or heavy pages.
  • Memory: See Memory requirements subsection below.

Memory requirements

The actor uses Playwright (Chromium) for Google Maps. Recommended: 2048 MB (2 GB) minimum; 4096 MB (4 GB) for large scrapes or many search URLs. URLs are processed one at a time to limit memory use. If you see "Memory is critically overloaded", increase memory in Apify Console → Actor → Run options.


Development

  • Node.js 20+ required.
  • Setup: npm install then npx playwright install chromium.
  • Local test: npm run test:local [input-file.json]
  • Scripts: npm start (Apify), npm test, npm run lint

See FEATURES.md for architecture and technical details.


License

Apache-2.0