# Google Maps Scraper + Data Clean (`liuyu.digitaltwin/google-maps-lead-optimizer-data-clean`) Actor

Scrape Google Maps businesses, then automatically clean, normalize, and enrich data. Get CRM-ready leads (phone, email, website, rating) with audit trail. Pay only for valid records.

- **URL**: https://apify.com/liuyu.digitaltwin/google-maps-lead-optimizer-data-clean.md
- **Developed by:** [Yu Liu](https://apify.com/liuyu.digitaltwin) (community)
- **Categories:** Lead generation, SEO tools, Automation
- **Stats:** 1 total user, 0 monthly users, 0.0% runs succeeded, bookmark count unavailable
- **User rating**: No ratings yet

## Pricing

from $5.00 / 1,000 valid leads

This Actor uses pay-per-event pricing plus usage: you are charged a fixed price for specific events and, in addition, for the Apify platform resources the run consumes.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
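
As a rough illustration of the per-event part of the bill, the sketch below multiplies a lead count by the listed starting price of $5.00 per 1,000 valid leads. The helper name is hypothetical, and the result is only a lower bound: it leaves out Apify platform usage and any higher rate that may apply, for example when `enable_rich_data` is on.

```python
from decimal import Decimal

# Starting price from the listing: $5.00 per 1,000 valid leads.
PRICE_PER_1000_VALID_LEADS = Decimal("5.00")


def estimate_min_event_cost(valid_leads: int) -> Decimal:
    """Lower-bound estimate of the per-event charge in USD (hypothetical helper)."""
    if valid_leads < 0:
        raise ValueError("valid_leads must be non-negative")
    cost = PRICE_PER_1000_VALID_LEADS * valid_leads / 1000
    return cost.quantize(Decimal("0.01"))


print(estimate_min_event_cost(250))   # 1.25
print(estimate_min_event_cost(1000))  # 5.00
```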

## What's an Apify Actor?

Actors are software tools running on the Apify platform, built for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Google Maps Lead Optimizer + Data Clean

Scrape Google Maps businesses and get cleaned, sales‑ready leads in one step. Built‑in data normalization, deduplication, PII masking, and audit trail. Pay only for valid records.

### Features

- **Search by keyword & location** – Enter one or more search terms and a location (free text) to discover businesses exactly where you need them.
- **Comprehensive business data** – Extract name, address, phone, website, rating, review count, coordinates, categories, and more.
- **Smart data cleaning** – Automatic phone normalization (+1 XXX-XXX-XXXX), website URL cleanup, rating clamping (0‑5), and removal of commas from review counts.
- **Optional rich data** – Enable `enable_rich_data` to fetch emails, images, full reviews, opening hours, price level, and Google Maps URL (extra cost).
- **Review & reviewer information** – Scrape all reviews, including reviewer names, photos, and ratings. Sort reviews and retrieve up to any desired count.
- **Data quality** – Remove duplicates, detect outliers, handle missing values, and apply PII masking (phone, email – irreversible).
- **Full audit trail** – Task-level and rule-level logs, plus an ISO quality report.
- **Preview mode** – Test with a small number of rows before running a full extraction to control costs.

### Bugs, fixes, updates, and changelog

This product is under active development. If you encounter any issues, have feature requests, or would like to give feedback, please open an issue on our [GitHub repository](https://github.com/yuliu-digitaltwin/google-maps-scraper/issues).

### Input Parameters (JSON)

Provide input as a JSON object. Example:
```json
{
  "searchStringsArray": ["coffee shop"],
  "locationQuery": "Manhattan, New York",
  "maxCrawledPlacesPerSearch": 10,
  "language": "en",
  "outputFormat": "csv",
  "previewMode": true,
  "previewRows": 10,
  "enable_rich_data": false
}
````

#### Parameter reference

- `searchStringsArray` (array, required): One or more search terms, e.g., \["dentist", "clinic"].
- `locationQuery` (string, required): Free text location, e.g., "New York, USA".
- `maxCrawledPlacesPerSearch` (integer, optional): Max results per search term (1‑5000, default 500).
- `language` (string, optional): Language code, e.g., "en", "zh-CN" (default "en").
- `placeMinimumStars` (string, optional): Minimum star rating: "0","2","2.5","3","3.5","4","4.5" (default "0").
- `skipClosedPlaces` (boolean, optional): If true, skip places marked as temporarily or permanently closed.
- `websiteFilter` (string, optional): Filter by website presence: "allPlaces", "withWebsite", "withoutWebsite".
- `enable_rich_data` (boolean, optional): If true, fetch additional fields like email, reviews, images, opening hours, price level (extra cost, default false).
- `outputFormat` (string, optional): "csv", "excel", "json" (default "csv").
- `previewMode` (boolean, optional): If true, only process first N rows (billed at actual row count).
- `previewRows` (integer, optional): Rows for preview mode (1‑10000, default 100).
- `enablePiiMasking` (boolean, optional): Mask phone numbers, emails (irreversible).

#### During the run

The Actor outputs detailed log messages so you can track its progress:

- Which search term is being processed
- How many raw records have been fetched
- Whether optional rich data is being requested
- Success or failure of each cleaning rule

If the input is invalid (e.g., missing required fields), the Actor will stop immediately and explain what went wrong.

### Accessing results

After a successful run, the cleaned leads are stored in the **Dataset** tab. Each business is a separate item. You can download the results as CSV, Excel, or JSON directly from the Console, or retrieve them programmatically via the Apify API (Python, JavaScript, etc.). For details, see the [Apify API documentation](https://docs.apify.com/api/v2).
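
For programmatic downloads without a client library, the Apify API exposes a dataset items endpoint that takes a `format` query parameter. The sketch below only builds that URL; the helper name and the `abc123` dataset ID are placeholders for illustration, and the actual request would still need your API token (for example in an `Authorization: Bearer` header).

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the dataset items URL for one of the export formats used here."""
    if fmt not in {"json", "csv", "xlsx"}:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"


# 'abc123' stands in for the run's defaultDatasetId.
print(dataset_items_url("abc123", "csv"))
# https://api.apify.com/v2/datasets/abc123/items?format=csv
```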

### Output Fields

When `enable_rich_data` is `false` (the default), each record contains the core set of about 15 fields, marked ✅ in the table below. With `enable_rich_data` set to `true`, records also include the optional fields marked "rich only".

| Field | Description | Always present |
|-------|-------------|----------------|
| business\_name | Business name | ✅ |
| address | Full formatted address | ✅ |
| street | Street address | ✅ |
| city | City | ✅ |
| state | State or region | ✅ |
| zip | Postal code | ✅ |
| country | Country code (e.g., US) | ✅ |
| phone | Normalized phone (+1 XXX-XXX-XXXX) | ✅ |
| website | Cleaned website URL | ✅ |
| avg\_rating | Average rating (0‑5) | ✅ |
| total\_reviews | Total review count | ✅ |
| latitude | Latitude | ✅ |
| longitude | Longitude | ✅ |
| place\_id | Google Place ID | ✅ |
| categories | Business categories (array) | ✅ |
| plus\_code | Google Plus Code | ❌ (rich only) |
| description | Business description | ❌ (rich only) |
| opening\_hours | Structured opening hours | ❌ (rich only) |
| price\_level | Price level (e.g., $$) | ❌ (rich only) |
| email | Business email | ❌ (rich only) |
| images | Array of image URLs | ❌ (rich only) |
| reviews | List of review objects | ❌ (rich only) |
| google\_maps\_url | Google Maps URL | ❌ (rich only) |
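
To make the normalized formats in the table concrete, here is a minimal sketch of what rules such as phone normalization, rating clamping, and comma removal could look like. These functions are illustrative assumptions, not the Actor's actual implementation; in particular, the handling of non-US numbers is not documented, so this sketch simply returns `None` for them.

```python
import re
from typing import Optional, Union


def normalize_us_phone(raw: str) -> Optional[str]:
    """Format a US number as '+1 XXX-XXX-XXXX'; return None if it does not fit."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return None
    return f"+1 {digits[:3]}-{digits[3:6]}-{digits[6:]}"


def clamp_rating(value: Union[str, float]) -> float:
    """Force a rating into the 0-5 range."""
    return max(0.0, min(5.0, float(value)))


def parse_review_count(raw: str) -> int:
    """Turn a count such as '1,234' into an integer."""
    return int(raw.replace(",", "").strip())


print(normalize_us_phone("(212) 555-0123"))  # +1 212-555-0123
print(clamp_rating(7.2))                     # 5.0
print(parse_review_count("1,234"))           # 1234
```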

### How to use (Apify Console)

1. Go to the Actor page.
2. In the Input tab, switch to JSON mode.
3. Paste your JSON configuration (see example above).
4. Click **Start**.
5. Download the cleaned dataset from the **Dataset** tab.

### Output files

Besides the cleaned data, the Actor also stores:

- Task-level audit CSV
- Rule-level audit CSV
- Skipped rows CSV
- Quality report (HTML + JSON)
- Error report (if any)
- Debug log (if error)

All of these files are stored in the run's Apify key-value store.
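
The exact record keys used for these files are not listed here, so one way to locate them is to list the store's keys through the Apify API and then fetch individual records. The sketch below builds the two endpoint URLs and extracts key names from a listing response; the helper names, the store ID, and the sample key names are made up for illustration.

```python
from typing import Any
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def store_keys_url(store_id: str) -> str:
    """URL of the endpoint that lists the keys in a key-value store."""
    return f"{API_BASE}/key-value-stores/{store_id}/keys"


def record_url(store_id: str, key: str) -> str:
    """URL of a single record; the key is percent-encoded for safety."""
    return f"{API_BASE}/key-value-stores/{store_id}/records/{quote(key, safe='')}"


def keys_from_listing(listing: dict[str, Any]) -> list[str]:
    """Pull key names out of a parsed 'list keys' response body."""
    return [item["key"] for item in listing["data"]["items"]]


# Hypothetical response body; real key names depend on the Actor.
sample = {"data": {"items": [{"key": "example-audit.csv", "size": 1024},
                             {"key": "example report.html", "size": 2048}]}}
for key in keys_from_listing(sample):
    print(record_url("store123", key))
```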

### Support

Email: liuyu.digitaltwin@outlook.com\
Please include your session ID when reporting issues.

***

# Actor input Schema

## `searchStringsArray` (type: `array`):

(required) One or more search terms, e.g., \["dentist", "clinic"]

## `locationQuery` (type: `string`):

(required) Free text location, e.g., 'New York, USA'

## `maxCrawledPlacesPerSearch` (type: `integer`):

(optional) Maximum number of places to fetch per search term

## `language` (type: `string`):

(optional) Language code, e.g., 'en', 'zh-CN'

## `placeMinimumStars` (type: `string`):

(optional) Filter places by minimum star rating

## `skipClosedPlaces` (type: `boolean`):

(optional) If true, skip places marked as temporarily or permanently closed

## `websiteFilter` (type: `string`):

(optional) Filter places based on whether they have a website

## `enable_rich_data` (type: `boolean`):

If true, fetch additional data like email, reviews, images, opening hours, price level, etc. This increases the underlying Google Maps scraper cost, so total price will be higher. Default is false.

## `outputFormat` (type: `string`):

(optional) Format of the cleaned output file

## `previewMode` (type: `boolean`):

(optional) If true, only process first N rows (billed at actual row count)

## `previewRows` (type: `integer`):

(optional) Number of rows to process in preview mode

## `enablePiiMasking` (type: `boolean`):

(optional) Mask phone numbers, emails, etc. (irreversible)
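
The masking format the Actor applies is not specified, so the following is only a sketch of what irreversible masking can mean in practice: the original characters are overwritten rather than encrypted, so they cannot be recovered from the output. Both function names and the choice of which characters stay visible are assumptions.

```python
def mask_phone(phone: str, visible: int = 2) -> str:
    """Replace every digit except the last `visible` ones with '*'."""
    total_digits = sum(ch.isdigit() for ch in phone)
    keep_from = total_digits - visible
    seen = 0
    out = []
    for ch in phone:
        if ch.isdigit():
            out.append(ch if seen >= keep_from else "*")
            seen += 1
        else:
            out.append(ch)  # keep separators such as '+', ' ' and '-'
    return "".join(out)


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "*" * len(email)
    return local[0] + "*" * (len(local) - 1) + "@" + domain


print(mask_phone("+1 212-555-0123"))   # +* ***-***-**23
print(mask_email("info@example.com"))  # i***@example.com
```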

## Actor input object example

```json
{
  "searchStringsArray": [
    "coffee shop"
  ],
  "locationQuery": "Manhattan, New York",
  "maxCrawledPlacesPerSearch": 500,
  "language": "en",
  "placeMinimumStars": "0",
  "skipClosedPlaces": false,
  "websiteFilter": "allPlaces",
  "enable_rich_data": false,
  "outputFormat": "csv",
  "previewMode": false,
  "previewRows": 100,
  "enablePiiMasking": false
}
```

# Actor output Schema

## `cleaned_data` (type: `string`):

The main dataset containing cleaned business leads (CSV/Excel/JSON).

## `task_audit` (type: `string`):

Task-level audit CSV.

## `rule_audit` (type: `string`):

Rule-level audit CSV.

## `skipped_rows` (type: `string`):

Rows skipped due to parsing errors.

## `quality_report_html` (type: `string`):

Quality report in HTML format.

## `quality_report_json` (type: `string`):

Quality report in JSON format.

## `error_report` (type: `string`):

Error report (if any).

## `debug_log` (type: `string`):

Debug log (if error occurred).

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
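// Both searchStringsArray and locationQuery are required by the input schema
const input = {
    "searchStringsArray": [
        "coffee shop"
    ],
    "locationQuery": "Manhattan, New York"
};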

// Run the Actor and wait for it to finish
const run = await client.actor("liuyu.digitaltwin/google-maps-lead-optimizer-data-clean").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
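# Both searchStringsArray and locationQuery are required by the input schema
run_input = {
    "searchStringsArray": ["coffee shop"],
    "locationQuery": "Manhattan, New York",
}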

# Run the Actor and wait for it to finish
run = client.actor("liuyu.digitaltwin/google-maps-lead-optimizer-data-clean").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchStringsArray": [
    "coffee shop"
  ],
  "locationQuery": "Manhattan, New York"
}' |
apify call liuyu.digitaltwin/google-maps-lead-optimizer-data-clean --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=liuyu.digitaltwin/google-maps-lead-optimizer-data-clean",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Google Maps Scraper + Data Clean",
        "description": "Scrape Google Maps businesses, then automatically clean, normalize, and enrich data. Get CRM-ready leads (phone, email, website, rating) with audit trail. Pay only for valid records.",
        "version": "0.0",
        "x-build-id": "fnEATXmKOKUdBAWlJ"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/liuyu.digitaltwin~google-maps-lead-optimizer-data-clean/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-liuyu.digitaltwin-google-maps-lead-optimizer-data-clean",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/liuyu.digitaltwin~google-maps-lead-optimizer-data-clean/runs": {
            "post": {
                "operationId": "runs-sync-liuyu.digitaltwin-google-maps-lead-optimizer-data-clean",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/liuyu.digitaltwin~google-maps-lead-optimizer-data-clean/run-sync": {
            "post": {
                "operationId": "run-sync-liuyu.digitaltwin-google-maps-lead-optimizer-data-clean",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "searchStringsArray",
                    "locationQuery"
                ],
                "properties": {
                    "searchStringsArray": {
                        "title": "Search term(s)",
                        "type": "array",
                        "description": "(required) One or more search terms, e.g., [\"dentist\", \"clinic\"]",
                        "items": {
                            "type": "string"
                        }
                    },
                    "locationQuery": {
                        "title": "Location (free text)",
                        "type": "string",
                        "description": "(required) Free text location, e.g., 'New York, USA'"
                    },
                    "maxCrawledPlacesPerSearch": {
                        "title": "Max results per search",
                        "minimum": 1,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "(optional) Maximum number of places to fetch per search term",
                        "default": 500
                    },
                    "language": {
                        "title": "Language",
                        "type": "string",
                        "description": "(optional) Language code, e.g., 'en', 'zh-CN'",
                        "default": "en"
                    },
                    "placeMinimumStars": {
                        "title": "Minimum rating",
                        "enum": [
                            "0",
                            "2",
                            "2.5",
                            "3",
                            "3.5",
                            "4",
                            "4.5"
                        ],
                        "type": "string",
                        "description": "(optional) Filter places by minimum star rating",
                        "default": "0"
                    },
                    "skipClosedPlaces": {
                        "title": "Skip closed places",
                        "type": "boolean",
                        "description": "(optional) If true, skip places marked as temporarily or permanently closed",
                        "default": false
                    },
                    "websiteFilter": {
                        "title": "Website presence",
                        "enum": [
                            "allPlaces",
                            "withWebsite",
                            "withoutWebsite"
                        ],
                        "type": "string",
                        "description": "(optional) Filter places based on whether they have a website",
                        "default": "allPlaces"
                    },
                    "enable_rich_data": {
                        "title": "Include optional fields (extra cost)",
                        "type": "boolean",
                        "description": "If true, fetch additional data like email, reviews, images, opening hours, price level, etc. This increases the underlying Google Maps scraper cost, so total price will be higher. Default is false.",
                        "default": false
                    },
                    "outputFormat": {
                        "title": "Output Format",
                        "enum": [
                            "csv",
                            "excel",
                            "json"
                        ],
                        "type": "string",
                        "description": "(optional) Format of the cleaned output file",
                        "default": "csv"
                    },
                    "previewMode": {
                        "title": "Preview Mode",
                        "type": "boolean",
                        "description": "(optional) If true, only process first N rows (billed at actual row count)",
                        "default": false
                    },
                    "previewRows": {
                        "title": "Preview Rows",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "(optional) Number of rows to process in preview mode",
                        "default": 100
                    },
                    "enablePiiMasking": {
                        "title": "Enable PII Masking",
                        "type": "boolean",
                        "description": "(optional) Mask phone numbers, emails, etc. (irreversible)",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
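
The `run-sync-get-dataset-items` path in the specification can also be called with nothing but the standard library. The sketch below builds the POST request with the token as a query parameter, as the specification describes, and leaves the network call commented out; `build_run_sync_request` is a hypothetical helper name, and the response is assumed to be the dataset items as JSON.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "liuyu.digitaltwin~google-maps-lead-optimizer-data-clean"


def build_run_sync_request(actor_input: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that runs the Actor synchronously."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_sync_request(
    {"searchStringsArray": ["coffee shop"], "locationQuery": "Manhattan, New York"},
    token="<YOUR_API_TOKEN>",
)
print(request.get_method(), request.full_url)

# To actually send it (requires network access and a real token):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```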
