# Business Data Enricher — Clean, Match & Verify Listings (`ryanclinton/business-data-enricher`) Actor

Business data enrichment against Overture Maps POI data. Cleans and deduplicates by name + location, assigns stable GERS global IDs, grades data quality, flags leads (no website, unbranded). Resale-safe records. Territory mode pulls in bulk and tracks openings, closures and rebrands over time.

- **URL**: https://apify.com/ryanclinton/business-data-enricher.md
- **Developed by:** [Ryan Clinton](https://apify.com/ryanclinton) (community)
- **Categories:** Lead generation, Developer tools
- **Stats:** 2 total users, 1 monthly user, 0.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

$4.00 / 1,000 resolved places

This Actor is paid per event. You are not charged for Apify platform usage, only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
The word "Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).
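For a stack without an official client, a run is a plain HTTPS POST. A minimal standard-library Python sketch follows, assuming the `run-sync-get-dataset-items` path and `token` query parameter listed in the OpenAPI specification in the API section; it only builds the request, and `urllib.request.urlopen(req)` would send it.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"


def build_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a synchronous run request for an Actor.

    REST paths write the Actor id as 'username~actor-name' instead of
    'username/actor-name', matching the OpenAPI paths in the API section.
    """
    path_id = quote(actor_id.replace("/", "~"), safe="~")
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),  # the Actor input as the JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the token as a query parameter mirrors the OpenAPI spec; keep it out of logs, since it ends up in the URL.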

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [full Markdown text](https://docs.apify.com/llms-full.txt).


# README

## Business Data Enricher — Clean, Match & Verify Listings

![Business Data Enricher — messy lists in, clean businesses out](https://apifyforge.com/readme-assets/ryanclinton-business-data-enricher/hero.png)

**Clean, verify and enrich business lists at scale.** Upload a messy CSV of businesses, suppliers, store locations or leads — this **business data enrichment** actor removes duplicates, verifies each record against global place data, fills in missing details (category, website, socials), and assigns a stable ID you can reuse in every future run. You get back clean records you own and can legally resell.

This is not a Google Maps scraper. You bring a dirty list (a CRM export, store locations, a supplier sheet, a local business data export you already paid for) and get back canonical, deduplicated, enriched businesses — verified against global place data, with a full audit trail behind every match.

> **What this is NOT.** Not a Google Maps replacement — [Overture Maps](https://overturemaps.org), the open place dataset behind this actor, is monthly-refreshed and carries no live reviews, ratings, opening hours, or popular times. For "today's hours and current star rating," Google Maps wins. This actor wins on **bulk, legal resale, stable global IDs, and analytics** — the four things a Maps scraper structurally cannot give you.

### Who is this for?

- **Lead-generation agencies** — find businesses with no website, no socials, or unbranded independents: ready-to-pitch lists straight out of the run.
- **Local SEO agencies** — map a local market, benchmark competitor density, deduplicate a client's locations.
- **Business directories & marketplaces** — dedupe listings and assign stable IDs so the same place never appears twice, and safely merge future data sources onto the same ID.
- **Data teams** — clean and enrich a large business/place dataset against ground truth, with a stable join key for your warehouse.
- **Retail & franchise teams** — territory coverage, brand penetration, and competitor density for a catchment area.
- **Market researchers** — canonical place data you can legally build a product on.

### What it does, in one example

**Input**: three messy rows:

```text
Dominos Pizza,       54.581, -5.940
Domino's Pizza BT9,  54.581, -5.940
Maccies,             54.597, -5.930
```

**Output**: the two Domino's rows resolve to one clean, enriched record:

```json
{
  "name": "Domino's Pizza",
  "category": "pizza_restaurant",
  "brand": "Domino's",
  "website": "dominos.co.uk",
  "socials": ["instagram.com/dominos"],
  "phone": "+44 28 …",
  "gersId": "08f2…",
  "confidence": 0.92,
  "leadSignals": []
}
```

…the two "Domino's" rows collapse onto one record, and **"Maccies" resolves to McDonald's** via the brand short-circuit.

**Result:** 3 rows → **2 verified businesses, 1 duplicate removed, 100% matched** — each enriched with category, website, socials and a stable ID you can pass back next run.

**How accurate is it?** Every match carries a confidence score and the exact reasons behind it. Uncertain matches go to a **review queue** instead of being guessed, and rows that don't match are returned explicitly — **nothing is ever silently dropped**.

> Enrichment fields like `website`, `socials` and `phone` are filled **where the source carries them** — coverage varies by place and region (densest in US/EU and urban areas). The actor never invents a value; a field the source doesn't have comes back empty. Match rates likewise depend on your input quality and region; the run summary reports yours.

![Sample output — matched, ambiguous and unmatched rows side by side](https://apifyforge.com/readme-assets/ryanclinton-business-data-enricher/output-table.png)

![What you get — cleans your list, stable IDs, resale-safe, review queue](https://apifyforge.com/readme-assets/ryanclinton-business-data-enricher/feature-callouts.png)

### Business Data Enricher vs a Google Maps scraper

Most place-data actors *extract* listings. This one *resolves* them — and does the things a scraper structurally can't on a list you already have:

| Task | Google Maps scraper | Business Data Enricher |
|---|---|---|
| Get a list of places | ✅ | ✅ |
| Deduplicate your list | ❌ | ✅ |
| Stable IDs that survive re-runs | ❌ | ✅ |
| Legally resell the output | ❌ | ✅ |
| Territory / competitor analytics | ❌ | ✅ |
| Review queue for uncertain matches | ❌ | ✅ |
| Re-match your existing records | ❌ | ✅ |
| Live reviews, ratings, opening hours | ✅ | ❌ |

The last row is the honest trade: for today's star rating and live hours, a Google Maps scraper wins. For cleaning, verifying and owning a *list you already have*, this does what a scraper can't.

### Why choose this actor

- **Bulk** — query 100M+ places without result caps, pagination limits or blocking.
- **Legal resale** — built on Overture Places under **CDLA Permissive 2.0**. Every record carries a `resaleSafe` flag and an attribution string. (We query the `places` theme only — never the share-alike ODbL themes.)
- **Stable global IDs** — every matched row is stamped with a **GERS ID**, a persistent global identifier, so your data can be joined to any other dataset keyed on the same IDs. Re-runs are idempotent: pass the stored `gersId` back and the actor does a direct lookup instead of re-resolving.
- **Analytics** — density, brand concentration, **franchise footprint** (per-brand saturation), **market structure**, whitespace and nearest-competitor analysis built right into the actor — impossible for a per-place scraper.

![What you get — a messy list turned into clean, deduplicated, verified, enriched, resale-safe records](https://apifyforge.com/readme-assets/ryanclinton-business-data-enricher/intelligence-layers.png)

### Clean and deduplicate your first business list in 60 seconds

Paste a list, press **Start**:

```json
{
  "places": [
    { "id": "row-1", "name": "Dominos Pizza", "lat": 54.581, "lng": -5.9398 },
    { "id": "row-2", "name": "Domino's", "lat": 54.5811, "lng": -5.9399 },
    { "id": "row-3", "name": "Maccies", "lat": 54.5972, "lng": -5.9301 }
  ],
  "outputProfile": "enriched"
}
```

You get back canonical, deduplicated, enriched entities: the two Domino's rows collapse onto one GERS ID (an `entity-group` record logs the collapse), "Maccies" resolves to McDonald's via the brand short-circuit, and a `run-summary` record reports the coverage, dedup and review-queue headline. Every row is resale-safe.
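Most lists start life as a spreadsheet export rather than JSON. A small Python sketch for turning header-less `name, lat, lng` rows (like the three-row example above) into the `places` array; the `row-N` id scheme is an arbitrary choice for illustration, and any unique string works since it is simply echoed back as `inputId`.

```python
import csv
import io


def csv_to_places(text: str) -> list[dict]:
    """Turn header-less 'name, lat, lng' lines into `places` input rows.

    Assumes exactly three columns per non-blank line.
    """
    places = []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for i, row in enumerate(reader, start=1):
        if not row or not row[0].strip():
            continue  # skip blank lines
        name, lat, lng = (cell.strip() for cell in row)
        places.append({"id": f"row-{i}", "name": name, "lat": float(lat), "lng": float(lng)})
    return places
```

`json.dumps({"places": csv_to_places(text)})` then gives an input you can paste into the Console or pass to a client.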

### Input modes (auto-detected)

| You provide | Mode | What you get |
|---|---|---|
| `places` (BYO list) | **resolution** | one resolution record per input row + entity-group + review-item + run-summary |
| `territoryQuery` (a bbox) | **territory** | every canonical entity in the area + a territory analytics summary |
| any row carrying a `gersId` | **idempotent re-match** | direct GERS lookup, cascade skipped |
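Re-matching is driven entirely by the input: any row carrying a `gersId` takes the direct-lookup path. A Python sketch that attaches stored IDs to a fresh `places` list, assuming you kept a mapping from your own row `id` to the `gersId` an earlier run returned (`stamp_gers_ids` and that mapping are hypothetical bookkeeping on your side, not part of the actor):

```python
def stamp_gers_ids(places: list[dict], stored: dict[str, str]) -> list[dict]:
    """Return copies of `places`, adding a stored `gersId` where one exists."""
    stamped = []
    for row in places:
        new_row = dict(row)  # shallow copy so the caller's rows are untouched
        gers_id = stored.get(row.get("id"))
        if gers_id:
            new_row["gersId"] = gers_id  # this row will take the direct-lookup path
        stamped.append(new_row)
    return stamped
```

Rows without a stored ID are left as they were and go through normal resolution, so old and new rows can share one run.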

#### Resolution input

Each item in `places`: `{ "name": "...", "lat": 54.58, "lng": -5.93 }` or `{ "name": "...", "address": "..." }`. Optional `id` (echoed as `inputId`), `category` (improves the category gate — e.g. an FSA `business_type` like `"Restaurant/Cafe/Canteen"` works as-is), and `gersId` (idempotent re-match).

The list scales from a handful of rows to **tens of thousands spanning multiple cities or a whole country** — resolved in a single run, no need to split by region. Rows with no coordinates fall back to address-text matching at a lower, flagged confidence — but that's a full scan per row, so it's **capped** (supply `lat`/`lng` to resolve at scale; rows past the cap come back `unmatched` with an explanation, never silently dropped).
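Because address-only rows fall back to the capped, lower-confidence path, it can pay to count them before a large run. A Python sketch that partitions rows by the documented input shape (a `name` plus either `lat`/`lng` or an `address`); it is a local pre-flight check, not something the actor exposes:

```python
def partition_rows(places: list[dict]) -> tuple[list[dict], list[dict], list[dict]]:
    """Split rows into (with coordinates, address-only, invalid)."""
    with_coords, address_only, invalid = [], [], []
    for row in places:
        has_coords = isinstance(row.get("lat"), (int, float)) and isinstance(row.get("lng"), (int, float))
        if not row.get("name"):
            invalid.append(row)  # every row needs a name
        elif has_coords:
            with_coords.append(row)  # full-speed spatial matching
        elif row.get("address"):
            address_only.append(row)  # capped address-text fallback
        else:
            invalid.append(row)  # neither coordinates nor an address
    return with_coords, address_only, invalid
```

If the address-only bucket is large, geocoding those rows yourself before the run keeps them off the capped path.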

#### Territory input

`territoryQuery`: a bounding box `"minLng,minLat,maxLng,maxLat"`, e.g. `"-6.05,54.55,-5.80,54.65"`. Append a category filter after a pipe: `"-6.05,54.55,-5.80,54.65 | coffee"`. Set `outputProfile` to `territory`, or just leave `places` empty.
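The bbox order (`minLng,minLat,maxLng,maxLat`) is easy to transpose. A Python sketch that formats the string and rejects inverted or out-of-range coordinates; it assumes boxes do not cross the antimeridian, which the README does not address:

```python
from typing import Optional


def territory_query(min_lng: float, min_lat: float, max_lng: float, max_lat: float,
                    category: Optional[str] = None) -> str:
    """Format a `territoryQuery` bbox string, optionally with a category filter."""
    if not (-180 <= min_lng < max_lng <= 180 and -90 <= min_lat < max_lat <= 90):
        raise ValueError("expected min < max, lng within [-180, 180], lat within [-90, 90]")
    # str() keeps full float precision (unlike fixed-width formatting)
    bbox = ",".join(str(v) for v in (min_lng, min_lat, max_lng, max_lat))
    return f"{bbox} | {category}" if category else bbox
```

For the README's Belfast box, `territory_query(-6.05, 54.55, -5.80, 54.65, "coffee")` yields `"-6.05,54.55,-5.8,54.65 | coffee"`.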

### Track what's changed in an area (event mode)

Run a territory with **`emitEvents: true`** and the actor diffs the current Overture release against an earlier one (over your bbox) and emits a **commercial change feed** — the thing a one-shot scrape can never give you:

- **Typed events per place:** `NEW_LOCATION`, `LOCATION_CLOSED`, `LOCATION_MOVED`, `REBRAND`, `CATEGORY_SHIFT`, each with a severity score.
- **Brand expansion / contraction:** which chains opened or closed net locations in the window (e.g. "Costa +4, Subway −2").
- **Market warnings:** categories with a high closure rate, stated with the denominator (`"8/12 closed"`), never an investment verdict.
- **Successor candidates** (opt-in `includeSuccessors`): a place closed and a new one opened at the same coordinates — flagged as a *candidate* with a confidence, never asserted.
- A decision-first **`territory-digest`** record: openings, closures, expanding/contracting chains, warnings, lead count.

Leave `compareRelease` blank to auto-diff against the prior public release (a ~1-month window, available immediately).

Set a **`watchlistName`** to snapshot each run into your own private history and track changes across a longer window than the two public releases allow: the first run captures a baseline, and changes are reported from the next run onward. The watchlist also builds a per-entity **category timeline** (`categoryChangeHistory`) and remembers your analysts' **review decisions** (`reviewDecisions` input), echoing them back on the matching changes so a disposition survives re-runs.
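Event mode is also just input. A Python sketch of a territory run that diffs against the prior public release and, when a watchlist is named, carries analyst dispositions forward; the `event_mode_input` helper is hypothetical, the watchlist name and GERS ID in the test are placeholders, and `"confirmed"` is the only status value the README shows:

```python
from typing import Optional


def event_mode_input(bbox: str, watchlist: Optional[str] = None,
                     decisions: Optional[dict[str, str]] = None,
                     successors: bool = False) -> dict:
    """Assemble a territory run input with the change feed switched on."""
    run_input = {
        "territoryQuery": bbox,
        "outputProfile": "territory",
        "emitEvents": True,
        "compareRelease": "",  # blank: auto-diff against the prior public release
        "includeSuccessors": successors,  # opt-in successor candidates
    }
    if watchlist:
        run_input["watchlistName"] = watchlist
        # reviewDecisions only apply in watchlist mode: [{gersId, status}, ...]
        run_input["reviewDecisions"] = [
            {"gersId": gers_id, "status": status}
            for gers_id, status in (decisions or {}).items()
        ]
    return run_input
```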

> Built on open data you can resell, this turns "scrape a place list" into "monitor a market" on the same engine.

### Matching you can trust

Every place is matched on **location, name and category together** — never on name alone — and each match comes back with a **confidence score and the reasons behind it**, broken into its parts so you can see *why* it matched (or didn't).

- **Close calls go to a review queue** instead of being guessed — the actor never silently picks between two plausible candidates.
- **Nothing is silently dropped** — rows that don't match are returned explicitly as `unmatched` with the best near-miss, so you always see what didn't resolve and why.
- **Fully deterministic** — the same input against the same Overture release always produces the same result. No black box, no model drift, no surprises.

You can tune precision vs recall with the `matchProfile` preset (`strict` / `balanced` / `lenient`) without touching anything else.

### Output profiles (`outputProfile`)

- **`enriched`** (default) — the full record: `match`, `canonical`, `quality`, `lifecycle`, `leadSignals`, `digitalPresence`, `resaleSafe`, `agentContract`.
- **`names`** — the lean display-name surface: `{ inputId, gersId, name, normalizedCategory, confidence (with decomposed components), ambiguity, runnerUpGap, status }`. The right profile if all you persist is "canonical name + a stable key + a score to threshold on."
- **`gers_only`** — `{ inputId, gersId, confidence, status }`. The minimal join key for a warehouse that already holds the names.
- **`audit`** — adds every candidate considered and why it was rejected. For tuning thresholds and proving matches.
- **`territory`** — bulk-pull canonical entities + the analytics summary.
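A `gers_only` run is a natural way to refresh a warehouse join key. A Python sketch that turns its dataset items into an `inputId` to `gersId` map, assuming unmatched rows come back with an empty or `null` `gersId` and that non-row records (such as `run-summary`) are distinguishable by `recordType`:

```python
def gers_join_map(items: list[dict]) -> dict[str, str]:
    """Map your row ids to GERS IDs from a `gers_only` run's dataset items."""
    mapping = {}
    for item in items:
        if item.get("recordType") not in (None, "resolution"):
            continue  # skip run-summary, review-item and other non-row records
        input_id, gers_id = item.get("inputId"), item.get("gersId")
        if input_id is not None and gers_id:
            mapping[input_id] = gers_id  # duplicates map to the same GERS ID
    return mapping
```

Storing this map is also what makes the idempotent re-match possible on the next run.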

### Record types

Discriminate on `recordType`: `resolution` | `entity-group` | `review-item` | `run-summary` | `canonical-entity` | `territory-summary`. The dataset ships decision-first views — **Matched**, **Review queue**, **Unmatched**, **Run summary** — and a KV `SUMMARY` record mirroring the coverage/dedup/review headline.
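Since every record type shares one dataset, the first step downstream is usually a split on `recordType`. A minimal Python sketch; with the official client, `items` could come from `client.dataset(run["defaultDatasetId"]).iterate_items()` as in the API section:

```python
from collections import defaultdict
from typing import Iterable


def split_by_record_type(items: Iterable[dict]) -> dict[str, list[dict]]:
    """Group dataset items by their `recordType` discriminator."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        # Items without a discriminator are kept under "unknown", not dropped.
        groups[item.get("recordType", "unknown")].append(item)
    return dict(groups)
```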

### Data quality grades, match confidence and lead signals

- **Reason chain** — `match.matchReason[]` reads back the exact thresholds the cascade branched on.
- **Per-attribute corroboration** — `match.matchEvidence` stays `null` for any attribute your input row doesn't carry, so nothing is fabricated from data you never supplied.
- **Data-quality axis** — `quality` (grade A–F, completeness, issues) is distinct from `match.confidence`. A confidently-matched place can still have a defect-laden record; the two questions get two answers.
- **Lifecycle band** — `lifecycle.status` is a descriptive band over evidence (operating status, low confidence), never a fabricated "closed" verdict.
- **Lead signals** — `leadSignals[]` (NO\_WEBSITE, NO\_INSTAGRAM, UNBRANDED, INDEPENDENT, …) derived from already-fetched data. "Dentists in this metro with no website" is a ready-to-sell list at no extra cost.
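Building a lead list is then a filter over `leadSignals`. A Python sketch that keeps resolution records carrying every requested signal, assuming `leadSignals` holds the signal names as plain strings, as in the output example:

```python
def leads_with(items: list[dict], *signals: str) -> list[dict]:
    """Resolution records whose `leadSignals` include every requested signal."""
    wanted = set(signals)
    return [
        item
        for item in items
        if item.get("recordType") == "resolution"
        and wanted.issubset(item.get("leadSignals") or [])  # missing/null -> no signals
    ]
```

For example, `leads_with(items, "NO_WEBSITE", "INDEPENDENT")` narrows the list to independents with no website.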

### Key inputs

| Input | Default | Notes |
|---|---|---|
| `matchProfile` | `balanced` | `strict` (precision-first) / `balanced` / `lenient` (recall-first). A preset threshold pack, not a rule engine. |
| `matchRadiusMeters` | `150` | tighten (e.g. 50) for premise-accurate matches. |
| `nameSimStrong` / `nameSimWeak` | `0.89` / `0.83` | power-user overrides; `matchProfile` sets these. |
| `overtureRelease` | `2026-05-20.0` | the us-west-2 bucket retains only the latest ~2 releases. |
| `minConfidence` | `0.5` | drop ground-truth candidates below this Overture confidence. |
| `includeLifecycle` / `emitLeadSignals` | `true` | cheap, on by default. |
| `includeMarketContext` / `includeGlobalBrandStats` / `includeGraphEdges` | `false` | opt-in extra reads. |
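The README does not say how the actor measures distance for `matchRadiusMeters`. As a sanity check on a radius choice, a great-circle (haversine) distance in Python is a reasonable stand-in, assuming a spherical Earth:

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(a: dict, b: dict, radius_m: float = 150) -> bool:
    """True if two {'lat', 'lng'} rows fall within radius_m of each other."""
    return haversine_m(a["lat"], a["lng"], b["lat"], b["lng"]) <= radius_m
```

With the sample rows, the two Domino's points sit roughly 13 m apart, inside even a tightened 50 m radius, while Maccies is well over a kilometre away.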

### What this is not (stated up front)

- **Not a Google Maps replacement** — no live reviews, ratings, or hours.
- **Coverage is uneven by region** — best in US / EU / urban areas; thin-coverage regions match lower. The territory summary surfaces a `coverageConfidence` signal so you know where to trust the data.
- **Not a legal-entity → trading-name resolver** — the cascade matches names as written. A divergent legal name vs trading name ("Bushmills Hotels Ltd" → "The Bushmills Inn") lands in the review queue, exactly as a plain fuzzy+spatial resolver would. The brand short-circuit rescues chains; divergent-name independents still need a human.
- **Not a geocoder** — no-coord rows get flagged address-text matching (capped — each is a full scan), not precise geocoding.

### Pricing

**Pay for hits, not for list size.** **$0.004 per resolved place** — a confirmed match in resolution mode, or a returned canonical entity in territory mode. Unmatched rows, the review queue and all analytics are **included free**: a 50,000-row list that matches 18,000 places costs for the 18,000, not the 50,000. No proxies and no per-place data fees — Overture reads are anonymous and AWS-sponsored — so on top of the per-place charge you pay only Apify platform compute.
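The per-event part of the bill is simple arithmetic. A Python sketch using the $4.00 per 1,000 resolved places rate from the Pricing section; it covers only the per-place event charge, and `Decimal` avoids float rounding on money:

```python
from decimal import Decimal

PRICE_PER_1000 = Decimal("4.00")  # USD per 1,000 resolved places


def estimate_event_cost(resolved_places: int) -> Decimal:
    """Event charge for a run; unmatched rows and review items are not billed."""
    return PRICE_PER_1000 * resolved_places / 1000
```

For the 50,000-row list above that matches 18,000 places, `estimate_event_cost(18_000)` comes to $72, regardless of the 32,000 unmatched rows.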

### Attribution

Built on **Overture Maps** Places data, **CDLA Permissive 2.0**. Every output record carries the attribution string and a `resaleSafe` flag.

### Deliver results to Slack or Notion (MCP connectors)

Optionally pipe each run's **decisions** — the resolution/territory digest plus the ranked review worklist — straight into your own Slack channel or Notion workspace. You never hand this actor a token: you connect Slack/Notion once in **Apify Console → Settings → MCP connectors** (Notion is one click), and Apify proxies the credentials. The actor only ever receives a connector id.

- **Notion** — set `notionConnector`. Get a one-page resolution report (digest + top review items), or a page per review item with `notionArchiveProfile: per-review`.
- **Slack** — set `slackConnector` (and optionally `slackChannel`). The digest is posted, plus review items at or above `slackMinReviewPriority` (default 50) so the channel stays signal-only. (Slack connectors need you to register your own Slack OAuth app.)

Only the **decisions** are delivered — never the bulk resolution rows. Leave the connector fields empty and the run behaves exactly as before. The delivery outcome is reported back on the run-summary's `deliveries` block.
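Delivery is configured entirely through input fields. A Python sketch that assembles them, with the 0-100 range check the schema describes for `slackMinReviewPriority`; the helper is hypothetical and the connector ids you pass in stand for the ids Apify Console gives you:

```python
from typing import Optional


def delivery_settings(notion_connector: Optional[str] = None,
                      slack_connector: Optional[str] = None,
                      slack_channel: Optional[str] = None,
                      min_priority: int = 50,
                      per_review_pages: bool = False) -> dict:
    """Input fields for Notion/Slack delivery; empty dict means no delivery."""
    if not 0 <= min_priority <= 100:
        raise ValueError("slackMinReviewPriority must be within 0-100")
    settings: dict = {}
    if notion_connector:
        settings["notionConnector"] = notion_connector
        settings["notionArchiveProfile"] = "per-review" if per_review_pages else "summary"
    if slack_connector:
        settings["slackConnector"] = slack_connector
        settings["slackMinReviewPriority"] = min_priority
        if slack_channel:  # otherwise the connector's default channel is used
            settings["slackChannel"] = slack_channel
    return settings
```

Merge the result into your run input; an empty dict leaves the run behaving exactly as without connectors.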

# Actor input Schema

## `places` (type: `array`):

Your messy list of businesses/places to resolve. Each item: { "name": "...", "lat": 54.58, "lng": -5.93 } or { "name": "...", "address": "..." }. Optionally pass an "id" per row (echoed back as inputId), a "category" (improves precision), and a previously-stored "gersId" for an idempotent direct lookup. No row cap — scales to large national / continental lists; the actor processes them city-by-city in one run, bounded only by the run timeout. Leave empty and set Territory query for bulk pull mode.

## `territoryQuery` (type: `string`):

Alternative to Places. A bounding box "minLng,minLat,maxLng,maxLat" (e.g. "-6.05,54.55,-5.80,54.65") pulls all canonical entities in that area plus analytics. A bbox is REQUIRED — plain area names (e.g. "Belfast") are not supported and will return an error. Optionally append a category filter after a pipe, e.g. "-6.05,54.55,-5.80,54.65 | coffee".

## `outputProfile` (type: `string`):

enriched = full record (default). names = lean { name, gersId, decomposed confidence, ambiguity, status }. gers\_only = minimal join key. audit = adds every candidate considered + why rejected. territory = bulk-pull canonical entities + analytics summary.

## `matchProfile` (type: `string`):

Preset threshold packs (not a rule engine). strict = fewer, surer matches (dedup/KYC). balanced = default. lenient = recall-first (territory mapping). Per-field overrides below still win.

## `matchRadiusMeters` (type: `integer`):

Spatial pre-filter radius for candidate generation around each input point. Default 150m. Tighten (e.g. 50) for premise-accurate matches.

## `nameSimStrong` (type: `number`):

jaro\_winkler threshold at or above which a name match auto-accepts. matchProfile sets this; override for power tuning.

## `nameSimWeak` (type: `number`):

jaro\_winkler threshold for an accept-with-flag weak match (requires category compatibility + small distance). Below this = no match.
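To see how the two thresholds interact, here is a simplified Python sketch of a Jaro-Winkler name score feeding a three-way decision. It is not the actor's cascade: the name normalization and the "small distance" cut-off (a third of the search radius) are assumptions made for illustration.

```python
import re


def normalize(name: str) -> str:
    """Lowercase and drop punctuation, so "Domino's" and "Dominos" compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters appearing in a different order count as half-transpositions.
    m1 = [c for c, u in zip(s1, used1) if u]
    m2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(m1, m2)) / 2
    return (matches / len(s1) + matches / len(s2) + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):  # common prefix, capped at 4 characters
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def decide(name_a: str, name_b: str, distance_m: float, category_ok: bool,
           strong: float = 0.89, weak: float = 0.83, radius_m: float = 150) -> str:
    if distance_m > radius_m:
        return "no_match"  # outside the spatial pre-filter
    sim = jaro_winkler(normalize(name_a), normalize(name_b))
    if sim >= strong:
        return "accept"
    if sim >= weak and category_ok and distance_m <= radius_m / 3:
        return "accept_flagged"  # weak match: needs category fit and a short distance
    return "no_match"
```

With these defaults "Dominos Pizza" and "Domino's Pizza" score 1.0 after normalization, while "Maccies" against "McDonald's" scores far below 0.83, which is why the README relies on a separate brand short-circuit for that case.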

## `sources` (type: `array`):

Ground-truth sources. v1 supports Overture only; requesting foursquare logs a 'not yet supported' notice and proceeds with Overture (provenance is designed so FSQ grafting can be added later).

## `overtureRelease` (type: `string`):

Overture Maps release id to read. The us-west-2 bucket retains only the latest ~2 releases; a pinned older release 404s once two newer ones publish.

## `minConfidence` (type: `number`):

Drop ground-truth candidates whose Overture confidence is below this. Default 0.5.

## `includeClosed` (type: `boolean`):

By default places flagged closed/non-operating are excluded. Enable to include them (e.g. historical analysis).

## `includeLifecycle` (type: `boolean`):

Single-run lifecycle band (operating | flagged | likely\_closed | closed | new) from operating status + confidence + source signals. Cheap; on by default.

## `emitLeadSignals` (type: `boolean`):

Digital-presence gap signals (NO\_WEBSITE, NO\_INSTAGRAM, UNBRANDED, INDEPENDENT, ...) derived from already-fetched data. Free; on by default.

## `includeMarketContext` (type: `boolean`):

Per-place competitor counts (within 500m/1km, nearest competitor, same-brand within 5km). Free in territory mode; extra spatial reads per point in resolution mode.

## `includeGlobalBrandStats` (type: `boolean`):

Counts each matched brand's global location total across Overture keyed on brand.wikidata. Extra bounded query per distinct brand; cost stated in the startup log. Never asserts a market-share denominator.

## `includeGraphEdges` (type: `boolean`):

Adds same\_brand / co\_located relationship edges on entity-group records (free). same\_owner is not shipped (no ownership data in source).

## `enableEmbeddingRescue` (type: `boolean`):

Reserved for a future optional embedding rescue of the unmatched remainder. Not implemented yet; setting it has no effect.

## `emitEvents` (type: `boolean`):

Emit the commercial change stream for the territory. When on and Compare release is blank, the prior public Overture release is used as the baseline (logged).

## `compareRelease` (type: `string`):

Earlier Overture release id to diff the current release against (e.g. "2026-04-15.0"). Leave blank with Event mode on to auto-use the prior public release. The us-west-2 bucket retains only the latest ~2 releases, so a much older id will 404.

## `includeSuccessors` (type: `boolean`):

When a place closes and a new one opens at the same coordinates, flag it as a successor CANDIDATE (with a confidence, never asserted). Event mode only.

## `watchlistName` (type: `string`):

Name a watchlist to persist a snapshot each run and diff against it next time — this builds change history (including per-entity category timelines) beyond the 2 public releases. First run captures a baseline; changes are reported from the next run. Needs a full-access token (restricted tokens can't open named stores).

## `reviewDecisions` (type: `array`):

Watchlist mode only. Remembered human dispositions: \[{ "gersId": "08f2…", "status": "confirmed" }]. Echoed back on the matching change events next run, so an analyst's call survives re-runs.

## `demandLayer` (type: `object`):

Optional population/footfall layer. When supplied, the opportunity/whitespace score upgrades from supply-side-only to demand-aware. Leave empty for the honest supply-side score.

## `notionConnector` (type: `string`):

Optional. Connect a Notion workspace to receive a one-page resolution/territory report (and review worklist) at the end of the run. Apify proxies your credentials — this actor never sees your Notion token. Create one in Apify Console → Settings → MCP connectors (Notion is one-click).

## `notionArchiveProfile` (type: `string`):

summary = one page per run (digest + top review items). per-review = one page per review-worklist item (capped at 50).

## `notionDatabaseId` (type: `string`):

Optional. A Notion data source id to write pages into a specific database. Leave blank to create standalone workspace pages.

## `slackConnector` (type: `string`):

Optional. Connect Slack to post the resolution/territory digest plus high-priority review items to a channel. Apify proxies your credentials — this actor never sees your Slack token. (Slack connectors require you to register your own Slack OAuth app.)

## `slackChannel` (type: `string`):

Optional channel id/name to post into (e.g. "#data-ops"). If omitted, the connector's default channel is used.

## `slackMinReviewPriority` (type: `integer`):

Only review-worklist items at or above this priority (0-100) are posted to Slack, to keep the channel signal-only. Default 50. The digest is always posted.

## Actor input object example

```json
{
  "places": [
    {
      "id": "row-1",
      "name": "Dominos Pizza",
      "lat": 54.581,
      "lng": -5.9398
    },
    {
      "id": "row-2",
      "name": "Domino's",
      "lat": 54.5811,
      "lng": -5.9399
    },
    {
      "id": "row-3",
      "name": "Maccies",
      "lat": 54.5972,
      "lng": -5.9301
    }
  ],
  "outputProfile": "enriched",
  "matchProfile": "balanced",
  "matchRadiusMeters": 150,
  "nameSimStrong": 0.89,
  "nameSimWeak": 0.83,
  "sources": [
    "overture"
  ],
  "overtureRelease": "2026-05-20.0",
  "minConfidence": 0.5,
  "includeClosed": false,
  "includeLifecycle": true,
  "emitLeadSignals": true,
  "includeMarketContext": false,
  "includeGlobalBrandStats": false,
  "includeGraphEdges": false,
  "enableEmbeddingRescue": false,
  "emitEvents": false,
  "compareRelease": "",
  "includeSuccessors": false,
  "watchlistName": "",
  "notionArchiveProfile": "summary",
  "notionDatabaseId": "",
  "slackChannel": "",
  "slackMinReviewPriority": 50
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "places": [
        {
            "id": "row-1",
            "name": "Dominos Pizza",
            "lat": 54.581,
            "lng": -5.9398
        },
        {
            "id": "row-2",
            "name": "Domino's",
            "lat": 54.5811,
            "lng": -5.9399
        },
        {
            "id": "row-3",
            "name": "Maccies",
            "lat": 54.5972,
            "lng": -5.9301
        }
    ],
    "territoryQuery": "",
    "outputProfile": "enriched",
    "matchProfile": "balanced",
    "matchRadiusMeters": 150,
    "nameSimStrong": 0.89,
    "nameSimWeak": 0.83,
    "sources": [
        "overture"
    ],
    "overtureRelease": "2026-05-20.0",
    "minConfidence": 0.5,
    "compareRelease": "",
    "watchlistName": "",
    "notionArchiveProfile": "summary",
    "notionDatabaseId": "",
    "slackChannel": "",
    "slackMinReviewPriority": 50
};

// Run the Actor and wait for it to finish
const run = await client.actor("ryanclinton/business-data-enricher").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "places": [
        {
            "id": "row-1",
            "name": "Dominos Pizza",
            "lat": 54.581,
            "lng": -5.9398,
        },
        {
            "id": "row-2",
            "name": "Domino's",
            "lat": 54.5811,
            "lng": -5.9399,
        },
        {
            "id": "row-3",
            "name": "Maccies",
            "lat": 54.5972,
            "lng": -5.9301,
        },
    ],
    "territoryQuery": "",
    "outputProfile": "enriched",
    "matchProfile": "balanced",
    "matchRadiusMeters": 150,
    "nameSimStrong": 0.89,
    "nameSimWeak": 0.83,
    "sources": ["overture"],
    "overtureRelease": "2026-05-20.0",
    "minConfidence": 0.5,
    "compareRelease": "",
    "watchlistName": "",
    "notionArchiveProfile": "summary",
    "notionDatabaseId": "",
    "slackChannel": "",
    "slackMinReviewPriority": 50,
}

# Run the Actor and wait for it to finish
run = client.actor("ryanclinton/business-data-enricher").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "places": [
    {
      "id": "row-1",
      "name": "Dominos Pizza",
      "lat": 54.581,
      "lng": -5.9398
    },
    {
      "id": "row-2",
      "name": "Domino'\''s",
      "lat": 54.5811,
      "lng": -5.9399
    },
    {
      "id": "row-3",
      "name": "Maccies",
      "lat": 54.5972,
      "lng": -5.9301
    }
  ],
  "territoryQuery": "",
  "outputProfile": "enriched",
  "matchProfile": "balanced",
  "matchRadiusMeters": 150,
  "nameSimStrong": 0.89,
  "nameSimWeak": 0.83,
  "sources": [
    "overture"
  ],
  "overtureRelease": "2026-05-20.0",
  "minConfidence": 0.5,
  "compareRelease": "",
  "watchlistName": "",
  "notionArchiveProfile": "summary",
  "notionDatabaseId": "",
  "slackChannel": "",
  "slackMinReviewPriority": 50
}' |
apify call ryanclinton/business-data-enricher --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=ryanclinton/business-data-enricher",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Business Data Enricher — Clean, Match & Verify Listings",
        "description": "Business data enrichment against Overture Maps POI data. Cleans and deduplicates by name + location, assigns stable GERS global IDs, grades data quality, flags leads (no website, unbranded). Resale-safe records. Territory mode pulls in bulk and tracks openings, closures and rebrands over time.",
        "version": "1.0",
        "x-build-id": "UswKs1I9kaenb2AXt"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/ryanclinton~business-data-enricher/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-ryanclinton-business-data-enricher",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/ryanclinton~business-data-enricher/runs": {
            "post": {
                "operationId": "runs-sync-ryanclinton-business-data-enricher",
                "x-openai-isConsequential": false,
                "summary": "Starts the Actor and returns information about the initiated run in the response, without waiting for the run to finish.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/ryanclinton~business-data-enricher/run-sync": {
            "post": {
                "operationId": "run-sync-ryanclinton-business-data-enricher",
                "x-openai-isConsequential": false,
                "summary": "Executes the Actor, waits for it to finish, and returns the OUTPUT record from the default key-value store in the response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "places": {
                        "title": "Places (bring-your-own list)",
                        "type": "array",
                        "description": "Your messy list of businesses/places to resolve. Each item: { \"name\": \"...\", \"lat\": 54.58, \"lng\": -5.93 } or { \"name\": \"...\", \"address\": \"...\" }. Optionally pass an \"id\" per row (echoed back as inputId), a \"category\" (improves precision), and a previously-stored \"gersId\" for an idempotent direct lookup. No row cap — scales to large national / continental lists; the actor processes them city-by-city in one run, bounded only by the run timeout. Leave empty and set Territory query for bulk pull mode."
                    },
                    "territoryQuery": {
                        "title": "Territory query (bulk pull mode)",
                        "type": "string",
                        "description": "Alternative to Places. A bounding box \"minLng,minLat,maxLng,maxLat\" (e.g. \"-6.05,54.55,-5.80,54.65\") pulls all canonical entities in that area plus analytics. A bbox is REQUIRED — plain area names (e.g. \"Belfast\") are not supported and will return an error. Optionally append a category filter after a pipe, e.g. \"-6.05,54.55,-5.80,54.65 | coffee\"."
                    },
                    "outputProfile": {
                        "title": "Output profile",
                        "enum": [
                            "enriched",
                            "names",
                            "gers_only",
                            "audit",
                            "territory"
                        ],
                        "type": "string",
                        "description": "enriched = full record (default). names = lean { name, gersId, decomposed confidence, ambiguity, status }. gers_only = minimal join key. audit = adds every candidate considered + why rejected. territory = bulk-pull canonical entities + analytics summary.",
                        "default": "enriched"
                    },
                    "matchProfile": {
                        "title": "Match profile (threshold preset)",
                        "enum": [
                            "strict",
                            "balanced",
                            "lenient"
                        ],
                        "type": "string",
                        "description": "Preset threshold packs (not a rule engine). strict = fewer, surer matches (dedup/KYC). balanced = default. lenient = recall-first (territory mapping). Per-field overrides below take precedence.",
                        "default": "balanced"
                    },
                    "matchRadiusMeters": {
                        "title": "Match radius (metres)",
                        "minimum": 5,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Spatial pre-filter radius for candidate generation around each input point. Default 150m. Tighten (e.g. 50) for premise-accurate matches.",
                        "default": 150
                    },
                    "nameSimStrong": {
                        "title": "Name similarity — strong tier",
                        "minimum": 0.5,
                        "maximum": 1,
                        "type": "number",
                        "description": "Jaro-Winkler similarity threshold at or above which a name match is auto-accepted. matchProfile sets this; override it for fine-grained tuning.",
                        "default": 0.89
                    },
                    "nameSimWeak": {
                        "title": "Name similarity — weak tier",
                        "minimum": 0.5,
                        "maximum": 1,
                        "type": "number",
                        "description": "Jaro-Winkler similarity threshold for a weak match that is accepted with a flag (requires category compatibility and a small distance). Names scoring below this threshold are not matched.",
                        "default": 0.83
                    },
                    "sources": {
                        "title": "Data sources",
                        "type": "array",
                        "description": "Ground-truth sources. v1 supports Overture only; requesting foursquare logs a 'not yet supported' notice and proceeds with Overture (provenance is designed so FSQ grafting can be added later).",
                        "default": [
                            "overture"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "overtureRelease": {
                        "title": "Overture release",
                        "type": "string",
                        "description": "Overture Maps release id to read. The us-west-2 bucket retains only the latest ~2 releases, so a pinned older release returns HTTP 404 once two newer releases have been published.",
                        "default": "2026-05-20.0"
                    },
                    "minConfidence": {
                        "title": "Minimum Overture confidence",
                        "minimum": 0,
                        "maximum": 1,
                        "type": "number",
                        "description": "Drop ground-truth candidates whose Overture confidence is below this. Default 0.5.",
                        "default": 0.5
                    },
                    "includeClosed": {
                        "title": "Include closed/non-operating places",
                        "type": "boolean",
                        "description": "By default, places flagged closed/non-operating are excluded. Enable this to include them (e.g. for historical analysis).",
                        "default": false
                    },
                    "includeLifecycle": {
                        "title": "Include lifecycle signals",
                        "type": "boolean",
                        "description": "Single-run lifecycle band (operating | flagged | likely_closed | closed | new) derived from operating status, confidence and source signals. Cheap; on by default.",
                        "default": true
                    },
                    "emitLeadSignals": {
                        "title": "Emit lead signals",
                        "type": "boolean",
                        "description": "Digital-presence gap signals (NO_WEBSITE, NO_INSTAGRAM, UNBRANDED, INDEPENDENT, ...) derived from already-fetched data. Free; on by default.",
                        "default": true
                    },
                    "includeMarketContext": {
                        "title": "Include per-record market context",
                        "type": "boolean",
                        "description": "Per-place competitor counts (within 500m/1km, nearest competitor, same-brand within 5km). Free in territory mode; extra spatial reads per point in resolution mode.",
                        "default": false
                    },
                    "includeGlobalBrandStats": {
                        "title": "Include global brand footprint (computed)",
                        "type": "boolean",
                        "description": "Counts each matched brand's global location total across Overture keyed on brand.wikidata. Extra bounded query per distinct brand; cost stated in the startup log. Never asserts a market-share denominator.",
                        "default": false
                    },
                    "includeGraphEdges": {
                        "title": "Include graph edges on entity groups",
                        "type": "boolean",
                        "description": "Adds same_brand / co_located relationship edges on entity-group records (free). same_owner is not shipped (no ownership data in source).",
                        "default": false
                    },
                    "enableEmbeddingRescue": {
                        "title": "Embedding rescue (v2 — inert)",
                        "type": "boolean",
                        "description": "Reserved for a future optional embedding rescue of the unmatched remainder. Not implemented yet; setting it has no effect.",
                        "default": false
                    },
                    "emitEvents": {
                        "title": "Event mode — emit commercial change events",
                        "type": "boolean",
                        "description": "Emit the commercial change stream for the territory. When on and Compare release is blank, the prior public Overture release is used as the baseline (logged).",
                        "default": false
                    },
                    "compareRelease": {
                        "title": "Compare release (baseline for the diff)",
                        "type": "string",
                        "description": "Earlier Overture release id to diff the current release against (e.g. \"2026-04-15.0\"). Leave blank with Event mode on to auto-use the prior public release. The us-west-2 bucket retains only the latest ~2 releases, so a much older id will 404.",
                        "default": ""
                    },
                    "includeSuccessors": {
                        "title": "Detect successor candidates",
                        "type": "boolean",
                        "description": "When a place closes and a new one opens at the same coordinates, flag it as a successor CANDIDATE (with a confidence, never asserted). Event mode only.",
                        "default": false
                    },
                    "watchlistName": {
                        "title": "Watchlist name (accumulate history)",
                        "type": "string",
                        "description": "Name a watchlist to persist a snapshot each run and diff against it next time — this builds change history (including per-entity category timelines) beyond the 2 public releases. First run captures a baseline; changes are reported from the next run. Needs a full-access token (restricted tokens can't open named stores).",
                        "default": ""
                    },
                    "reviewDecisions": {
                        "title": "Review decisions (persisted dispositions)",
                        "type": "array",
                        "description": "Watchlist mode only. Remembered human dispositions: [{ \"gersId\": \"08f2…\", \"status\": \"confirmed\" }]. Echoed back on the matching change events next run, so an analyst's call survives re-runs."
                    },
                    "demandLayer": {
                        "title": "Demand layer (optional, for opportunity scoring)",
                        "type": "object",
                        "description": "Optional population/footfall layer. When supplied, the opportunity/whitespace score upgrades from supply-side-only to demand-aware. Leave empty to keep the supply-side-only score."
                    },
                    "notionConnector": {
                        "title": "Notion connector (optional)",
                        "type": "string",
                        "description": "Optional. Connect a Notion workspace to receive a one-page resolution/territory report (and review worklist) at the end of the run. Apify proxies your credentials — this actor never sees your Notion token. Create one in Apify Console → Settings → MCP connectors (Notion is one-click)."
                    },
                    "notionArchiveProfile": {
                        "title": "Notion archive profile",
                        "enum": [
                            "summary",
                            "per-review"
                        ],
                        "type": "string",
                        "description": "summary = one page per run (digest + top review items). per-review = one page per review-worklist item (capped at 50).",
                        "default": "summary"
                    },
                    "notionDatabaseId": {
                        "title": "Notion data source / database id (optional)",
                        "type": "string",
                        "description": "Optional. A Notion data source id to write pages into a specific database. Leave blank to create standalone workspace pages.",
                        "default": ""
                    },
                    "slackConnector": {
                        "title": "Slack connector (optional)",
                        "type": "string",
                        "description": "Optional. Connect Slack to post the resolution/territory digest plus high-priority review items to a channel. Apify proxies your credentials — this actor never sees your Slack token. (Slack connectors require you to register your own Slack OAuth app.)"
                    },
                    "slackChannel": {
                        "title": "Slack channel (optional)",
                        "type": "string",
                        "description": "Optional channel id/name to post into (e.g. \"#data-ops\"). If omitted, the connector's default channel is used.",
                        "default": ""
                    },
                    "slackMinReviewPriority": {
                        "title": "Slack — minimum review priority to post",
                        "minimum": 0,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Only review-worklist items at or above this priority (0-100) are posted to Slack, to keep the channel signal-only. Default 50. The digest is always posted.",
                        "default": 50
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
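The `run-sync-get-dataset-items` path in the specification takes the Actor input as the JSON request body and the API token as a `token` query parameter, with the `/` in the Actor name written as `~` in the URL path. The standard-library sketch below builds and sends that request; the helper names are hypothetical, and the official `apify-client` package is the more complete option for production code.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"  # from the "servers" entry of the spec
# The "/" in "ryanclinton/business-data-enricher" is written as "~" in API paths.
ACTOR_PATH = "ryanclinton~business-data-enricher"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build, but do not send, a POST to run-sync-get-dataset-items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),  # Actor input as the JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch_items(token: str, actor_input: dict, timeout_s: float = 300.0) -> list:
    """Send the request and decode the dataset items from the JSON response.

    timeout_s is a client-side socket timeout; urlopen raises HTTPError on non-2xx.
    """
    request = build_run_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because this endpoint keeps the HTTP connection open until the run finishes, long territory pulls may be better started through the asynchronous `/runs` endpoint, with results read afterwards from the run's `defaultDatasetId`.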

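The `territoryQuery` input accepts a bounding box `minLng,minLat,maxLng,maxLat`, optionally followed by `| category`, and rejects plain place names. A small parser can catch malformed queries before a run is started. This is hypothetical client-side tooling, not part of the Actor, and it assumes the box does not cross the antimeridian, so it requires min < max on both axes.

```python
from typing import NamedTuple, Optional


class TerritoryQuery(NamedTuple):
    min_lng: float
    min_lat: float
    max_lng: float
    max_lat: float
    category: Optional[str] = None

    def to_string(self) -> str:
        """Render back into the "bbox | category" form used by territoryQuery."""
        bbox = f"{self.min_lng},{self.min_lat},{self.max_lng},{self.max_lat}"
        return f"{bbox} | {self.category}" if self.category else bbox


def parse_territory_query(text: str) -> TerritoryQuery:
    """Parse "minLng,minLat,maxLng,maxLat [| category]"; raise ValueError otherwise."""
    bbox_part, _, category = text.partition("|")
    parts = [p.strip() for p in bbox_part.split(",")]
    if len(parts) != 4:
        raise ValueError("expected a bbox 'minLng,minLat,maxLng,maxLat'; place names are not supported")
    try:
        min_lng, min_lat, max_lng, max_lat = (float(p) for p in parts)
    except ValueError:
        raise ValueError(f"bbox values must be numbers: {bbox_part.strip()!r}") from None
    # Assumption: no antimeridian-crossing boxes, so min must be strictly below max.
    if not (-180 <= min_lng < max_lng <= 180 and -90 <= min_lat < max_lat <= 90):
        raise ValueError("bbox must have min < max within valid longitude/latitude ranges")
    return TerritoryQuery(min_lng, min_lat, max_lng, max_lat, category.strip() or None)
```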