# Data Bridge (`filip_cicvarek/data-bridge`) Actor

Turn messy data into clean records for HubSpot, Salesforce, Airtable, SQL, Google Sheets, or any custom schema. Point it at your data and pick a target format. AI works out which fields go where; the Actor then normalizes emails and phone numbers, parses dates, removes duplicates, and validates the output.

- **URL:** https://apify.com/filip_cicvarek/data-bridge.md
- **Developed by:** [Filip Cicvárek](https://apify.com/filip_cicvarek) (community)
- **Categories:** Automation, Integrations, Developer tools
- **Stats:** 2 total users, 1 monthly user, 0.0% runs succeeded
- **User rating:** No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools that run on the Apify platform and cover all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Data Bridge - Universal Data Transformer

Transform any dataset into any target schema using AI-powered field mapping. Feed it data from Apify scrapers, CSV files, or JSON APIs, and get it out in the exact format your CRM, database, or analytics tool expects.

### What it does

Data Bridge takes source data and a target schema, then:

1. **Analyzes** the source data structure automatically
2. **Maps** fields to the target schema (via AI or manual mapping)
3. **Cleans** every record: email normalization, phone formatting, date standardization, whitespace trimming
4. **Deduplicates** on any combination of fields
5. **Validates** against target schema constraints
6. **Outputs** a clean dataset + detailed reports

The AI is called **once** to figure out the mapping, then all records are transformed deterministically. This keeps the AI cost under $0.01 per run regardless of dataset size.

### Two ways to map fields

#### Option A: Let the AI figure it out (recommended)

Provide an OpenAI API key and the AI will automatically match your source fields to the target schema. It matches by meaning, not just by name -- so `company_name` maps to `Company`, `email_address` maps to `email`, and a `full_name` field gets split into `firstname` and `lastname`.

Cost: ~$0.001 per run with GPT-4o Mini.

#### Option B: Map fields manually

If you don't want to use AI, provide a simple field mapping that tells the Actor which source fields go where:

```json
{
    "email_address": "email",
    "company_name": "company",
    "phone_number": "phone",
    "city": "city"
}
````

Left side = your source field name. Right side = target field name. That's it.

You can also combine both: use AI for most fields and add manual mappings to override specific ones.
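
As a rough illustration of how an AI-suggested mapping and manual overrides could combine, here is a minimal Python sketch. It is not the Actor's code: `merge_mappings`, `apply_mapping`, and the sample AI suggestion are hypothetical, and dropping unmapped source fields is an assumption made for the example.

```python
def merge_mappings(ai_mapping, manual_mapping):
    """Combine two source->target mappings; manual entries win on conflict."""
    return {**ai_mapping, **manual_mapping}


def apply_mapping(record, mapping):
    """Rename source fields to target fields (assumption: unmapped fields are dropped)."""
    return {
        target: record[source]
        for source, target in mapping.items()
        if source in record
    }


# Hypothetical AI suggestion that gets one field name wrong for our target schema.
ai_mapping = {"email_address": "email", "company_name": "organization"}
# Manual override for that one field, same shape as `fieldMappings`.
manual_mapping = {"company_name": "company"}

mapping = merge_mappings(ai_mapping, manual_mapping)
row = {"email_address": "a@example.com", "company_name": "Acme Corp", "unused": 1}
mapped = apply_mapping(row, mapping)
print(mapped)
```

Because the merged mapping is a plain dictionary, it can be stored and passed back as `fieldMappings` on a later run.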

### Quick start examples

#### Example 1: AI mapping to HubSpot (easiest)

```json
{
    "sourceType": "raw",
    "rawData": "[{\"full_name\": \"John Doe\", \"email_address\": \"JOHN.DOE@EXAMPLE.COM\", \"phone_number\": \"5551234567\", \"company_name\": \"Acme Corp\"}]",
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
    "deduplicationKeys": ["email"],
    "openaiApiKey": "sk-..."
}
```

**Output** (HubSpot-ready):

```json
{
    "firstname": "John",
    "lastname": "Doe",
    "email": "john.doe@example.com",
    "phone": "+15551234567",
    "company": "Acme Corp"
}
```

The AI figured out that `full_name` should be split into `firstname`/`lastname`, `email_address` maps to `email` (lowercased), and `phone_number` maps to `phone` (formatted to E.164).

#### Example 2: Manual field mapping (no AI key needed)

```json
{
    "sourceType": "raw",
    "rawData": "[{\"full_name\": \"John Doe\", \"email_address\": \"JOHN.DOE@EXAMPLE.COM\", \"phone_number\": \"5551234567\", \"company_name\": \"Acme Corp\"}]",
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
    "fieldMappings": {
        "email_address": "email",
        "phone_number": "phone",
        "company_name": "company",
        "full_name": "firstname"
    },
    "deduplicationKeys": ["email"]
}
```

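Data cleaning (email lowercasing, phone formatting, whitespace trimming) happens automatically based on the target field type -- no extra configuration needed. A plain mapping is field-to-field, so `full_name` lands in `firstname` unsplit; Example 5 shows how to split it with `transformationRules`.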

#### Example 3: Merge multiple scraper datasets into Salesforce leads

```json
{
    "sourceType": "dataset",
    "datasetIds": ["dataset-id-from-scraper-1", "dataset-id-from-scraper-2"],
    "targetSchemaType": "preset",
    "preset": "salesforce-lead",
    "deduplicationKeys": ["Email"],
    "openaiApiKey": "sk-..."
}
```

Records from both datasets are merged, mapped to Salesforce Lead fields, deduplicated by email, and validated.

#### Example 4: Match an example record (no preset needed)

If you don't use a standard platform, just paste one example of what you want the output to look like:

```json
{
    "sourceType": "dataset",
    "datasetIds": ["your-dataset-id"],
    "targetSchemaType": "example",
    "targetExample": {
        "contact_name": "Jane Doe",
        "contact_email": "jane@example.com",
        "phone": "+1-555-123-4567",
        "signup_date": "2026-01-15",
        "is_active": true
    },
    "openaiApiKey": "sk-..."
}
```

The Actor infers the target schema from your example and maps your source data to match it.
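
To make the idea of inferring a schema from one example record concrete, here is a small Python sketch. The inference rules are assumptions for illustration (the Actor's actual rules are not documented here); `infer_field_type` and `infer_schema` are hypothetical names, and the output shape mirrors the `fields` object used by `targetSchema`.

```python
import re

# Assumption: a string shaped like YYYY-MM-DD is treated as a date.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_field_type(value):
    """Guess a schema type name from a single example value."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str) and ISO_DATE.match(value):
        return "date"
    return "string"


def infer_schema(example):
    """Build a `fields` definition from one example record."""
    return {"fields": {name: {"type": infer_field_type(v)} for name, v in example.items()}}


target_example = {
    "contact_name": "Jane Doe",
    "contact_email": "jane@example.com",
    "phone": "+1-555-123-4567",
    "signup_date": "2026-01-15",
    "is_active": True,
}
schema = infer_schema(target_example)
for name, definition in schema["fields"].items():
    print(name, "->", definition["type"])
```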

#### Example 5: Advanced transforms (split names, join fields)

For cases where you need specific transforms like splitting a full name:

```json
{
    "sourceType": "dataset",
    "datasetIds": ["your-dataset-id"],
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
    "fieldMappings": {
        "email_address": "email",
        "company_name": "company",
        "city": "city"
    },
    "transformationRules": [
        {"sourceField": "full_name", "targetField": "firstname", "transform": "split_name", "params": {"output": "first"}},
        {"sourceField": "full_name", "targetField": "lastname", "transform": "split_name", "params": {"output": "last"}}
    ]
}
```

`fieldMappings` handles the simple field-to-field mapping. `transformationRules` handles the cases that need special transforms. Both work together.

### Supported input sources

| Source | How to use |
|--------|-----------|
| **Apify Datasets** | Provide one or more Dataset IDs -- records from all datasets are merged before transformation |
| **Remote URL** | Point to a JSON, JSONL, or CSV file |
| **Paste JSON** | Paste a JSON array directly into the input |

### Target schema presets

| Preset | Output field examples |
|--------|----------------------|
| **HubSpot Contact** | `email`, `firstname`, `lastname`, `phone`, `company`, `city`, `state`, `zip`, `lifecyclestage` |
| **Salesforce Lead** | `Email`, `FirstName`, `LastName`, `Company`, `Phone`, `City`, `State`, `LeadSource` |
| **Airtable Row** | `Name`, `Email`, `Phone`, `URL`, `Tags`, `Date`, `Category`, `Checkbox` |
| **SQL INSERT** | `id`, `name`, `email`, `phone`, `category`, `is_active`, `created_at`, `metadata` |
| **Google Sheets Row** | `Name`, `Email`, `Phone`, `Address`, `Date`, `Tags`, `Notes` |
| **Custom JSON** | Keeps original field names -- just cleans and deduplicates the data |

You can also define your own schema manually or paste an example record.

### Automatic data cleaning

These cleaning steps run automatically on every mapped field, based on the target field type. All are enabled by default and can be toggled off individually.

| What | Before | After |
|------|--------|-------|
| **Lowercase emails** | `JOHN@EXAMPLE.COM` | `john@example.com` |
| **Format phone numbers** | `(555) 123-4567` | `+15551234567` (E.164) |
| **Standardize dates** | `Jan 15, 2026` | `2026-01-15T00:00:00` (ISO 8601) |
| **Trim whitespace** | `  John   Doe  ` | `John Doe` |

Phone and date output formats are configurable (E.164 / national / international, ISO 8601 / YYYY-MM-DD / Unix timestamp, etc.).
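
The cleaning steps in the table can be approximated with the Python standard library. This is a sketch under stated assumptions, not the Actor's implementation: the function names are hypothetical, the date parser handles only the single `Jan 15, 2026` style shown above, and phone formatting is left out because it needs a dedicated phone-number library.

```python
import re
from datetime import datetime


def trim_whitespace(value):
    """Strip leading/trailing whitespace and collapse internal runs to one space."""
    return re.sub(r"\s+", " ", value).strip()


def normalize_email(value):
    """Trim and lowercase an email address."""
    return trim_whitespace(value).lower()


def normalize_date(value, input_format="%b %d, %Y"):
    """Parse a date in one known format and return an ISO 8601 string."""
    return datetime.strptime(value, input_format).isoformat()


print(trim_whitespace("  John   Doe  "))
print(normalize_email("JOHN@EXAMPLE.COM"))
print(normalize_date("Jan 15, 2026"))
```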

### Advanced transforms

For cases that need more than simple field mapping and auto-cleaning, use `transformationRules`:

| Transform | What it does | Example |
|-----------|-------------|---------|
| `split_name` | Split `"John Doe"` into first/last | `{"output": "first"}` returns `"John"` |
| `join_fields` | Concatenate multiple values | `{"separator": ", "}` |
| `type_cast` | Convert between types | `{"target_type": "integer"}` |
| `boolean_parse` | Parse `"yes"/"no"`, `1/0` to boolean | |
| `date_normalize` | Parse + reformat dates | `{"target_format": "%Y-%m-%d"}` |
| `phone_format` | Format phone numbers | `{"format": "NATIONAL"}` |
| `email_lowercase` | Lowercase + trim emails | |
| `trim_whitespace` | Strip + collapse whitespace | |
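
For intuition, three of these transforms could look roughly like the Python below. The function bodies are assumptions written for this example: in particular, how `split_name` treats names with more than two parts and which strings `boolean_parse` accepts are guesses, not documented behavior.

```python
TRUE_VALUES = {"yes", "true", "1"}
FALSE_VALUES = {"no", "false", "0"}


def split_name(full_name, output):
    """Return the first token, or everything after it, of a full name.

    Assumption: the first whitespace-separated token is the first name
    and the remainder is the last name.
    """
    parts = full_name.split()
    if not parts:
        return ""
    if output == "first":
        return parts[0]
    if output == "last":
        return " ".join(parts[1:])
    raise ValueError(f"unsupported output: {output!r}")


def join_fields(values, separator=" "):
    """Concatenate non-empty values with a separator."""
    return separator.join(str(v) for v in values if v not in (None, ""))


def boolean_parse(value):
    """Map common yes/no spellings to a boolean; return None if unrecognized."""
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    return None


print(split_name("John Doe", "first"), "|", split_name("John Doe", "last"))
print(join_fields(["Prague", "", "CZ"], separator=", "))
print(boolean_parse("Yes"), boolean_parse(0))
```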

### Output

Every run produces:

| Output | Where to find it | What's in it |
|--------|-------------------|-------------|
| **Transformed dataset** | Default Dataset | All records in the target schema |
| **Validation report** | Key-Value Store: `VALIDATION_REPORT` | Per-row errors, summary stats, rejected records |
| **Transformation log** | Key-Value Store: `TRANSFORMATION_LOG` | Field mapping details, dedup stats, processing time |
| **Field mappings** | Key-Value Store: `FIELD_MAPPINGS` | The mapping plan (save this to reuse without AI next time) |

Each output record includes metadata fields:

- `_bridgeRowIndex` -- sequential row number from source
- `_bridgeStatus` -- `"ok"`, `"warning"`, or `"error"`
- `_bridgeWarnings` -- array of warning messages (only if there are warnings)
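
Once the dataset items are downloaded, the `_bridgeStatus` field makes it easy to separate clean rows from rejected ones. The sketch below uses hypothetical sample items and a hypothetical helper, `group_by_status`; only the metadata field names come from the list above.

```python
from collections import defaultdict


def group_by_status(items):
    """Group output records by their `_bridgeStatus` value."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("_bridgeStatus", "unknown")].append(item)
    return dict(groups)


# Hypothetical output records carrying the metadata fields described above.
items = [
    {"email": "john.doe@example.com", "_bridgeRowIndex": 0, "_bridgeStatus": "ok"},
    {"email": "jane@", "_bridgeRowIndex": 1, "_bridgeStatus": "error"},
    {
        "email": "sam@example.com",
        "_bridgeRowIndex": 2,
        "_bridgeStatus": "warning",
        "_bridgeWarnings": ["phone could not be parsed"],
    },
]

groups = group_by_status(items)
# Treat "ok" and "warning" rows as importable; keep "error" rows for review.
importable = groups.get("ok", []) + groups.get("warning", [])
rejected = groups.get("error", [])
print(len(importable), "importable,", len(rejected), "rejected")
```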

### Input reference

#### 1. Source Data

| Field | Type | Description |
|-------|------|-------------|
| `sourceType` | select | `dataset`, `url`, or `raw` |
| `datasetIds` | string list | One or more Apify Dataset IDs |
| `sourceUrl` | string | URL to a JSON, JSONL, or CSV file |
| `rawData` | string | JSON array pasted as text |

#### 2. Target Format

| Field | Type | Description |
|-------|------|-------------|
| `targetSchemaType` | select | `preset`, `example`, or `manual` |
| `preset` | select | `hubspot-contact`, `salesforce-lead`, `airtable-row`, `sql-insert`, `google-sheets-row`, `custom-json` |
| `targetExample` | JSON | One example record in the desired output shape |
| `targetSchema` | JSON | Manual field definitions with types, formats, constraints |

#### 3. AI Field Mapping

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `openaiApiKey` | secret | -- | OpenAI API key for automatic mapping. Not needed if you map all fields manually. |
| `llmModel` | select | `gpt-4o-mini` | AI model to use |

#### 4. Manual Field Mapping

| Field | Type | Description |
|-------|------|-------------|
| `fieldMappings` | JSON object | `{"source_field": "target_field"}` pairs. Overrides AI suggestions. |

#### 5. Data Cleaning

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `normalizeEmails` | boolean | `true` | Lowercase + trim email fields |
| `formatPhones` | boolean | `true` | Standardize phone number fields |
| `normalizeDates` | boolean | `true` | Standardize date fields |
| `trimAllWhitespace` | boolean | `true` | Trim all string fields |
| `phoneFormat` | select | `E164` | Phone output format |
| `defaultCountryCode` | string | `US` | Default country for phone parsing |
| `dateFormat` | select | `ISO8601` | Date output format |

#### 6. Deduplication

| Field | Type | Description |
|-------|------|-------------|
| `deduplicationKeys` | string list | Target field names to deduplicate on (e.g., `email`). Leave empty to keep all records. |

#### 7. Validation

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `strictMode` | boolean | `false` | Drop invalid records instead of flagging them |
| `validationRules` | JSON | -- | Custom constraints: `{"field": {"minLength": 5, "pattern": "..."}}` |

#### 8. Advanced

| Field | Type | Description |
|-------|------|-------------|
| `transformationRules` | JSON array | `[{sourceField, targetField, transform, params}]` for advanced transforms like `split_name`. |
| `maxRows` | integer | Limit rows for testing (0 = all) |
| `batchSize` | integer | Records per batch (default: 100) |

### Local development

```bash
# Install dependencies
pip install -r requirements.txt

# Create test input
mkdir -p storage/key_value_stores/default
# Write your INPUT.json to storage/key_value_stores/default/INPUT.json

# Run locally
python -m my_actor

# Deploy to Apify
apify push
```

### Integration

Data Bridge works with Apify's integration ecosystem:

- **Actor chaining** -- Use the output Dataset ID as input to another Actor
- **Make / Zapier / n8n** -- Trigger Data Bridge after a scraper finishes, feed the transformed data into your CRM
- **MCP server** -- Use via Apify's MCP integration: "transform this dataset to match my HubSpot schema"
- **API** -- Call via the Apify API to transform data programmatically

# Actor input Schema

## `sourceType` (type: `string`):

Choose where to load the source data from. Pick 'Apify Dataset' if your data comes from another Actor run, 'URL' if you have a file hosted online, or 'Paste JSON' to enter data directly.

## `datasetIds` (type: `array`):

Paste one or more Dataset IDs from previous Actor runs. You can find the Dataset ID in the Actor run's Storage tab. If you provide multiple IDs, all records are combined into one dataset before transformation.

## `sourceUrl` (type: `string`):

Full URL to a JSON, JSONL, or CSV file.

## `rawData` (type: `string`):

Paste a JSON array of objects here.

## `targetSchemaType` (type: `string`):

Choose 'Platform Preset' to use a ready-made schema for popular tools like HubSpot or Salesforce. Choose 'Example Record' to paste one sample of what you want the output to look like. Choose 'Manual' if you want to define every field yourself.

## `preset` (type: `string`):

Pick your destination platform. The Actor will use that platform's standard field names so you can import directly. Choose 'Custom JSON' if you just want to clean/deduplicate your data without changing the field names.

## `targetExample` (type: `object`):

Paste one record that looks exactly the way you want the output to look. The Actor will infer the field names and types from this example, then map your source data to match it.

## `targetSchema` (type: `object`):

Define each output field with its name, type, and whether it's required. See the example for the expected format.

## `openaiApiKey` (type: `string`):

The Actor uses AI to automatically figure out how your source fields map to the target fields (e.g., it knows that 'company\_name' should go into 'Company'). Paste your OpenAI API key here to enable this. You can get one at platform.openai.com/api-keys. Costs ~$0.001 per run. If you don't provide a key, you must map all fields manually in the section below.

## `llmModel` (type: `string`):

GPT-4o Mini works great for most cases and costs ~$0.001 per run. Use GPT-4o if your schemas are complex or field names are ambiguous.

## `fieldMappings` (type: `object`):

Tell the Actor exactly which source fields map to which target fields. The left side (key) is your source field name, the right side (value) is the target field name it should map to. You can combine this with AI mapping -- your manual mappings always take priority over AI suggestions.

## `normalizeEmails` (type: `boolean`):

Converts 'JOHN@EXAMPLE.COM' to 'john@example.com' and removes extra whitespace. Applied to all fields that map to an email-type target field.

## `formatPhones` (type: `boolean`):

Converts phone numbers like '(555) 123-4567' or '555.123.4567' into a consistent format. Choose the format below.

## `normalizeDates` (type: `boolean`):

Parses dates in any format ('Jan 15, 2026', '15/01/2026', '2026-01-15') and converts them to a consistent format. Choose the format below.

## `trimAllWhitespace` (type: `boolean`):

Removes leading/trailing spaces and fixes double spaces in all text fields. Turns '  John   Doe  ' into 'John Doe'.

## `phoneFormat` (type: `string`):

How should phone numbers look in the output?

## `defaultCountryCode` (type: `string`):

When a phone number doesn't include a country code (e.g., '5551234567'), which country should be assumed? Use a 2-letter code: US, GB, DE, FR, etc.

## `dateFormat` (type: `string`):

How should dates look in the output?

## `deduplicationKeys` (type: `array`):

Enter one or more target field names. Records with identical values in ALL listed fields are considered duplicates -- only the first occurrence is kept. For example, enter 'email' to remove rows with the same email address. Leave empty to keep all records.
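
The rule described here (identical values in all listed fields, first occurrence kept) can be sketched in a few lines of Python. `deduplicate` is a hypothetical helper, and the sketch assumes the key values are hashable (strings, numbers, booleans, or `None`).

```python
def deduplicate(records, keys):
    """Keep the first record for each combination of values in `keys`."""
    if not keys:
        # An empty key list means every record is kept.
        return list(records)
    seen = set()
    unique = []
    for record in records:
        signature = tuple(record.get(key) for key in keys)
        if signature in seen:
            continue
        seen.add(signature)
        unique.append(record)
    return unique


rows = [
    {"email": "john@example.com", "company": "Acme Corp"},
    {"email": "john@example.com", "company": "Acme Inc"},
    {"email": "jane@example.com", "company": "Acme Corp"},
]
by_email = deduplicate(rows, ["email"])
by_email_and_company = deduplicate(rows, ["email", "company"])
print(len(by_email), len(by_email_and_company))
```

With `["email"]` the second row is dropped; with `["email", "company"]` all three rows differ in at least one key, so none is removed.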

## `strictMode` (type: `boolean`):

When enabled, records that fail validation (missing required fields, wrong types) are removed from the output. When disabled (default), invalid records stay in the output but are marked with \_bridgeStatus: 'error' so you can filter them later.

## `validationRules` (type: `object`):

Add extra constraints beyond the target schema defaults. Keys are target field names, values are constraint objects. Supported constraints: minLength, maxLength, pattern (regex), min, max, enum (list of allowed values).
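
The following Python sketch shows one way the listed constraints and `strictMode` could interact. It is an illustration under assumptions, not the Actor's validator: `check_constraints` and `validate` are hypothetical names, and skipping fields that are missing from a record is a choice made for this example.

```python
import re


def check_constraints(value, rules):
    """Return a list of violations for one value against one constraint object."""
    errors = []
    if isinstance(value, str):
        if "minLength" in rules and len(value) < rules["minLength"]:
            errors.append("too short")
        if "maxLength" in rules and len(value) > rules["maxLength"]:
            errors.append("too long")
        if "pattern" in rules and not re.search(rules["pattern"], value):
            errors.append("does not match pattern")
    # bool is a subclass of int, so exclude it from numeric checks.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "min" in rules and value < rules["min"]:
            errors.append("below min")
        if "max" in rules and value > rules["max"]:
            errors.append("above max")
    if "enum" in rules and value not in rules["enum"]:
        errors.append("not an allowed value")
    return errors


def validate(records, validation_rules, strict_mode=False):
    """Flag invalid records, or drop them when strict_mode is enabled."""
    output = []
    for record in records:
        errors = []
        for field, rules in validation_rules.items():
            if record.get(field) is not None:  # assumption: missing fields are skipped
                errors += [f"{field}: {e}" for e in check_constraints(record[field], rules)]
        if errors and strict_mode:
            continue
        output.append({**record, "_bridgeStatus": "error" if errors else "ok"})
    return output


rules = {
    "email": {"pattern": "^[^@]+@[^@]+\\.[^@]+$"},
    "country": {"enum": ["US", "GB", "DE", "FR", "CA"]},
    "age": {"min": 0, "max": 150},
}
rows = [
    {"email": "jane@example.com", "country": "US", "age": 34},
    {"email": "not-an-email", "country": "XX", "age": 200},
]
flagged = validate(rows, rules)
strict = validate(rows, rules, strict_mode=True)
print([r["_bridgeStatus"] for r in flagged], len(strict))
```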

## `transformationRules` (type: `array`):

For advanced users who need full control over individual field transformations. Each rule specifies a source field, target field, transform function, and optional parameters. These override both AI and manual field mappings for the specified target field.

## `maxRows` (type: `integer`):

Process only the first N rows. Useful for testing with a small sample before running on the full dataset. Set to 0 to process everything.

## `batchSize` (type: `integer`):

How many records to process at a time. The default of 100 works well for most cases. Increase for faster processing on large datasets, decrease if you run into memory issues.
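
To show how `maxRows` and `batchSize` relate, here is a minimal Python sketch of row limiting followed by batching. `iter_batches` is a hypothetical helper written for this example; how the Actor batches internally is not documented here.

```python
from itertools import islice


def iter_batches(records, batch_size=100, max_rows=0):
    """Yield lists of up to `batch_size` records; `max_rows=0` means no limit."""
    iterator = iter(records)
    if max_rows > 0:
        iterator = islice(iterator, max_rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


all_sizes = [len(batch) for batch in iter_batches(range(250), batch_size=100)]
limited_sizes = [len(batch) for batch in iter_batches(range(250), batch_size=100, max_rows=120)]
print(all_sizes, limited_sizes)
```

Only one batch is held in memory at a time, which is why a smaller batch size can help when memory is tight.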

## Actor input object example

```json
{
  "sourceType": "dataset",
  "datasetIds": [
    "aKr4wB5de9gDkQMol",
    "xPq8mN2vk7hRtYjCp"
  ],
  "sourceUrl": "https://example.com/data/contacts.json",
  "rawData": "[{\"name\": \"John Doe\", \"email\": \"john@example.com\", \"phone\": \"5551234567\"}, {\"name\": \"Jane Smith\", \"email\": \"jane@test.org\", \"phone\": \"+1-555-987-6543\"}]",
  "targetSchemaType": "preset",
  "preset": "hubspot-contact",
  "targetExample": {
    "full_name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "+1 (555) 123-4567",
    "company": "Acme Corp",
    "signup_date": "2026-01-15",
    "is_active": true
  },
  "targetSchema": {
    "fields": {
      "email": {
        "type": "string",
        "format": "email",
        "required": true,
        "description": "Contact email"
      },
      "full_name": {
        "type": "string",
        "required": true,
        "description": "Full name"
      },
      "phone": {
        "type": "string",
        "description": "Phone number"
      },
      "signup_date": {
        "type": "date",
        "description": "When they signed up"
      },
      "is_vip": {
        "type": "boolean",
        "description": "VIP customer flag"
      }
    }
  },
  "openaiApiKey": "sk-proj-abc123...",
  "llmModel": "gpt-4o-mini",
  "fieldMappings": {
    "email_address": "email",
    "full_name": "firstname",
    "company_name": "company",
    "phone_number": "phone",
    "city": "city"
  },
  "normalizeEmails": true,
  "formatPhones": true,
  "normalizeDates": true,
  "trimAllWhitespace": true,
  "phoneFormat": "E164",
  "defaultCountryCode": "US",
  "dateFormat": "ISO8601",
  "deduplicationKeys": [
    "email"
  ],
  "strictMode": false,
  "validationRules": {
    "email": {
      "pattern": "^[^@]+@[^@]+\\.[^@]+$"
    },
    "country": {
      "enum": [
        "US",
        "GB",
        "DE",
        "FR",
        "CA"
      ]
    },
    "age": {
      "min": 0,
      "max": 150
    }
  },
  "transformationRules": [
    {
      "sourceField": "full_name",
      "targetField": "firstname",
      "transform": "split_name",
      "params": {
        "output": "first"
      }
    },
    {
      "sourceField": "full_name",
      "targetField": "lastname",
      "transform": "split_name",
      "params": {
        "output": "last"
      }
    }
  ],
  "maxRows": 1000,
  "batchSize": 100
}
```

# Actor output Schema

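## `transformedDataset` (type: `string`):

Default Dataset containing all records in the target schema.

## `validationReport` (type: `string`):

Key-Value Store record `VALIDATION_REPORT`: per-row errors, summary stats, and rejected records.

## `transformationLog` (type: `string`):

Key-Value Store record `TRANSFORMATION_LOG`: field mapping details, dedup stats, and processing time.

## `fieldMappings` (type: `string`):

Key-Value Store record `FIELD_MAPPINGS`: the mapping plan, which can be saved and reused without AI.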

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "rawData": "[{\"name\": \"John Doe\", \"email\": \"john@example.com\", \"phone\": \"5551234567\"}, {\"name\": \"Jane Smith\", \"email\": \"jane@test.org\", \"phone\": \"+1-555-987-6543\"}]",
    "targetExample": {
        "full_name": "Jane Doe",
        "email": "jane@example.com",
        "phone": "+1 (555) 123-4567",
        "company": "Acme Corp",
        "signup_date": "2026-01-15",
        "is_active": true
    },
    "targetSchema": {
        "fields": {
            "email": {
                "type": "string",
                "format": "email",
                "required": true,
                "description": "Contact email"
            },
            "full_name": {
                "type": "string",
                "required": true,
                "description": "Full name"
            },
            "phone": {
                "type": "string",
                "description": "Phone number"
            },
            "signup_date": {
                "type": "date",
                "description": "When they signed up"
            },
            "is_vip": {
                "type": "boolean",
                "description": "VIP customer flag"
            }
        }
    },
    "fieldMappings": {
        "email_address": "email",
        "full_name": "firstname",
        "company_name": "company",
        "phone_number": "phone",
        "city": "city"
    },
    "validationRules": {
        "email": {
            "pattern": "^[^@]+@[^@]+\\.[^@]+$"
        },
        "country": {
            "enum": [
                "US",
                "GB",
                "DE",
                "FR",
                "CA"
            ]
        },
        "age": {
            "min": 0,
            "max": 150
        }
    },
    "transformationRules": [
        {
            "sourceField": "full_name",
            "targetField": "firstname",
            "transform": "split_name",
            "params": {
                "output": "first"
            }
        },
        {
            "sourceField": "full_name",
            "targetField": "lastname",
            "transform": "split_name",
            "params": {
                "output": "last"
            }
        },
        {
            "sourceField": "signup_date",
            "targetField": "created_at",
            "transform": "date_normalize",
            "params": {
                "target_format": "%Y-%m-%d"
            }
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("filip_cicvarek/data-bridge").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "rawData": "[{\"name\": \"John Doe\", \"email\": \"john@example.com\", \"phone\": \"5551234567\"}, {\"name\": \"Jane Smith\", \"email\": \"jane@test.org\", \"phone\": \"+1-555-987-6543\"}]",
    "targetExample": {
        "full_name": "Jane Doe",
        "email": "jane@example.com",
        "phone": "+1 (555) 123-4567",
        "company": "Acme Corp",
        "signup_date": "2026-01-15",
        "is_active": True,
    },
    "targetSchema": { "fields": {
            "email": {
                "type": "string",
                "format": "email",
                "required": True,
                "description": "Contact email",
            },
            "full_name": {
                "type": "string",
                "required": True,
                "description": "Full name",
            },
            "phone": {
                "type": "string",
                "description": "Phone number",
            },
            "signup_date": {
                "type": "date",
                "description": "When they signed up",
            },
            "is_vip": {
                "type": "boolean",
                "description": "VIP customer flag",
            },
        } },
    "fieldMappings": {
        "email_address": "email",
        "full_name": "firstname",
        "company_name": "company",
        "phone_number": "phone",
        "city": "city",
    },
    "validationRules": {
        "email": { "pattern": "^[^@]+@[^@]+\\.[^@]+$" },
        "country": { "enum": [
                "US",
                "GB",
                "DE",
                "FR",
                "CA",
            ] },
        "age": {
            "min": 0,
            "max": 150,
        },
    },
    "transformationRules": [
        {
            "sourceField": "full_name",
            "targetField": "firstname",
            "transform": "split_name",
            "params": { "output": "first" },
        },
        {
            "sourceField": "full_name",
            "targetField": "lastname",
            "transform": "split_name",
            "params": { "output": "last" },
        },
        {
            "sourceField": "signup_date",
            "targetField": "created_at",
            "transform": "date_normalize",
            "params": { "target_format": "%Y-%m-%d" },
        },
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("filip_cicvarek/data-bridge").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "rawData": "[{\\"name\\": \\"John Doe\\", \\"email\\": \\"john@example.com\\", \\"phone\\": \\"5551234567\\"}, {\\"name\\": \\"Jane Smith\\", \\"email\\": \\"jane@test.org\\", \\"phone\\": \\"+1-555-987-6543\\"}]",
  "targetExample": {
    "full_name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "+1 (555) 123-4567",
    "company": "Acme Corp",
    "signup_date": "2026-01-15",
    "is_active": true
  },
  "targetSchema": {
    "fields": {
      "email": {
        "type": "string",
        "format": "email",
        "required": true,
        "description": "Contact email"
      },
      "full_name": {
        "type": "string",
        "required": true,
        "description": "Full name"
      },
      "phone": {
        "type": "string",
        "description": "Phone number"
      },
      "signup_date": {
        "type": "date",
        "description": "When they signed up"
      },
      "is_vip": {
        "type": "boolean",
        "description": "VIP customer flag"
      }
    }
  },
  "fieldMappings": {
    "email_address": "email",
    "full_name": "firstname",
    "company_name": "company",
    "phone_number": "phone",
    "city": "city"
  },
  "validationRules": {
    "email": {
      "pattern": "^[^@]+@[^@]+\\\\.[^@]+$"
    },
    "country": {
      "enum": [
        "US",
        "GB",
        "DE",
        "FR",
        "CA"
      ]
    },
    "age": {
      "min": 0,
      "max": 150
    }
  },
  "transformationRules": [
    {
      "sourceField": "full_name",
      "targetField": "firstname",
      "transform": "split_name",
      "params": {
        "output": "first"
      }
    },
    {
      "sourceField": "full_name",
      "targetField": "lastname",
      "transform": "split_name",
      "params": {
        "output": "last"
      }
    },
    {
      "sourceField": "signup_date",
      "targetField": "created_at",
      "transform": "date_normalize",
      "params": {
        "target_format": "%Y-%m-%d"
      }
    }
  ]
}' |
apify call filip_cicvarek/data-bridge --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=filip_cicvarek/data-bridge",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Data Bridge",
        "description": "Turn messy data into clean records for HubSpot, Salesforce, Airtable, SQL, Google Sheets, or any custom schema. Just point it at your data and pick a target format. AI figures out which fields go where, normalizes emails or phone numbers, parses dates, removes duplicates, and validates the output.",
        "version": "0.0",
        "x-build-id": "BsOKT2wBxu3oFPjaX"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/filip_cicvarek~data-bridge/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-filip_cicvarek-data-bridge",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/filip_cicvarek~data-bridge/runs": {
            "post": {
                "operationId": "runs-sync-filip_cicvarek-data-bridge",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/filip_cicvarek~data-bridge/run-sync": {
            "post": {
                "operationId": "run-sync-filip_cicvarek-data-bridge",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from the key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "sourceType",
                    "targetSchemaType"
                ],
                "properties": {
                    "sourceType": {
                        "title": "Where is your data?",
                        "enum": [
                            "dataset",
                            "url",
                            "raw"
                        ],
                        "type": "string",
                        "description": "Choose where to load the source data from. Pick 'Apify Dataset' if your data comes from another Actor run, 'URL' if you have a file hosted online, or 'Paste JSON' to enter data directly.",
                        "default": "dataset"
                    },
                    "datasetIds": {
                        "title": "Dataset IDs",
                        "type": "array",
                        "description": "Paste one or more Dataset IDs from previous Actor runs. You can find the Dataset ID in the Actor run's Storage tab. If you provide multiple IDs, all records are combined into one dataset before transformation.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "sourceUrl": {
                        "title": "File URL",
                        "type": "string",
                        "description": "Full URL to a JSON, JSONL, or CSV file."
                    },
                    "rawData": {
                        "title": "Paste your JSON data",
                        "type": "string",
                        "description": "Paste a JSON array of objects here."
                    },
                    "targetSchemaType": {
                        "title": "What format do you need the output in?",
                        "enum": [
                            "preset",
                            "example",
                            "manual"
                        ],
                        "type": "string",
                        "description": "Choose 'Platform Preset' to use a ready-made schema for popular tools like HubSpot or Salesforce. Choose 'Example Record' to paste one sample of what you want the output to look like. Choose 'Manual' if you want to define every field yourself.",
                        "default": "preset"
                    },
                    "preset": {
                        "title": "Platform preset",
                        "enum": [
                            "hubspot-contact",
                            "salesforce-lead",
                            "airtable-row",
                            "sql-insert",
                            "google-sheets-row",
                            "custom-json"
                        ],
                        "type": "string",
                        "description": "Pick your destination platform. The Actor will use that platform's standard field names so you can import directly. Choose 'Custom JSON' if you just want to clean/deduplicate your data without changing the field names.",
                        "default": "hubspot-contact"
                    },
                    "targetExample": {
                        "title": "Example output record",
                        "type": "object",
                        "description": "Paste one record that looks exactly like you want the output to look. The Actor will infer the field names and types from this example, then map your source data to match it."
                    },
                    "targetSchema": {
                        "title": "Manual field definitions",
                        "type": "object",
                        "description": "Define each output field with its name, type, and whether it's required. See the example for the expected format."
                    },
                    "openaiApiKey": {
                        "title": "OpenAI API Key",
                        "type": "string",
                        "description": "The Actor uses AI to automatically figure out how your source fields map to the target fields (e.g., it knows that 'company_name' should go into 'Company'). Paste your OpenAI API key here to enable this. You can get one at platform.openai.com/api-keys. Costs ~$0.001 per run. If you don't provide a key, you must map all fields manually in the section below."
                    },
                    "llmModel": {
                        "title": "AI model",
                        "enum": [
                            "gpt-4o-mini",
                            "gpt-4o",
                            "gpt-4.1-mini",
                            "gpt-4.1-nano"
                        ],
                        "type": "string",
                        "description": "GPT-4o Mini works great for most cases and costs ~$0.001 per run. Use GPT-4o if your schemas are complex or field names are ambiguous.",
                        "default": "gpt-4o-mini"
                    },
                    "fieldMappings": {
                        "title": "Manual field mapping",
                        "type": "object",
                        "description": "Tell the Actor exactly which source fields map to which target fields. The left side (key) is your source field name, the right side (value) is the target field name it should map to. You can combine this with AI mapping -- your manual mappings always take priority over AI suggestions."
                    },
                    "normalizeEmails": {
                        "title": "Lowercase all emails",
                        "type": "boolean",
                        "description": "Converts 'JOHN@EXAMPLE.COM' to 'john@example.com' and removes extra whitespace. Applied to all fields that map to an email-type target field.",
                        "default": true
                    },
                    "formatPhones": {
                        "title": "Standardize phone numbers",
                        "type": "boolean",
                        "description": "Converts phone numbers like '(555) 123-4567' or '555.123.4567' into a consistent format. Choose the format below.",
                        "default": true
                    },
                    "normalizeDates": {
                        "title": "Standardize dates",
                        "type": "boolean",
                        "description": "Parses dates in any format ('Jan 15, 2026', '15/01/2026', '2026-01-15') and converts them to a consistent format. Choose the format below.",
                        "default": true
                    },
                    "trimAllWhitespace": {
                        "title": "Clean up whitespace",
                        "type": "boolean",
                        "description": "Removes leading/trailing spaces and fixes double spaces in all text fields. Turns '  John   Doe  ' into 'John Doe'.",
                        "default": true
                    },
                    "phoneFormat": {
                        "title": "Phone number format",
                        "enum": [
                            "E164",
                            "NATIONAL",
                            "INTERNATIONAL",
                            "RAW"
                        ],
                        "type": "string",
                        "description": "How should phone numbers look in the output?",
                        "default": "E164"
                    },
                    "defaultCountryCode": {
                        "title": "Default country for phone numbers",
                        "type": "string",
                        "description": "When a phone number doesn't include a country code (e.g., '5551234567'), which country should be assumed? Use a 2-letter code: US, GB, DE, FR, etc.",
                        "default": "US"
                    },
                    "dateFormat": {
                        "title": "Date format",
                        "enum": [
                            "ISO8601",
                            "%Y-%m-%d",
                            "%m/%d/%Y",
                            "%d/%m/%Y",
                            "%d.%m.%Y",
                            "%B %d, %Y",
                            "UNIX_TIMESTAMP"
                        ],
                        "type": "string",
                        "description": "How should dates look in the output?",
                        "default": "ISO8601"
                    },
                    "deduplicationKeys": {
                        "title": "Remove duplicates based on",
                        "type": "array",
                        "description": "Enter one or more target field names. Records with identical values in ALL listed fields are considered duplicates -- only the first occurrence is kept. For example, enter 'email' to remove rows with the same email address. Leave empty to keep all records.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "strictMode": {
                        "title": "Drop invalid records",
                        "type": "boolean",
                        "description": "When enabled, records that fail validation (missing required fields, wrong types) are removed from the output. When disabled (default), invalid records stay in the output but are marked with _bridgeStatus: 'error' so you can filter them later.",
                        "default": false
                    },
                    "validationRules": {
                        "title": "Custom validation rules",
                        "type": "object",
                        "description": "Add extra constraints beyond the target schema defaults. Keys are target field names, values are constraint objects. Supported constraints: minLength, maxLength, pattern (regex), min, max, enum (list of allowed values)."
                    },
                    "transformationRules": {
                        "title": "Custom transformation rules",
                        "type": "array",
                        "description": "For advanced users who need full control over individual field transformations. Each rule specifies a source field, target field, transform function, and optional parameters. These override both AI and manual field mappings for the specified target field.",
                        "default": []
                    },
                    "maxRows": {
                        "title": "Max rows to process",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Process only the first N rows. Useful for testing with a small sample before running on the full dataset. Set to 0 to process everything.",
                        "default": 0
                    },
                    "batchSize": {
                        "title": "Batch size",
                        "minimum": 10,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "How many records to process at a time. The default of 100 works well for most cases. Increase for faster processing on large datasets, decrease if you run into memory issues.",
                        "default": 100
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
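The specification above can be exercised without any client library. The following sketch checks an input against a few of the constraints listed in `inputSchema` (required fields, some enums, the `batchSize` range) and then builds a `POST` request for the `run-sync-get-dataset-items` path. The helper names and the sample input are hypothetical, the check covers only part of the schema, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/filip_cicvarek~data-bridge/run-sync-get-dataset-items"

# Constraints copied from the inputSchema above (subset only).
REQUIRED = ("sourceType", "targetSchemaType")
ENUMS = {
    "sourceType": {"dataset", "url", "raw"},
    "targetSchemaType": {"preset", "example", "manual"},
    "preset": {"hubspot-contact", "salesforce-lead", "airtable-row",
               "sql-insert", "google-sheets-row", "custom-json"},
    "phoneFormat": {"E164", "NATIONAL", "INTERNATIONAL", "RAW"},
}

def validate_input(run_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in run_input]
    for name, allowed in ENUMS.items():
        if name in run_input and run_input[name] not in allowed:
            errors.append(f"{name}: {run_input[name]!r} is not one of {sorted(allowed)}")
    batch = run_input.get("batchSize")
    if batch is not None and not 10 <= batch <= 10000:
        errors.append("batchSize must be between 10 and 10000")
    return errors

def build_request(run_input: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request described by the spec."""
    errors = validate_input(run_input)
    if errors:
        raise ValueError("; ".join(errors))
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{RUN_SYNC_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Made-up sample input; rawData is a string because the schema types it as one.
example_input = {
    "sourceType": "raw",
    "rawData": json.dumps([{"full_name": "Ada Lovelace", "email": "ADA@EXAMPLE.COM"}]),
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
}
request = build_request(example_input, token="<YOUR_API_TOKEN>")
print(request.get_method(), request.full_url)
```

To actually run the Actor, pass the request to `urllib.request.urlopen` and decode the JSON response. Validating locally first avoids spending a run on an input the platform would reject.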