Data Bridge

Under maintenance

Pricing

Pay per usage

Turn messy data into clean records for HubSpot, Salesforce, Airtable, SQL, Google Sheets, or any custom schema. Point it at your data and pick a target format. AI figures out which fields go where; the Actor then normalizes emails and phone numbers, parses dates, removes duplicates, and validates the output.

Rating: 0.0 (0)

Developer: Filip Cicvárek (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 2 days ago

Data Bridge - Universal Data Transformer

Transform any dataset into any target schema using AI-powered field mapping. Feed it data from Apify scrapers, CSV files, or JSON APIs, and get it out in the exact format your CRM, database, or analytics tool expects.

What it does

Data Bridge takes source data and a target schema, then:

  1. Analyzes the source data structure automatically
  2. Maps fields to the target schema (via AI or manual mapping)
  3. Cleans every record: email normalization, phone formatting, date standardization, whitespace trimming
  4. Deduplicates on any combination of fields
  5. Validates against target schema constraints
  6. Outputs a clean dataset + detailed reports

The AI is called once per run to work out the mapping; every record is then transformed deterministically with that mapping. Because the number of AI calls does not grow with the dataset, the AI cost stays under $0.01 per run regardless of dataset size.
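The "map once, transform deterministically" idea can be sketched as follows. This is a minimal illustration, not the Actor's code: `apply_mapping` is a hypothetical helper, and dropping unmapped source fields is an assumption.

```python
from typing import Any


def apply_mapping(record: dict[str, Any], mapping: dict[str, str]) -> dict[str, Any]:
    """Rename source fields to target fields (hypothetical helper).

    Assumption: source fields that are not in the mapping are dropped.
    """
    return {target: record[source] for source, target in mapping.items() if source in record}


# A mapping plan as it might be produced once per run (by AI or by hand).
mapping = {"email_address": "email", "company_name": "company"}

records = [
    {"email_address": "a@example.com", "company_name": "Acme Corp", "extra": 1},
    {"email_address": "b@example.com", "company_name": "Globex"},
]

# No AI call happens here: the same plan is applied to every record.
transformed = [apply_mapping(record, mapping) for record in records]
print(transformed)
```

Since the per-record step is a plain dictionary rename, its cost depends only on the number of records, not on any model call.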

Two ways to map fields

Option A: Let AI map fields

Provide an OpenAI API key and the AI will automatically match your source fields to the target schema. It matches by meaning, not just by name -- so company_name maps to Company, email_address maps to email, and a full_name field gets split into firstname and lastname.

Cost: ~$0.001 per run with GPT-4o Mini.

Option B: Map fields manually

If you don't want to use AI, provide a simple field mapping that tells the Actor which source fields go where:

{
    "email_address": "email",
    "company_name": "company",
    "phone_number": "phone",
    "city": "city"
}

Left side = your source field name. Right side = target field name. That's it.

You can also combine both: use AI for most fields and add manual mappings to override specific ones.
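One way to picture the combined mode is a dictionary merge in which manual entries win. The sketch below is an assumption about the precedence rule (based on the statement that manual mappings override AI suggestions); `merge_mappings` is a hypothetical name.

```python
def merge_mappings(ai_mapping: dict[str, str], manual_mapping: dict[str, str]) -> dict[str, str]:
    """Combine two source->target mappings; manual entries override AI ones."""
    merged = dict(ai_mapping)       # start from the AI suggestions
    merged.update(manual_mapping)   # manual entries replace any conflicting ones
    return merged


# Hypothetical AI suggestion and a manual override for one field.
ai_mapping = {"email_address": "email", "company_name": "Company Name", "city": "city"}
manual_mapping = {"company_name": "company"}

final_mapping = merge_mappings(ai_mapping, manual_mapping)
print(final_mapping)
```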

Quick start examples

Example 1: AI mapping to HubSpot (easiest)

{
    "sourceType": "raw",
    "rawData": "[{\"full_name\": \"John Doe\", \"email_address\": \"JOHN.DOE@EXAMPLE.COM\", \"phone_number\": \"5551234567\", \"company_name\": \"Acme Corp\"}]",
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
    "deduplicationKeys": ["email"],
    "openaiApiKey": "sk-..."
}

Output (HubSpot-ready):

{
    "firstname": "John",
    "lastname": "Doe",
    "email": "john.doe@example.com",
    "phone": "+15551234567",
    "company": "Acme Corp"
}

The AI figured out that full_name should be split into firstname/lastname, email_address maps to email (lowercased), and phone_number maps to phone (formatted to E.164).
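The email and phone cleaning seen in this output can be approximated in a few lines. This is a simplified sketch that assumes US numbers only (matching the documented default country, US); the function names are hypothetical, and a real implementation would likely rely on a dedicated phone-parsing library.

```python
import re


def normalize_email(value: str) -> str:
    """Trim surrounding whitespace and lowercase the address."""
    return value.strip().lower()


def to_e164_us(value: str) -> str:
    """Format a US phone number as E.164 (assumption: US numbers only)."""
    digits = re.sub(r"\D", "", value)  # keep digits only
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    raise ValueError(f"not a recognizable US phone number: {value!r}")


email = normalize_email("JOHN.DOE@EXAMPLE.COM")
phone = to_e164_us("5551234567")
print(email, phone)
```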

Example 2: Manual field mapping (no AI key needed)

{
    "sourceType": "raw",
    "rawData": "[{\"full_name\": \"John Doe\", \"email_address\": \"JOHN.DOE@EXAMPLE.COM\", \"phone_number\": \"5551234567\", \"company_name\": \"Acme Corp\"}]",
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
    "fieldMappings": {
        "email_address": "email",
        "phone_number": "phone",
        "company_name": "company",
        "full_name": "firstname"
    },
    "deduplicationKeys": ["email"]
}

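Data cleaning (email lowercasing, phone formatting, whitespace trimming) happens automatically based on the target field type -- no extra configuration needed. Note that a plain mapping such as "full_name": "firstname" presumably copies the whole name into firstname; to split it into first and last name, use transformationRules (see Example 5).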

Example 3: Merge multiple scraper datasets into Salesforce leads

{
    "sourceType": "dataset",
    "datasetIds": ["dataset-id-from-scraper-1", "dataset-id-from-scraper-2"],
    "targetSchemaType": "preset",
    "preset": "salesforce-lead",
    "deduplicationKeys": ["Email"],
    "openaiApiKey": "sk-..."
}

Records from both datasets are merged, mapped to Salesforce Lead fields, deduplicated by email, and validated.
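Merging and deduplicating on a key combination might look roughly like this. The sketch assumes that the first occurrence wins, that keys are compared case-insensitively after trimming, and that records with no key values are kept; none of these details are confirmed by the Actor's documentation.

```python
from itertools import chain
from typing import Any


def deduplicate(records: list[dict[str, Any]], keys: list[str]) -> list[dict[str, Any]]:
    """Keep the first record for each combination of key values."""
    seen: set[tuple[str, ...]] = set()
    unique = []
    for record in records:
        fingerprint = tuple(str(record.get(key) or "").strip().lower() for key in keys)
        if not any(fingerprint):
            unique.append(record)  # no key values at all: nothing to compare on
            continue
        if fingerprint in seen:
            continue  # duplicate of an earlier record
        seen.add(fingerprint)
        unique.append(record)
    return unique


# Two hypothetical scraper datasets, already mapped to Salesforce Lead fields.
dataset_1 = [{"Email": "a@example.com", "Company": "Acme"}]
dataset_2 = [{"Email": "A@Example.com ", "Company": "Acme Inc"}, {"Email": "b@example.com"}]

merged = list(chain(dataset_1, dataset_2))
unique = deduplicate(merged, ["Email"])
print(len(merged), len(unique))
```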

Example 4: Match an example record (no preset needed)

If you don't use a standard platform, just paste one example of what you want the output to look like:

{
    "sourceType": "dataset",
    "datasetIds": ["your-dataset-id"],
    "targetSchemaType": "example",
    "targetExample": {
        "contact_name": "Jane Doe",
        "contact_email": "jane@example.com",
        "phone": "+1-555-123-4567",
        "signup_date": "2026-01-15",
        "is_active": true
    },
    "openaiApiKey": "sk-..."
}

The Actor infers the target schema from your example and maps your source data to match it.
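How such inference could work is sketched below: each example value is inspected and assigned a coarse type. The type names and heuristics are assumptions for illustration only; the Actor's actual inference rules are not documented here.

```python
import re
from datetime import date
from typing import Any


def infer_type(value: Any) -> str:
    """Guess a coarse field type from one example value (hypothetical heuristics)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        if "@" in value:
            return "email"
        try:
            date.fromisoformat(value)
            return "date"
        except ValueError:
            pass
        if re.fullmatch(r"\+?[\d\s().-]{7,}", value):
            return "phone"
    return "string"


def infer_schema(example: dict[str, Any]) -> dict[str, str]:
    """Map every field of the example record to its guessed type."""
    return {field: infer_type(value) for field, value in example.items()}


target_example = {
    "contact_name": "Jane Doe",
    "contact_email": "jane@example.com",
    "phone": "+1-555-123-4567",
    "signup_date": "2026-01-15",
    "is_active": True,
}
schema = infer_schema(target_example)
print(schema)
```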

Example 5: Advanced transforms (split names, join fields)

For cases where you need specific transforms like splitting a full name:

{
    "sourceType": "dataset",
    "datasetIds": ["your-dataset-id"],
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
    "fieldMappings": {
        "email_address": "email",
        "company_name": "company",
        "city": "city"
    },
    "transformationRules": [
        {"sourceField": "full_name", "targetField": "firstname", "transform": "split_name", "params": {"output": "first"}},
        {"sourceField": "full_name", "targetField": "lastname", "transform": "split_name", "params": {"output": "last"}}
    ]
}

fieldMappings handles the simple field-to-field mapping. transformationRules handles the cases that need special transforms. Both work together.
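To make the rule format concrete, here is a small sketch of how rules shaped like the ones above could be applied. The splitting logic (first token is the first name, the rest is the last name) and the `TRANSFORMS` registry are assumptions, not the Actor's implementation.

```python
from typing import Any, Callable


def split_name(full_name: str, output: str) -> str:
    """Return the first token, or everything after it (assumed splitting rule)."""
    parts = full_name.strip().split()
    if not parts:
        return ""
    if output == "first":
        return parts[0]
    if output == "last":
        return " ".join(parts[1:])
    raise ValueError(f"unknown output: {output!r}")


# Hypothetical registry from transform name to function.
TRANSFORMS: dict[str, Callable[..., Any]] = {"split_name": split_name}


def apply_rules(record: dict[str, Any], rules: list[dict[str, Any]]) -> dict[str, Any]:
    """Evaluate each rule against the source record and collect target fields."""
    result = {}
    for rule in rules:
        transform = TRANSFORMS[rule["transform"]]
        source_value = record[rule["sourceField"]]
        result[rule["targetField"]] = transform(source_value, **rule.get("params", {}))
    return result


rules = [
    {"sourceField": "full_name", "targetField": "firstname", "transform": "split_name", "params": {"output": "first"}},
    {"sourceField": "full_name", "targetField": "lastname", "transform": "split_name", "params": {"output": "last"}},
]
print(apply_rules({"full_name": "John Doe"}, rules))
```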

Supported input sources

| Source | How to use |
| --- | --- |
| Apify Datasets | Provide one or more Dataset IDs -- records from all datasets are merged before transformation |
| Remote URL | Point to a JSON, JSONL, or CSV file |
| Paste JSON | Paste a JSON array directly into the input |

Target schema presets

| Preset | Output field examples |
| --- | --- |
| HubSpot Contact | email, firstname, lastname, phone, company, city, state, zip, lifecyclestage |
| Salesforce Lead | Email, FirstName, LastName, Company, Phone, City, State, LeadSource |
| Airtable Row | Name, Email, Phone, URL, Tags, Date, Category, Checkbox |
| SQL INSERT | id, name, email, phone, category, is_active, created_at, metadata |
| Google Sheets Row | Name, Email, Phone, Address, Date, Tags, Notes |
| Custom JSON | Keeps original field names -- just cleans and deduplicates the data |

You can also define your own schema manually or paste an example record.

Automatic data cleaning

These run automatically on every mapped field based on the target field type. All are enabled by default and can be toggled off individually.

| What | Before | After |
| --- | --- | --- |
| Lowercase emails | JOHN@EXAMPLE.COM | john@example.com |
| Format phone numbers | (555) 123-4567 | +15551234567 (E.164) |
| Standardize dates | Jan 15, 2026 | 2026-01-15T00:00:00 (ISO 8601) |
| Trim whitespace | "  John Doe  " | "John Doe" |

Phone and date output formats are configurable (E.164 / national / international, ISO 8601 / YYYY-MM-DD / Unix timestamp, etc.).
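The configurable date output can be illustrated with the standard library. In this sketch the accepted input formats and the output option names other than ISO8601 are assumptions; the Actor's real option values may differ.

```python
from datetime import datetime, timezone
from typing import Union

# Input formats this sketch tries, in order (an assumption, not the Actor's list).
# Note: %b parses English month abbreviations under the default C locale.
INPUT_FORMATS = ("%b %d, %Y", "%Y-%m-%d", "%m/%d/%Y")


def normalize_date(value: str, output: str = "ISO8601") -> Union[str, int]:
    """Parse a date string and render it in the requested output format."""
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(value.strip(), fmt)
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognized date: {value!r}")

    if output == "ISO8601":
        return parsed.isoformat()
    if output == "YYYY-MM-DD":
        return parsed.strftime("%Y-%m-%d")
    if output == "unix":
        # Treat the parsed date as midnight UTC (assumption).
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    raise ValueError(f"unknown output format: {output!r}")


print(normalize_date("Jan 15, 2026"))
print(normalize_date("Jan 15, 2026", "YYYY-MM-DD"))
```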

Advanced transforms

For cases that need more than simple field mapping and auto-cleaning, use transformationRules:

| Transform | What it does | Example |
| --- | --- | --- |
| split_name | Split "John Doe" into first/last | {"output": "first"} returns "John" |
| join_fields | Concatenate multiple values | {"separator": ", "} |
| type_cast | Convert between types | {"target_type": "integer"} |
| boolean_parse | Parse "yes"/"no", 1/0 to boolean | |
| date_normalize | Parse + reformat dates | {"target_format": "%Y-%m-%d"} |
| phone_format | Format phone numbers | {"format": "NATIONAL"} |
| email_lowercase | Lowercase + trim emails | |
| trim_whitespace | Strip + collapse whitespace | |
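Three of these transforms are simple enough to sketch directly. The accepted truthy/falsy spellings and the handling of empty values are assumptions; only the transform names and the `separator` parameter come from the table above.

```python
import re
from typing import Any, Iterable


def boolean_parse(value: Any) -> bool:
    """Parse common yes/no spellings into a boolean (assumed vocabulary)."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in {"yes", "y", "true", "1"}:
        return True
    if text in {"no", "n", "false", "0"}:
        return False
    raise ValueError(f"cannot parse boolean from {value!r}")


def join_fields(values: Iterable[Any], separator: str = ", ") -> str:
    """Concatenate values, skipping None and empty strings (assumption)."""
    return separator.join(str(v).strip() for v in values if v not in (None, ""))


def trim_whitespace(value: str) -> str:
    """Strip the ends and collapse internal runs of whitespace to one space."""
    return re.sub(r"\s+", " ", value).strip()


print(boolean_parse("yes"), boolean_parse(0))
print(join_fields(["Prague", None, "CZ"]))
print(trim_whitespace("  John   Doe "))
```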

Output

Every run produces:

| Output | Where to find it | What's in it |
| --- | --- | --- |
| Transformed dataset | Default Dataset | All records in the target schema |
| Validation report | Key-Value Store: VALIDATION_REPORT | Per-row errors, summary stats, rejected records |
| Transformation log | Key-Value Store: TRANSFORMATION_LOG | Field mapping details, dedup stats, processing time |
| Field mappings | Key-Value Store: FIELD_MAPPINGS | The mapping plan (save this to reuse without AI next time) |

Each output record includes metadata fields:

  • _bridgeRowIndex -- sequential row number from source
  • _bridgeStatus -- "ok", "warning", or "error"
  • _bridgeWarnings -- array of warning messages (only if there are warnings)
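Downstream code can use these metadata fields to triage records before loading them elsewhere. The sketch below works on hypothetical sample records; only the three `_bridge*` field names are taken from the list above.

```python
from collections import Counter
from typing import Any


def summarize_status(records: list[dict[str, Any]]) -> Counter:
    """Count records per _bridgeStatus value."""
    return Counter(record.get("_bridgeStatus") for record in records)


def strip_metadata(record: dict[str, Any]) -> dict[str, Any]:
    """Drop the _bridge* fields before sending a record to a CRM or database."""
    return {key: value for key, value in record.items() if not key.startswith("_bridge")}


# Hypothetical output records.
output_records = [
    {"email": "a@example.com", "_bridgeRowIndex": 0, "_bridgeStatus": "ok"},
    {"email": "b@example.com", "_bridgeRowIndex": 1, "_bridgeStatus": "warning",
     "_bridgeWarnings": ["phone could not be parsed"]},
    {"email": "c@example.com", "_bridgeRowIndex": 2, "_bridgeStatus": "ok"},
]

counts = summarize_status(output_records)
clean = [strip_metadata(r) for r in output_records if r["_bridgeStatus"] != "error"]
print(counts, clean)
```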

Input reference

1. Source Data

| Field | Type | Description |
| --- | --- | --- |
| sourceType | select | dataset, url, or raw |
| datasetIds | string list | One or more Apify Dataset IDs |
| sourceUrl | string | URL to a JSON, JSONL, or CSV file |
| rawData | string | JSON array pasted as text |

2. Target Format

| Field | Type | Description |
| --- | --- | --- |
| targetSchemaType | select | preset, example, or manual |
| preset | select | hubspot-contact, salesforce-lead, airtable-row, sql-insert, google-sheets-row, custom-json |
| targetExample | JSON | One example record in the desired output shape |
| targetSchema | JSON | Manual field definitions with types, formats, constraints |

3. AI Field Mapping

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| openaiApiKey | secret | -- | OpenAI API key for automatic mapping. Not needed if you map all fields manually. |
| llmModel | select | gpt-4o-mini | AI model to use |

4. Manual Field Mapping

| Field | Type | Description |
| --- | --- | --- |
| fieldMappings | JSON object | {"source_field": "target_field"} pairs. Overrides AI suggestions. |

5. Data Cleaning

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| normalizeEmails | boolean | true | Lowercase + trim email fields |
| formatPhones | boolean | true | Standardize phone number fields |
| normalizeDates | boolean | true | Standardize date fields |
| trimAllWhitespace | boolean | true | Trim all string fields |
| phoneFormat | select | E164 | Phone output format |
| defaultCountryCode | string | US | Default country for phone parsing |
| dateFormat | select | ISO8601 | Date output format |

6. Deduplication

| Field | Type | Description |
| --- | --- | --- |
| deduplicationKeys | string list | Target field names to deduplicate on (e.g., email). Leave empty to keep all records. |

7. Validation

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| strictMode | boolean | false | Drop invalid records instead of flagging them |
| validationRules | JSON | -- | Custom constraints: {"field": {"minLength": 5, "pattern": "..."}} |
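The difference between flagging and dropping invalid records can be sketched as below. Only the `minLength` and `pattern` constraint names and the strictMode behavior come from the table; the error messages, the search-style pattern matching, and the function names are assumptions.

```python
import re
from typing import Any


def validate(record: dict[str, Any], rules: dict[str, dict[str, Any]]) -> list[str]:
    """Return a list of constraint violations for one record."""
    errors = []
    for field, constraints in rules.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
            continue
        text = str(value)
        if "minLength" in constraints and len(text) < constraints["minLength"]:
            errors.append(f"{field}: shorter than {constraints['minLength']}")
        # Assumption: patterns use search semantics, as in JSON Schema.
        if "pattern" in constraints and not re.search(constraints["pattern"], text):
            errors.append(f"{field}: does not match pattern")
    return errors


def apply_validation(records, rules, strict_mode=False):
    """Flag invalid records, or drop them when strict_mode is on."""
    result = []
    for record in records:
        errors = validate(record, rules)
        if errors and strict_mode:
            continue  # strict mode: invalid records are dropped
        result.append({**record, "_bridgeStatus": "error" if errors else "ok"})
    return result


rules = {"email": {"minLength": 5, "pattern": r"^[^@\s]+@[^@\s]+$"}}
records = [{"email": "john@example.com"}, {"email": "nope"}]

flagged = apply_validation(records, rules)
strict = apply_validation(records, rules, strict_mode=True)
print(flagged, strict)
```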

8. Advanced

| Field | Type | Description |
| --- | --- | --- |
| transformationRules | JSON array | [{sourceField, targetField, transform, params}] for advanced transforms like split_name. |
| maxRows | integer | Limit rows for testing (0 = all) |
| batchSize | integer | Records per batch (default: 100) |
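The interaction of maxRows and batchSize can be pictured as a row limit followed by chunking. This is a generic sketch of that pattern under the documented defaults (0 = all rows, 100 records per batch); `iter_batches` is a hypothetical helper, not the Actor's code.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def iter_batches(records: Iterable[Any], batch_size: int = 100, max_rows: int = 0) -> Iterator[list]:
    """Yield lists of up to batch_size records, stopping after max_rows (0 = all)."""
    source = iter(records)
    if max_rows > 0:
        source = islice(source, max_rows)  # apply the row limit first
    while True:
        batch = list(islice(source, batch_size))
        if not batch:
            return
        yield batch


all_sizes = [len(batch) for batch in iter_batches(range(250))]
limited_sizes = [len(batch) for batch in iter_batches(range(250), batch_size=50, max_rows=120)]
print(all_sizes, limited_sizes)
```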

Local development

# Install dependencies
pip install -r requirements.txt
# Create test input
mkdir -p storage/key_value_stores/default
# Write your INPUT.json to storage/key_value_stores/default/INPUT.json
# Run locally
python -m my_actor
# Deploy to Apify
apify push
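Writing the test input by hand is error-prone because rawData must be a JSON array encoded as a string. A small helper like the following could generate INPUT.json; the helper name is hypothetical, and the demo writes into a temporary directory so it does not touch a real project.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def write_local_input(run_input: dict[str, Any], base_dir: str = ".") -> Path:
    """Write run_input to storage/key_value_stores/default/INPUT.json under base_dir."""
    target = Path(base_dir) / "storage" / "key_value_stores" / "default" / "INPUT.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return target


source_rows = [{"full_name": "John Doe", "email_address": "JOHN.DOE@EXAMPLE.COM"}]
run_input = {
    "sourceType": "raw",
    "rawData": json.dumps(source_rows),  # rawData is a JSON array pasted as text
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
    "fieldMappings": {"email_address": "email"},
}

# Demo only: use base_dir="." inside the Actor project instead.
with tempfile.TemporaryDirectory() as tmp:
    path = write_local_input(run_input, base_dir=tmp)
    written = json.loads(path.read_text(encoding="utf-8"))
    relative_parts = path.relative_to(tmp).parts
print(relative_parts)
```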

Integration

Data Bridge works with Apify's integration ecosystem:

  • Actor chaining -- Use the output Dataset ID as input to another Actor
  • Make / Zapier / n8n -- Trigger Data Bridge after a scraper finishes, feed the transformed data into your CRM
  • MCP server -- Use via Apify's MCP integration: "transform this dataset to match my HubSpot schema"
  • API -- Call via the Apify API to transform data programmatically
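A programmatic call could look like the sketch below, which uses the `apify-client` Python package. The Actor ID is left as a parameter because it is not stated in this document, and `build_run_input` and `run_data_bridge` are hypothetical helper names; the input field names are the ones from the input reference above.

```python
from typing import Any


def build_run_input(dataset_ids: list[str], preset: str,
                    field_mappings: dict[str, str], dedup_keys: list[str]) -> dict[str, Any]:
    """Assemble an input object using manual mappings (no AI key needed)."""
    return {
        "sourceType": "dataset",
        "datasetIds": dataset_ids,
        "targetSchemaType": "preset",
        "preset": preset,
        "fieldMappings": field_mappings,
        "deduplicationKeys": dedup_keys,
    }


def run_data_bridge(token: str, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return the transformed records."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not finish")
    return client.dataset(run["defaultDatasetId"]).list_items().items


run_input = build_run_input(
    dataset_ids=["your-dataset-id"],
    preset="hubspot-contact",
    field_mappings={"email_address": "email"},
    dedup_keys=["email"],
)
print(run_input)
# records = run_data_bridge("<APIFY_TOKEN>", "<actor-id>", run_input)
```

The returned records come from the run's default dataset, which is where the transformed dataset is stored according to the Output section.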