Data Bridge
Turn messy data into clean records for HubSpot, Salesforce, Airtable, SQL, Google Sheets, or any custom schema. Just point it at your data and pick a target format. AI figures out which fields go where, normalizes emails or phone numbers, parses dates, removes duplicates, and validates the output.
Developer: Filip Cicvárek · Pricing: pay per usage
Data Bridge - Universal Data Transformer
Transform any dataset into any target schema using AI-powered field mapping. Feed it data from Apify scrapers, CSV files, or JSON APIs, and get it out in the exact format your CRM, database, or analytics tool expects.
What it does
Data Bridge takes source data and a target schema, then:
- Analyzes the source data structure automatically
- Maps fields to the target schema (via AI or manual mapping)
- Cleans every record: email normalization, phone formatting, date standardization, whitespace trimming
- Deduplicates on any combination of fields
- Validates against target schema constraints
- Outputs a clean dataset + detailed reports
The AI is called once to figure out the mapping, then all records are transformed deterministically. This keeps cost under $0.01 per run regardless of dataset size.
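The one-call pattern described above can be sketched as follows. The helper names (`plan_mapping`, `apply_mapping`) are illustrative, not the Actor's actual internals, and the LLM call is stubbed with a trivial name-based match:

```python
def plan_mapping(sample_record: dict, target_fields: list[str]) -> dict:
    """Would send one prompt to the LLM and parse back a source->target
    mapping. Stubbed here with a naive substring match on field names."""
    mapping = {}
    for src in sample_record:
        for tgt in target_fields:
            if tgt.lower() in src.lower():
                mapping[src] = tgt
    return mapping

def apply_mapping(records: list[dict], mapping: dict) -> list[dict]:
    """Deterministic per-record transform -- no AI involved from here on."""
    return [{tgt: rec.get(src) for src, tgt in mapping.items()} for rec in records]

records = [{"email_address": "A@B.COM", "company_name": "Acme"}]
mapping = plan_mapping(records[0], ["email", "company"])
out = apply_mapping(records, mapping)
# out == [{"email": "A@B.COM", "company": "Acme"}]
```

Because only `plan_mapping` would touch the LLM, the cost is fixed per run while `apply_mapping` scales to any number of records.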
Two ways to map fields
Option A: Let the AI figure it out (recommended)
Provide an OpenAI API key and the AI will automatically match your source fields to the target schema. It matches by meaning, not just by name -- so company_name maps to Company, email_address maps to email, and a full_name field gets split into firstname and lastname.
Cost: ~$0.001 per run with GPT-4o Mini.
Option B: Map fields manually
If you don't want to use AI, provide a simple field mapping that tells the Actor which source fields go where:
```json
{
  "email_address": "email",
  "company_name": "company",
  "phone_number": "phone",
  "city": "city"
}
```
Left side = your source field name. Right side = target field name. That's it.
You can also combine both: use AI for most fields and add manual mappings to override specific ones.
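One plausible way the override combination works (a sketch; the Actor's actual merge logic may differ) is a plain dictionary merge where manual entries win. The "web" field and the "website" target below are hypothetical, chosen only to show an AI mistake being corrected:

```python
# Suppose the AI mis-mapped the hypothetical "web" field:
ai_mapping = {"email_address": "email", "company_name": "company", "web": "city"}
manual_overrides = {"web": "website"}  # hypothetical manual correction

# Manual fieldMappings override AI suggestions for the same source field:
final_mapping = {**ai_mapping, **manual_overrides}
# {"email_address": "email", "company_name": "company", "web": "website"}
```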
Quick start examples
Example 1: AI mapping to HubSpot (easiest)
```json
{
  "sourceType": "raw",
  "rawData": "[{\"full_name\": \"John Doe\", \"email_address\": \"JOHN.DOE@EXAMPLE.COM\", \"phone_number\": \"5551234567\", \"company_name\": \"Acme Corp\"}]",
  "targetSchemaType": "preset",
  "preset": "hubspot-contact",
  "deduplicationKeys": ["email"],
  "openaiApiKey": "sk-..."
}
```
Output (HubSpot-ready):
```json
{
  "firstname": "John",
  "lastname": "Doe",
  "email": "john.doe@example.com",
  "phone": "+15551234567",
  "company": "Acme Corp"
}
```
The AI figured out that full_name should be split into firstname/lastname, email_address maps to email (lowercased), and phone_number maps to phone (formatted to E.164).
Example 2: Manual field mapping (no AI key needed)
```json
{
  "sourceType": "raw",
  "rawData": "[{\"full_name\": \"John Doe\", \"email_address\": \"JOHN.DOE@EXAMPLE.COM\", \"phone_number\": \"5551234567\", \"company_name\": \"Acme Corp\"}]",
  "targetSchemaType": "preset",
  "preset": "hubspot-contact",
  "fieldMappings": {
    "email_address": "email",
    "phone_number": "phone",
    "company_name": "company",
    "full_name": "firstname"
  },
  "deduplicationKeys": ["email"]
}
```
Data cleaning (email lowercasing, phone formatting, whitespace trimming) happens automatically based on the target field type -- no extra configuration needed.
Example 3: Merge multiple scraper datasets into Salesforce leads
```json
{
  "sourceType": "dataset",
  "datasetIds": ["dataset-id-from-scraper-1", "dataset-id-from-scraper-2"],
  "targetSchemaType": "preset",
  "preset": "salesforce-lead",
  "deduplicationKeys": ["Email"],
  "openaiApiKey": "sk-..."
}
```
Records from both datasets are merged, mapped to Salesforce Lead fields, deduplicated by email, and validated.
Example 4: Match an example record (no preset needed)
If you don't use a standard platform, just paste one example of what you want the output to look like:
```json
{
  "sourceType": "dataset",
  "datasetIds": ["your-dataset-id"],
  "targetSchemaType": "example",
  "targetExample": {
    "contact_name": "Jane Doe",
    "contact_email": "jane@example.com",
    "phone": "+1-555-123-4567",
    "signup_date": "2026-01-15",
    "is_active": true
  },
  "openaiApiKey": "sk-..."
}
```
The Actor infers the target schema from your example and maps your source data to match it.
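Schema inference from a single example can be sketched like this. This is illustrative only: it derives a type name per field from the Python type of each value, whereas the Actor's real inference presumably also detects emails, phones, and dates from the values themselves:

```python
def infer_schema(example: dict) -> dict:
    """Map each field of the example record to a simple type name."""
    type_names = {str: "string", int: "integer", float: "number", bool: "boolean"}
    # type() rather than isinstance() so bool is not swallowed by int
    return {field: type_names.get(type(value), "string") for field, value in example.items()}

schema = infer_schema({
    "contact_name": "Jane Doe",
    "signup_date": "2026-01-15",
    "is_active": True,
})
# {"contact_name": "string", "signup_date": "string", "is_active": "boolean"}
```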
Example 5: Advanced transforms (split names, join fields)
For cases where you need specific transforms like splitting a full name:
```json
{
  "sourceType": "dataset",
  "datasetIds": ["your-dataset-id"],
  "targetSchemaType": "preset",
  "preset": "hubspot-contact",
  "fieldMappings": {
    "email_address": "email",
    "company_name": "company",
    "city": "city"
  },
  "transformationRules": [
    {"sourceField": "full_name", "targetField": "firstname", "transform": "split_name", "params": {"output": "first"}},
    {"sourceField": "full_name", "targetField": "lastname", "transform": "split_name", "params": {"output": "last"}}
  ]
}
```
fieldMappings handles the simple field-to-field mapping. transformationRules handles the cases that need special transforms. Both work together.
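A minimal sketch of how the two mechanisms can coexist (hypothetical helpers, not the Actor's source): plain mappings copy values across, then split_name-style rules fill in the derived fields.

```python
def split_name(full_name: str, output: str) -> str:
    """Split on the first space: everything before is 'first', after is 'last'."""
    first, _, last = full_name.strip().partition(" ")
    return first if output == "first" else last

def transform(record: dict, field_mappings: dict, rules: list[dict]) -> dict:
    # 1. Simple field-to-field copies
    out = {tgt: record.get(src) for src, tgt in field_mappings.items()}
    # 2. Special transforms layered on top
    for rule in rules:
        if rule["transform"] == "split_name":
            out[rule["targetField"]] = split_name(record[rule["sourceField"]], rule["params"]["output"])
    return out

row = transform(
    {"full_name": "John Doe", "email_address": "john@example.com"},
    {"email_address": "email"},
    [{"sourceField": "full_name", "targetField": "firstname", "transform": "split_name", "params": {"output": "first"}},
     {"sourceField": "full_name", "targetField": "lastname", "transform": "split_name", "params": {"output": "last"}}],
)
# {"email": "john@example.com", "firstname": "John", "lastname": "Doe"}
```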
Supported input sources
| Source | How to use |
|---|---|
| Apify Datasets | Provide one or more Dataset IDs -- records from all datasets are merged before transformation |
| Remote URL | Point to a JSON, JSONL, or CSV file |
| Paste JSON | Paste a JSON array directly into the input |
Target schema presets
| Preset | Output field examples |
|---|---|
| HubSpot Contact | email, firstname, lastname, phone, company, city, state, zip, lifecyclestage |
| Salesforce Lead | Email, FirstName, LastName, Company, Phone, City, State, LeadSource |
| Airtable Row | Name, Email, Phone, URL, Tags, Date, Category, Checkbox |
| SQL INSERT | id, name, email, phone, category, is_active, created_at, metadata |
| Google Sheets Row | Name, Email, Phone, Address, Date, Tags, Notes |
| Custom JSON | Keeps original field names -- just cleans and deduplicates the data |
You can also define your own schema manually or paste an example record.
Automatic data cleaning
These run automatically on every mapped field based on the target field type. All are enabled by default and can be toggled off individually.
| What | Before | After |
|---|---|---|
| Lowercase emails | JOHN@EXAMPLE.COM | john@example.com |
| Format phone numbers | (555) 123-4567 | +15551234567 (E.164) |
| Standardize dates | Jan 15, 2026 | 2026-01-15T00:00:00 (ISO 8601) |
| Trim whitespace | "  John   Doe " | "John Doe" |
Phone and date output formats are configurable (E.164 / national / international, ISO 8601 / YYYY-MM-DD / Unix timestamp, etc.).
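Simplified stand-ins for the cleaners in the table above, to make the behavior concrete. These are sketches, not the Actor's implementations: the phone helper assumes a US dialing code and naively strips non-digits, and the date helper handles only the "Jan 15, 2026" format shown in the table.

```python
import re
from datetime import datetime

def clean_email(value: str) -> str:
    """Lowercase + trim, as in the table above."""
    return value.strip().lower()

def clean_phone_e164(value: str, default_dial_code: str = "1") -> str:
    """Naive E.164: strip non-digits, prepend the default dialing code
    for 10-digit (US-length) numbers. Real parsing is far more involved."""
    digits = re.sub(r"\D", "", value)
    return f"+{digits}" if len(digits) > 10 else f"+{default_dial_code}{digits}"

def clean_date_iso(value: str) -> str:
    """Parse one specific format and emit ISO 8601."""
    return datetime.strptime(value, "%b %d, %Y").isoformat()

clean_email("  JOHN@EXAMPLE.COM ")   # "john@example.com"
clean_phone_e164("(555) 123-4567")   # "+15551234567"
clean_date_iso("Jan 15, 2026")       # "2026-01-15T00:00:00"
```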
Advanced transforms
For cases that need more than simple field mapping and auto-cleaning, use transformationRules:
| Transform | What it does | Example |
|---|---|---|
| split_name | Split "John Doe" into first/last | {"output": "first"} returns "John" |
| join_fields | Concatenate multiple values | {"separator": ", "} |
| type_cast | Convert between types | {"target_type": "integer"} |
| boolean_parse | Parse "yes"/"no", 1/0 to boolean | |
| date_normalize | Parse + reformat dates | {"target_format": "%Y-%m-%d"} |
| phone_format | Format phone numbers | {"format": "NATIONAL"} |
| email_lowercase | Lowercase + trim emails | |
| trim_whitespace | Strip + collapse whitespace | |
Output
Every run produces:
| Output | Where to find it | What's in it |
|---|---|---|
| Transformed dataset | Default Dataset | All records in the target schema |
| Validation report | Key-Value Store: VALIDATION_REPORT | Per-row errors, summary stats, rejected records |
| Transformation log | Key-Value Store: TRANSFORMATION_LOG | Field mapping details, dedup stats, processing time |
| Field mappings | Key-Value Store: FIELD_MAPPINGS | The mapping plan (save this to reuse without AI next time) |
Each output record includes metadata fields:
- _bridgeRowIndex -- sequential row number from the source
- _bridgeStatus -- "ok", "warning", or "error"
- _bridgeWarnings -- array of warning messages (present only if there are warnings)
Input reference
1. Source Data
| Field | Type | Description |
|---|---|---|
| sourceType | select | dataset, url, or raw |
| datasetIds | string list | One or more Apify Dataset IDs |
| sourceUrl | string | URL to a JSON, JSONL, or CSV file |
| rawData | string | JSON array pasted as text |
2. Target Format
| Field | Type | Description |
|---|---|---|
| targetSchemaType | select | preset, example, or manual |
| preset | select | hubspot-contact, salesforce-lead, airtable-row, sql-insert, google-sheets-row, custom-json |
| targetExample | JSON | One example record in the desired output shape |
| targetSchema | JSON | Manual field definitions with types, formats, constraints |
3. AI Field Mapping
| Field | Type | Default | Description |
|---|---|---|---|
| openaiApiKey | secret | -- | OpenAI API key for automatic mapping. Not needed if you map all fields manually. |
| llmModel | select | gpt-4o-mini | AI model to use |
4. Manual Field Mapping
| Field | Type | Description |
|---|---|---|
| fieldMappings | JSON object | {"source_field": "target_field"} pairs. Overrides AI suggestions. |
5. Data Cleaning
| Field | Type | Default | Description |
|---|---|---|---|
| normalizeEmails | boolean | true | Lowercase + trim email fields |
| formatPhones | boolean | true | Standardize phone number fields |
| normalizeDates | boolean | true | Standardize date fields |
| trimAllWhitespace | boolean | true | Trim all string fields |
| phoneFormat | select | E164 | Phone output format |
| defaultCountryCode | string | US | Default country for phone parsing |
| dateFormat | select | ISO8601 | Date output format |
6. Deduplication
| Field | Type | Description |
|---|---|---|
| deduplicationKeys | string list | Target field names to deduplicate on (e.g., email). Leave empty to keep all records. |
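Deduplication on a combination of fields can be sketched as first-occurrence-wins, keyed on a tuple of the chosen target fields. That the first record is the one kept is an assumption here; the TRANSFORMATION_LOG reports the Actor's actual dedup stats.

```python
def deduplicate(records: list[dict], keys: list[str]) -> list[dict]:
    """Keep the first record seen for each unique combination of key values."""
    if not keys:
        return records  # empty deduplicationKeys -> keep all records
    seen = set()
    kept = []
    for rec in records:
        fingerprint = tuple(rec.get(k) for k in keys)
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept.append(rec)
    return kept

rows = [{"email": "a@x.com"}, {"email": "b@x.com"}, {"email": "a@x.com"}]
deduplicate(rows, ["email"])  # keeps only the first two records
```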
7. Validation
| Field | Type | Default | Description |
|---|---|---|---|
| strictMode | boolean | false | Drop invalid records instead of flagging them |
| validationRules | JSON | -- | Custom constraints: {"field": {"minLength": 5, "pattern": "..."}} |
8. Advanced
| Field | Type | Description |
|---|---|---|
| transformationRules | JSON array | [{sourceField, targetField, transform, params}] for advanced transforms like split_name. |
| maxRows | integer | Limit rows for testing (0 = all) |
| batchSize | integer | Records per batch (default: 100) |
Local development
```shell
# Install dependencies
pip install -r requirements.txt

# Create test input
mkdir -p storage/key_value_stores/default
# Write your INPUT.json to storage/key_value_stores/default/INPUT.json

# Run locally
python -m my_actor

# Deploy to Apify
apify push
```
Integration
Data Bridge works with Apify's integration ecosystem:
- Actor chaining -- Use the output Dataset ID as input to another Actor
- Make / Zapier / n8n -- Trigger Data Bridge after a scraper finishes, feed the transformed data into your CRM
- MCP server -- Use via Apify's MCP integration: "transform this dataset to match my HubSpot schema"
- API -- Call via the Apify API to transform data programmatically
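Programmatic use via the apify-client Python package could look like the sketch below. The Actor ID "filip-cicvarek/data-bridge" is a guess based on the developer name; check the Actor's page for the real ID, and note that apify-client must be installed separately (pip install apify-client).

```python
# Same shape as the quick-start examples above
run_input = {
    "sourceType": "dataset",
    "datasetIds": ["your-dataset-id"],
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
    "deduplicationKeys": ["email"],
    "openaiApiKey": "sk-...",
}

def run_bridge(token: str, actor_id: str = "filip-cicvarek/data-bridge") -> list[dict]:
    """Run the Actor and return the transformed records."""
    from apify_client import ApifyClient  # third-party package
    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return client.dataset(run["defaultDatasetId"]).list_items().items
```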


