Pricing

Pay per usage

Data Bridge

Turn messy data into clean records for HubSpot, Salesforce, Airtable, SQL, Google Sheets, or any custom schema. Point it at your data and pick a target format: the AI works out which fields go where, and the Actor then normalizes emails and phone numbers, parses dates, removes duplicates, and validates the output.



Developer

Filip Cicvárek

Last modified: 3 days ago

Data Bridge - Universal Data Transformer

Transform any dataset into any target schema using AI-powered field mapping. Feed it data from Apify scrapers, CSV files, or JSON APIs, and get it out in the exact format your CRM, database, or analytics tool expects.

What it does

Data Bridge takes source data and a target schema, then:

Analyzes the source data structure automatically
Maps fields to the target schema (via AI or manual mapping)
Cleans every record: email normalization, phone formatting, date standardization, whitespace trimming
Deduplicates on any combination of fields
Validates against target schema constraints
Outputs a clean dataset + detailed reports

The AI is called only once per run to work out the mapping; every record is then transformed deterministically. Because the number of AI calls does not grow with the dataset, the AI cost stays under $0.01 per run regardless of dataset size.
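To make the "one AI call, then deterministic transforms" idea concrete, here is a minimal sketch of the pattern. It is not the Actor's code: plan_mapping is a hypothetical stand-in for the AI call and just uses a fixed lookup table, and the function names are made up for illustration. The point is that the planner runs once no matter how many records there are.

```python
# Sketch of the "map once, transform many" pattern.
# plan_mapping is a hypothetical stand-in for the single AI call; here it
# uses a fixed lookup table instead of a model.


def plan_mapping(sample_record: dict) -> dict[str, str]:
    """Return a {source_field: target_field} plan from one sample record."""
    known = {"email_address": "email", "company_name": "company", "phone_number": "phone"}
    return {field: known[field] for field in sample_record if field in known}


def apply_mapping(record: dict, plan: dict[str, str]) -> dict:
    """Rename fields according to the plan; plain Python, no AI involved."""
    return {target: record[source] for source, target in plan.items() if source in record}


def transform(records: list[dict], planner=plan_mapping) -> list[dict]:
    """Call the planner exactly once, then apply the plan to every record."""
    if not records:
        return []
    plan = planner(records[0])
    return [apply_mapping(record, plan) for record in records]


if __name__ == "__main__":
    rows = [
        {"email_address": "a@example.com", "company_name": "Acme"},
        {"email_address": "b@example.com", "company_name": "Globex"},
    ]
    print(transform(rows))
```

Whether you pass 2 rows or 200,000, the planner (and therefore the paid AI call) runs once.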

Two ways to map fields

Option A: Let the AI figure it out (recommended)

Provide an OpenAI API key and the AI will automatically match your source fields to the target schema. It matches by meaning, not just by name -- so company_name maps to Company, email_address maps to email, and a full_name field gets split into firstname and lastname.

Cost: ~$0.001 per run with GPT-4o Mini.

Option B: Map fields manually

If you don't want to use AI, provide a simple field mapping that tells the Actor which source fields go where:

{
    "email_address": "email",
    "company_name": "company",
    "phone_number": "phone",
    "city": "city"
}

Left side = your source field name. Right side = target field name. That's it.

You can also combine both: use AI for most fields and add manual mappings to override specific ones.
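Here is one way the AI and manual mappings could be combined. This is a sketch under an assumption: the docs only say manual mappings override AI suggestions, so the extra rule below (a manual entry also "claims" its target field, and any AI suggestion writing to that same target is dropped) is our own choice, not documented Actor behavior.

```python
def merge_mappings(ai_mapping: dict[str, str], manual_mapping: dict[str, str]) -> dict[str, str]:
    """Combine AI-suggested and manual {source: target} mappings.

    Manual entries always win. As an assumption, a manual entry also claims
    its target field, so an AI suggestion writing to the same target is
    dropped instead of competing with it.
    """
    claimed_targets = set(manual_mapping.values())
    merged = {
        source: target
        for source, target in ai_mapping.items()
        if source not in manual_mapping and target not in claimed_targets
    }
    merged.update(manual_mapping)  # manual mappings are applied last
    return merged


if __name__ == "__main__":
    ai = {"email_address": "email", "company_name": "company", "city": "city"}
    manual = {"work_email": "email"}
    print(merge_mappings(ai, manual))
```

With those inputs, email comes from work_email (the manual choice) while the AI still handles company and city.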

Quick start examples

Example 1: AI mapping to HubSpot (easiest)

{
    "sourceType": "raw",
    "rawData": "[{\"full_name\": \"John Doe\", \"email_address\": \"JOHN.DOE@EXAMPLE.COM\", \"phone_number\": \"5551234567\", \"company_name\": \"Acme Corp\"}]",
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
    "deduplicationKeys": ["email"],
    "openaiApiKey": "sk-..."
}

Output (HubSpot-ready):

{
    "firstname": "John",
    "lastname": "Doe",
    "email": "john.doe@example.com",
    "phone": "+15551234567",
    "company": "Acme Corp"
}

The AI figured out that full_name should be split into firstname/lastname, email_address maps to email (lowercased), and phone_number maps to phone (formatted to E.164).

Example 2: Manual field mapping (no AI key needed)

{
    "sourceType": "raw",
    "rawData": "[{\"full_name\": \"John Doe\", \"email_address\": \"JOHN.DOE@EXAMPLE.COM\", \"phone_number\": \"5551234567\", \"company_name\": \"Acme Corp\"}]",
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
    "fieldMappings": {
        "email_address": "email",
        "phone_number": "phone",
        "company_name": "company",
        "full_name": "firstname"
    },
    "deduplicationKeys": ["email"]
}

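Data cleaning (email lowercasing, phone formatting, whitespace trimming) happens automatically based on the target field type -- no extra configuration needed. Note that a plain field mapping copies values as-is, so mapping full_name to firstname puts the whole name "John Doe" into firstname. To split it into firstname and lastname, use a split_name transformation rule as in Example 5.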

Example 3: Merge multiple scraper datasets into Salesforce leads

{
    "sourceType": "dataset",
    "datasetIds": ["dataset-id-from-scraper-1", "dataset-id-from-scraper-2"],
    "targetSchemaType": "preset",
    "preset": "salesforce-lead",
    "deduplicationKeys": ["Email"],
    "openaiApiKey": "sk-..."
}

Records from both datasets are merged, mapped to Salesforce Lead fields, deduplicated by email, and validated.

Example 4: Match an example record (no preset needed)

If you don't use a standard platform, just paste one example of what you want the output to look like:

{
    "sourceType": "dataset",
    "datasetIds": ["your-dataset-id"],
    "targetSchemaType": "example",
    "targetExample": {
        "contact_name": "Jane Doe",
        "contact_email": "jane@example.com",
        "phone": "+1-555-123-4567",
        "signup_date": "2026-01-15",
        "is_active": true
    },
    "openaiApiKey": "sk-..."
}

The Actor infers the target schema from your example and maps your source data to match it.
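How the Actor infers types from an example isn't documented (it may lean on the AI). As a rough illustration only, here is a heuristic sketch that guesses a field type from each example value. The type names and regexes are assumptions chosen for this example.

```python
import re

# Heuristic patterns; assumptions for this sketch, not the Actor's rules.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,}$")


def infer_field_type(value) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        if EMAIL_RE.match(value):
            return "email"
        if DATE_RE.match(value):  # must run before the phone check ("2026-01-15" looks phone-ish)
            return "date"
        if PHONE_RE.match(value):
            return "phone"
        return "string"
    return "json"  # lists, dicts, None


def infer_schema(example: dict) -> dict[str, str]:
    return {field: infer_field_type(value) for field, value in example.items()}


if __name__ == "__main__":
    print(infer_schema({
        "contact_name": "Jane Doe",
        "contact_email": "jane@example.com",
        "phone": "+1-555-123-4567",
        "signup_date": "2026-01-15",
        "is_active": True,
    }))
```

Once each target field has a type, the matching cleaner (email, phone, date) can be picked automatically.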

Example 5: Advanced transforms (split names, join fields)

For cases where you need specific transforms like splitting a full name:

{
    "sourceType": "dataset",
    "datasetIds": ["your-dataset-id"],
    "targetSchemaType": "preset",
    "preset": "hubspot-contact",
    "fieldMappings": {
        "email_address": "email",
        "company_name": "company",
        "city": "city"
    },
    "transformationRules": [
        {"sourceField": "full_name", "targetField": "firstname", "transform": "split_name", "params": {"output": "first"}},
        {"sourceField": "full_name", "targetField": "lastname", "transform": "split_name", "params": {"output": "last"}}
    ]
}

fieldMappings handles the simple field-to-field mapping. transformationRules handles the cases that need special transforms. Both work together.

Supported input sources

Source	How to use
Apify Datasets	Provide one or more Dataset IDs -- records from all datasets are merged before transformation
Remote URL	Point to a JSON, JSONL, or CSV file
Paste JSON	Paste a JSON array directly into the input

Target schema presets

Preset	Output field examples
HubSpot Contact	`email`, `firstname`, `lastname`, `phone`, `company`, `city`, `state`, `zip`, `lifecyclestage`
Salesforce Lead	`Email`, `FirstName`, `LastName`, `Company`, `Phone`, `City`, `State`, `LeadSource`
Airtable Row	`Name`, `Email`, `Phone`, `URL`, `Tags`, `Date`, `Category`, `Checkbox`
SQL INSERT	`id`, `name`, `email`, `phone`, `category`, `is_active`, `created_at`, `metadata`
Google Sheets Row	`Name`, `Email`, `Phone`, `Address`, `Date`, `Tags`, `Notes`
Custom JSON	Keeps original field names -- just cleans and deduplicates the data

You can also define your own schema manually or paste an example record.

Automatic data cleaning

These run automatically on every mapped field based on the target field type. All are enabled by default and can be toggled off individually.

What	Before	After
Lowercase emails	`JOHN@EXAMPLE.COM`	`john@example.com`
Format phone numbers	`(555) 123-4567`	`+15551234567` (E.164)
Standardize dates	`Jan 15, 2026`	`2026-01-15T00:00:00` (ISO 8601)
Trim whitespace	`  John Doe  `	`John Doe`

Phone and date output formats are configurable (E.164 / national / international, ISO 8601 / YYYY-MM-DD / Unix timestamp, etc.).
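The cleaning steps in the table above can be sketched in plain Python. This is a simplified stand-in, not the Actor's implementation: the phone formatter only handles US/NANP numbers (matching the `US` default country), and the list of accepted input date formats is an assumption. A production version would use a dedicated phone-number library.

```python
import re
from datetime import datetime

# Input date formats this sketch understands; an assumption, not the Actor's list.
DATE_INPUT_FORMATS = ["%b %d, %Y", "%B %d, %Y", "%Y-%m-%d", "%m/%d/%Y"]


def normalize_email(value: str) -> str:
    """Lowercase + trim, like the normalizeEmails option."""
    return value.strip().lower()


def format_phone_e164_us(value: str) -> str:
    """Rough E.164 formatter for US/NANP numbers only.

    Handles 10-digit national numbers and 11-digit numbers starting with 1.
    """
    digits = re.sub(r"\D", "", value)
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    raise ValueError(f"cannot format phone number: {value!r}")


def normalize_date_iso(value: str) -> str:
    """Try a few common input formats and return an ISO 8601 timestamp."""
    text = value.strip()
    for fmt in DATE_INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def trim_whitespace(value: str) -> str:
    """Strip leading and trailing whitespace, like trimAllWhitespace."""
    return value.strip()


if __name__ == "__main__":
    print(normalize_email("JOHN@EXAMPLE.COM"))
    print(format_phone_e164_us("(555) 123-4567"))
    print(normalize_date_iso("Jan 15, 2026"))
    print(trim_whitespace("  John Doe  "))
```

The four calls in the `__main__` block reproduce the Before/After pairs from the table.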

Advanced transforms

For cases that need more than simple field mapping and auto-cleaning, use transformationRules:

Transform	What it does	Example
`split_name`	Split `"John Doe"` into first/last	`{"output": "first"}` returns `"John"`
`join_fields`	Concatenate multiple values	`{"separator": ", "}`
`type_cast`	Convert between types	`{"target_type": "integer"}`
`boolean_parse`	Parse `"yes"/"no"`, `1/0` to boolean	--
`date_normalize`	Parse + reformat dates	`{"target_format": "%Y-%m-%d"}`
`phone_format`	Format phone numbers	`{"format": "NATIONAL"}`
`email_lowercase`	Lowercase + trim emails	--
`trim_whitespace`	Strip + collapse whitespace	--

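To show how a `transformationRules` list could be applied, here is a hypothetical sketch covering four of the transforms above. The rule shape (`sourceField`, `targetField`, `transform`, `params`) comes from this README. The details are assumptions: `split_name` treats everything after the first word as the last name, and `join_fields` takes a list of source fields.

```python
# Hypothetical re-implementation of four transformationRules transforms.
CASTS = {"integer": int, "float": float, "string": str}
TRUE_VALUES = {"yes", "y", "true", "1"}
FALSE_VALUES = {"no", "n", "false", "0"}


def split_name(value: str, output: str = "first") -> str:
    """First word is the first name; the rest is the last name (assumption)."""
    parts = value.split()
    if output == "first":
        return parts[0] if parts else ""
    if output == "last":
        return " ".join(parts[1:])
    raise ValueError(f"unknown split_name output: {output!r}")


def boolean_parse(value) -> bool:
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean: {value!r}")


TRANSFORMS = {
    "split_name": lambda value, params: split_name(value, params.get("output", "first")),
    "join_fields": lambda value, params: params.get("separator", " ").join(str(v) for v in value),
    "type_cast": lambda value, params: CASTS[params["target_type"]](value),
    "boolean_parse": lambda value, params: boolean_parse(value),
}


def apply_rules(record: dict, rules: list[dict]) -> dict:
    """Apply each rule to the record and collect results by targetField."""
    result = {}
    for rule in rules:
        source = rule["sourceField"]
        # Assumption: a list of source fields is allowed (used by join_fields).
        value = [record[f] for f in source] if isinstance(source, list) else record[source]
        transform = TRANSFORMS[rule["transform"]]
        result[rule["targetField"]] = transform(value, rule.get("params", {}))
    return result


if __name__ == "__main__":
    rules = [
        {"sourceField": "full_name", "targetField": "firstname", "transform": "split_name", "params": {"output": "first"}},
        {"sourceField": "full_name", "targetField": "lastname", "transform": "split_name", "params": {"output": "last"}},
    ]
    print(apply_rules({"full_name": "John Doe"}, rules))
```

The rules in the `__main__` block are the ones from Example 5.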
Output

Every run produces:

Output	Where to find it	What's in it
Transformed dataset	Default Dataset	All records in the target schema
Validation report	Key-Value Store: `VALIDATION_REPORT`	Per-row errors, summary stats, rejected records
Transformation log	Key-Value Store: `TRANSFORMATION_LOG`	Field mapping details, dedup stats, processing time
Field mappings	Key-Value Store: `FIELD_MAPPINGS`	The mapping plan (save this to reuse without AI next time)

Each output record includes metadata fields:

_bridgeRowIndex -- sequential row number from source
_bridgeStatus -- "ok", "warning", or "error"
_bridgeWarnings -- array of warning messages (only if there are warnings)
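A typical downstream step is to keep only usable rows and drop the `_bridge*` helper fields before importing into a CRM, which won't expect them. A minimal sketch, assuming you already have the dataset items as a list of dicts. The function names are illustrative, and treating `warning` rows as importable is a choice you can switch off.

```python
def split_by_status(records: list[dict]) -> dict[str, list[dict]]:
    """Group output records by their _bridgeStatus value."""
    groups: dict[str, list[dict]] = {"ok": [], "warning": [], "error": []}
    for record in records:
        groups.setdefault(record.get("_bridgeStatus", "unknown"), []).append(record)
    return groups


def strip_bridge_metadata(record: dict) -> dict:
    """Drop the _bridge* helper fields so only target-schema fields remain."""
    return {key: value for key, value in record.items() if not key.startswith("_bridge")}


def crm_ready(records: list[dict], include_warnings: bool = True) -> list[dict]:
    """Keep ok (and optionally warning) rows in their original order, without metadata."""
    allowed = {"ok", "warning"} if include_warnings else {"ok"}
    return [strip_bridge_metadata(r) for r in records if r.get("_bridgeStatus") in allowed]


if __name__ == "__main__":
    rows = [
        {"email": "a@example.com", "_bridgeRowIndex": 0, "_bridgeStatus": "ok"},
        {"email": "broken", "_bridgeRowIndex": 1, "_bridgeStatus": "error"},
    ]
    print(crm_ready(rows))
```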

Input reference

1. Source Data

Field	Type	Description
`sourceType`	select	`dataset`, `url`, or `raw`
`datasetIds`	string list	One or more Apify Dataset IDs
`sourceUrl`	string	URL to a JSON, JSONL, or CSV file
`rawData`	string	JSON array pasted as text

2. Target Format

Field	Type	Description
`targetSchemaType`	select	`preset`, `example`, or `manual`
`preset`	select	`hubspot-contact`, `salesforce-lead`, `airtable-row`, `sql-insert`, `google-sheets-row`, `custom-json`
`targetExample`	JSON	One example record in the desired output shape
`targetSchema`	JSON	Manual field definitions with types, formats, constraints

3. AI Field Mapping

Field	Type	Default	Description
`openaiApiKey`	secret	--	OpenAI API key for automatic mapping. Not needed if you map all fields manually.
`llmModel`	select	`gpt-4o-mini`	AI model to use

4. Manual Field Mapping

Field	Type	Description
`fieldMappings`	JSON object	`{"source_field": "target_field"}` pairs. Overrides AI suggestions.

5. Data Cleaning

Field	Type	Default	Description
`normalizeEmails`	boolean	`true`	Lowercase + trim email fields
`formatPhones`	boolean	`true`	Standardize phone number fields
`normalizeDates`	boolean	`true`	Standardize date fields
`trimAllWhitespace`	boolean	`true`	Trim all string fields
`phoneFormat`	select	`E164`	Phone output format
`defaultCountryCode`	string	`US`	Default country for phone parsing
`dateFormat`	select	`ISO8601`	Date output format

6. Deduplication

Field	Type	Description
`deduplicationKeys`	string list	Target field names to deduplicate on (e.g., `email`). Leave empty to keep all records.
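Here is a sketch of merging several datasets (as in Example 3) and then deduplicating on a set of keys. It keeps the first occurrence of each key combination. Three choices here are assumptions rather than documented behavior: key values are compared case-insensitively after trimming, records whose dedup keys are all empty are never collapsed together, and an empty key list keeps everything.

```python
def merge_and_deduplicate(datasets: list[list[dict]], keys: list[str]) -> tuple[list[dict], int]:
    """Concatenate datasets in order, then drop later duplicates on `keys`.

    Returns (kept_records, number_of_removed_duplicates).
    """
    merged = [record for dataset in datasets for record in dataset]
    if not keys:  # empty deduplicationKeys: keep all records
        return merged, 0
    seen: set[tuple[str, ...]] = set()
    kept: list[dict] = []
    for record in merged:
        fingerprint = tuple(str(record.get(key, "")).strip().lower() for key in keys)
        if all(part == "" for part in fingerprint):
            kept.append(record)  # no key data: can't tell it's a duplicate
            continue
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        kept.append(record)
    return kept, len(merged) - len(kept)


if __name__ == "__main__":
    scraper_1 = [{"email": "a@example.com"}, {"email": "b@example.com"}]
    scraper_2 = [{"email": "A@example.com "}]
    records, removed = merge_and_deduplicate([scraper_1, scraper_2], ["email"])
    print(len(records), "kept,", removed, "removed")
```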

7. Validation

Field	Type	Default	Description
`strictMode`	boolean	`false`	Drop invalid records instead of flagging them
`validationRules`	JSON	--	Custom constraints: `{"field": {"minLength": 5, "pattern": "..."}}`
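To show what `strictMode` changes, here is a sketch that checks the two constraint types shown in `validationRules` (`minLength`, `pattern`) and either flags or drops invalid records. Assumptions: `pattern` must match the whole value, and the report shape (`row`, `errors`) is illustrative, not the Actor's actual `VALIDATION_REPORT` format.

```python
import re


def validate_record(record: dict, rules: dict[str, dict]) -> list[str]:
    """Check minLength and pattern constraints; return error messages."""
    errors = []
    for field, constraints in rules.items():
        value = record.get(field)
        text = "" if value is None else str(value)
        min_length = constraints.get("minLength")
        if min_length is not None and len(text) < min_length:
            errors.append(f"{field}: shorter than {min_length} characters")
        pattern = constraints.get("pattern")
        # Assumption: the pattern must match the whole value.
        if pattern is not None and re.fullmatch(pattern, text) is None:
            errors.append(f"{field}: does not match {pattern!r}")
    return errors


def validate_all(records: list[dict], rules: dict[str, dict], strict_mode: bool = False):
    """Return (kept_records, report). strict_mode drops invalid rows instead of flagging them."""
    kept, report = [], []
    for index, record in enumerate(records):
        errors = validate_record(record, rules)
        if errors:
            report.append({"row": index, "errors": errors})
            if strict_mode:
                continue  # drop the invalid record
        kept.append({**record, "_bridgeStatus": "error" if errors else "ok"})
    return kept, report


if __name__ == "__main__":
    rules = {"email": {"minLength": 5, "pattern": r"[^@]+@[^@]+"}}
    rows = [{"email": "a@b.co"}, {"email": "bad"}]
    print(validate_all(rows, rules))
    print(validate_all(rows, rules, strict_mode=True))
```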

8. Advanced

Field	Type	Description
`transformationRules`	JSON array	`[{sourceField, targetField, transform, params}]` for advanced transforms like `split_name`.
`maxRows`	integer	Limit rows for testing (0 = all)
`batchSize`	integer	Records per batch (default: 100)

Local development

# Install dependencies
pip install -r requirements.txt

# Create test input
mkdir -p storage/key_value_stores/default
# Write your INPUT.json to storage/key_value_stores/default/INPUT.json

# Run locally
python -m my_actor

# Deploy to Apify
apify push
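Instead of writing INPUT.json by hand, you can generate it. The sketch below writes Example 2 from this README into the local default key-value store, with rawData encoded as a JSON string as the input reference requires. The helper names are hypothetical. Run it from the project root before python -m my_actor.

```python
import json
from pathlib import Path


def build_example_input() -> dict:
    """Example 2 from this README (manual mapping, no AI key)."""
    sample = [{
        "full_name": "John Doe",
        "email_address": "JOHN.DOE@EXAMPLE.COM",
        "phone_number": "5551234567",
        "company_name": "Acme Corp",
    }]
    return {
        "sourceType": "raw",
        # rawData is a JSON array passed as a string, so encode it with json.dumps.
        "rawData": json.dumps(sample),
        "targetSchemaType": "preset",
        "preset": "hubspot-contact",
        "fieldMappings": {
            "email_address": "email",
            "phone_number": "phone",
            "company_name": "company",
            "full_name": "firstname",
        },
        "deduplicationKeys": ["email"],
    }


def write_test_input(storage_dir: Path, run_input: dict) -> Path:
    """Write INPUT.json into the default key-value store of local storage."""
    store = storage_dir / "key_value_stores" / "default"
    store.mkdir(parents=True, exist_ok=True)
    path = store / "INPUT.json"
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(write_test_input(Path("storage"), build_example_input()))
```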

Integration

Data Bridge works with Apify's integration ecosystem:

Actor chaining -- Use the output Dataset ID as input to another Actor
Make / Zapier / n8n -- Trigger Data Bridge after a scraper finishes, feed the transformed data into your CRM
MCP server -- Use via Apify's MCP integration: "transform this dataset to match my HubSpot schema"
API -- Call via the Apify API to transform data programmatically
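For the API route, here is a stdlib-only sketch that calls the Apify API v2 "run Actor synchronously and get dataset items" endpoint and returns the transformed records. The Actor ID `USERNAME~data-bridge` is a placeholder, because this page doesn't show the Actor's technical name; replace it and the token with your own. Synchronous runs have a time limit, so for large datasets you may prefer starting a run and reading the dataset afterwards.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_run_request(
        "USERNAME~data-bridge",  # placeholder Actor ID
        "YOUR_APIFY_TOKEN",      # placeholder token
        {
            "sourceType": "dataset",
            "datasetIds": ["your-dataset-id"],
            "targetSchemaType": "preset",
            "preset": "hubspot-contact",
            "deduplicationKeys": ["email"],
        },
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        items = json.load(response)
    print(len(items), "records")
```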
