Pricing

Pay per event

Synthetic Dataset Generator

Generate realistic synthetic datasets with correlated fields, built-in presets (user profiles, companies, e-commerce products, log events), custom schemas, deterministic seeding, and multiple output formats (JSON, CSV, NDJSON).

Developer

BowTiedRaccoon

Synthetic Data Generator — Fake, Test & Mock Data Generator

Generate realistic synthetic, fake, test, and mock data with correlated fields, built-in presets, custom schemas, deterministic seeding, and multiple output formats (JSON, CSV, NDJSON).

What does the Synthetic Data Generator do?

This actor is a synthetic data generator that produces fake data, test data, and mock data on demand, with no web scraping required. It builds structured datasets using Faker.js for realistic values and Copycat for deterministic, reproducible output. Use it to create test data, seed databases, populate staging environments, or benchmark data pipelines.

Key features

Built-in presets -- One-click generation for user profiles, companies, e-commerce products, and server log events
Custom schemas -- Define your own field names and types to generate exactly the data shape you need
Cross-field correlations -- Salary tracks age, revenue scales with company size, and out-of-stock items have zero quantity
Deterministic mode -- Set a random seed for identical output across runs (same seed + same config = same data every time)
Multiple output formats -- JSON (Apify dataset), CSV, or NDJSON
16 locales -- Generate names, addresses, and phone numbers in English, German, French, Japanese, Chinese, and more
Fast and cheap -- No proxy, no browser, no external API calls. Generates thousands of records per second in 256 MB of memory (see Performance)
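The actor's exact correlation rules are not documented here. As a rough illustration of the idea, the following hypothetical sketch derives one field from another using Python's standard `random` module rather than Faker.js. The function names, the salary curve, and the quantity range are all invented for the example.

```python
import random


def correlated_salary(age: int, rng: random.Random) -> int:
    """Hypothetical rule: salary grows with years of work experience, plus noise."""
    # Treat working years as age - 18, capped at 37 (so growth stops at age 55).
    years = max(0, min(age - 18, 37))
    base = 30_000 + years * 2_000
    # Add +/-10% jitter so records with the same age still differ.
    return round(base * (1 + rng.uniform(-0.1, 0.1)))


def stock_quantity(in_stock: bool, rng: random.Random) -> int:
    """Hypothetical rule: out-of-stock products always have zero quantity."""
    return rng.randint(1, 500) if in_stock else 0


rng = random.Random(7)
for age in (22, 35, 50):
    print(age, correlated_salary(age, rng))
print(stock_quantity(False, rng))  # always 0
```

Widening the jitter makes the data look less mechanical but weakens how strongly the two fields track each other.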

Built-in presets

Preset	Fields	Example use case
User Profiles	id, name, email, phone, DOB, age, gender, address, job title, company, salary	Test user databases, CRM seed data
Companies	company ID, name, industry, employees, revenue, founded year, website, CEO	Business directory mock data
E-commerce Products	product ID, name, description, category, price, SKU, stock, rating, reviews	Product catalog testing
Log Events	timestamp, level, service, host, request ID, method, path, status code, response time	Log pipeline testing, SIEM demos

Custom schema

Set preset to custom and define any combination of fields with the customSchema input (a JSON array):

[
  { "name": "user_id", "type": "uuid" },
  { "name": "username", "type": "name" },
  { "name": "signup_date", "type": "datetime" },
  { "name": "plan", "type": "enum", "options": { "values": ["free", "pro", "enterprise"], "weights": [0.7, 0.2, 0.1] } },
  { "name": "monthly_spend", "type": "number", "options": { "min": 0, "max": 500 } }
]
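Because the Input table lists customSchema as a string, it can help to build the array in code and serialize it. The sketch below uses a hypothetical `build_custom_schema` helper that checks each field against the supported types listed in this README before serializing. The actor's own server-side validation may be stricter or looser.

```python
import json

# Field types copied from the "Supported field types" list in this README.
SUPPORTED_TYPES = {
    "string", "integer", "number", "boolean", "date", "datetime", "email",
    "phone", "address", "name", "first_name", "last_name", "company", "url",
    "uuid", "city", "state", "zip", "country", "job_title", "salary",
    "sentence", "paragraph", "enum",
}


def build_custom_schema(fields: list) -> str:
    """Hypothetical helper: sanity-check field definitions, then return a JSON string."""
    for field in fields:
        name, ftype = field.get("name"), field.get("type")
        if not name:
            raise ValueError(f"field without a name: {field!r}")
        if ftype not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported type for {name!r}: {ftype!r}")
        if ftype == "enum" and not field.get("options", {}).get("values"):
            raise ValueError(f"enum field {name!r} needs options.values")
    return json.dumps(fields)


custom_schema = build_custom_schema([
    {"name": "user_id", "type": "uuid"},
    {"name": "plan", "type": "enum",
     "options": {"values": ["free", "pro", "enterprise"], "weights": [0.7, 0.2, 0.1]}},
    {"name": "monthly_spend", "type": "number", "options": {"min": 0, "max": 500}},
])
print(custom_schema)
```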

Supported field types

string, integer, number, boolean, date, datetime, email, phone, address, name, first_name, last_name, company, url, uuid, city, state, zip, country, job_title, salary, sentence, paragraph, enum

For the enum type, pass options.values (an array of choices) and, optionally, options.weights (probability weights).
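The README does not say exactly how the weights are applied. A common interpretation, assumed in this sketch, is that they act as relative probabilities, which is what Python's `random.choices` does. `sample_enum` is a hypothetical helper, not part of the actor.

```python
import random
from collections import Counter


def sample_enum(values, weights=None, rng=None):
    """Draw one value; weights are relative probabilities (uniform when omitted)."""
    if rng is None:
        rng = random.Random()
    return rng.choices(values, weights=weights, k=1)[0]


rng = random.Random(42)
plans = ["free", "pro", "enterprise"]
counts = Counter(sample_enum(plans, [0.7, 0.2, 0.1], rng) for _ in range(10_000))
# Expect roughly 7,000 free, 2,000 pro, and 1,000 enterprise draws.
print(counts)
```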

Input

Parameter	Type	Default	Description
`preset`	string	`user_profiles`	Built-in preset or `custom` for custom schema
`recordCount`	integer	`100`	Number of records to generate (1 to 500,000)
`customSchema`	string		JSON array of field definitions (only used when preset is `custom`)
`locale`	string	`en`	Language/region for generated data
`seed`	integer	`0`	Random seed for deterministic output (0 = random)
`outputFormat`	string	`json`	Output format: `json`, `csv`, or `ndjson`
`enableCorrelations`	boolean	`true`	Apply cross-field correlations for realistic data
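A sketch of assembling a run input that respects the defaults and limits in the table above. The helper name is hypothetical, and preset names other than `user_profiles` and `custom` are not listed in this README, so the sketch does not check them.

```python
def build_run_input(preset="user_profiles", record_count=100, custom_schema=None,
                    locale="en", seed=0, output_format="json",
                    enable_correlations=True) -> dict:
    """Hypothetical helper: build an input dict using the parameter names from the Input table."""
    if not 1 <= record_count <= 500_000:
        raise ValueError("recordCount must be between 1 and 500,000")
    if output_format not in ("json", "csv", "ndjson"):
        raise ValueError("outputFormat must be json, csv, or ndjson")
    if preset == "custom" and not custom_schema:
        raise ValueError("preset 'custom' requires customSchema")
    run_input = {
        "preset": preset,
        "recordCount": record_count,
        "locale": locale,
        "seed": seed,  # 0 means a random seed
        "outputFormat": output_format,
        "enableCorrelations": enable_correlations,
    }
    if preset == "custom":
        # customSchema is only used with the custom preset.
        run_input["customSchema"] = custom_schema
    return run_input


print(build_run_input(record_count=10_000, seed=42, output_format="csv"))
```

With the official `apify-client` Python package, a dict like this can be passed as `run_input` to `ApifyClient(token).actor(actor_id).call(...)`. The actor ID is not given in this README, so substitute the one shown on the actor page.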

Output

JSON format (default)

Records are pushed to the Apify dataset. Each record is a flat JSON object matching the selected preset or custom schema.

CSV / NDJSON format

The file is saved to the key-value store under the key OUTPUT. A summary record with download instructions is also pushed to the dataset.
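Once the OUTPUT file has been downloaded as text, it can be turned back into records. This sketch assumes one JSON object per line for NDJSON and a header row for CSV, which is how those formats are conventionally laid out; the README does not specify the CSV dialect. CSV values come back as strings.

```python
import csv
import io
import json


def parse_output(text: str, output_format: str) -> list:
    """Parse a downloaded OUTPUT file into a list of dicts."""
    if output_format == "ndjson":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if output_format == "csv":
        # DictReader uses the first row as field names; every value is a str.
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {output_format!r}")


ndjson_text = '{"id": 1, "plan": "pro"}\n{"id": 2, "plan": "free"}\n'
csv_text = "id,plan\n1,pro\n2,free\n"
print(parse_output(ndjson_text, "ndjson"))
print(parse_output(csv_text, "csv"))
```

With `apify-client`, `client.key_value_store(store_id).get_record("OUTPUT")` returns a dict whose `"value"` holds the stored content; depending on the content type, that value may need decoding to text first.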

Example output (user_profiles preset)

{
  "id": "a1abcd86-4c43-4300-864f-066b9f5e43eb",
  "first_name": "Lydia",
  "last_name": "MacGyver",
  "email": "Taryn8@gmail.com",
  "phone": "(444) 909-7300",
  "date_of_birth": "2005-01-20",
  "age": 21,
  "gender": "Female",
  "address": "8239 Johnston Shore",
  "city": "West Chanelleburgh",
  "state": "Utah",
  "zip": "00084",
  "country": "Holy See (Vatican City State)",
  "job_title": "Principal Applications Developer",
  "company": "Nienow-Gibson, Bruen and Mayer",
  "salary": 109866,
  "created_at": "2025-09-23T21:04:26.698Z"
}

Cost

This actor uses Pay Per Event pricing:

$0.10 per actor start
$0.0001 per data record generated

Example: Generating 10,000 user profiles costs $0.10 (start) + $1.00 (records) = $1.10 total.
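The cost is the start fee plus the record count times the per-record price. A small calculator, using `Decimal` to avoid floating-point rounding on cent amounts. The prices are copied from this section and may change; the `starts` parameter assumes each run is billed one start fee.

```python
from decimal import Decimal

# Prices from the Cost section; check the actor page for current rates.
START_FEE = Decimal("0.10")     # per actor start
PER_RECORD = Decimal("0.0001")  # per generated record


def estimate_cost(record_count: int, starts: int = 1) -> Decimal:
    """Pay-per-event cost: start fees plus a flat per-record charge."""
    return START_FEE * starts + PER_RECORD * record_count


print(estimate_cost(10_000))  # 1.1000
```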

Performance

Records	Approximate time	Memory
100	< 1 second	256 MB
1,000	~1 second	256 MB
10,000	~5 seconds	256 MB
100,000	~30 seconds	256 MB

Use cases

Database seeding -- Populate development and staging databases with realistic test data
API testing -- Generate request/response payloads for load testing and integration tests
Data pipeline validation -- Feed synthetic data through ETL pipelines to verify transformations
UI prototyping -- Fill dashboards and reports with realistic-looking data
Machine learning -- Generate training data for models that need structured tabular input
Demo environments -- Create convincing demo data without using real customer information

FAQ

How do I generate fake data for testing?

Pick a preset (user profiles, companies, e-commerce products, or log events) or define a customSchema, set recordCount, and run. The actor returns fake test data in JSON, CSV, or NDJSON with no web scraping involved.

Can this generate mock data as a CSV or API output?

Yes. Set outputFormat to csv or ndjson to get a downloadable file from the key-value store, or keep json to read records straight from the Apify dataset via the API.

How do I produce reproducible synthetic data every run?

Set a non-zero seed. The same seed plus the same config yields identical output on every run, so your test data and mock datasets stay stable across CI runs.
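The actor uses Copycat for its deterministic mode, and the sketch below does not reproduce that. It only illustrates the general principle with a seeded Python PRNG, including this README's convention that seed 0 means random. `generate_records` and its fields are hypothetical.

```python
import random

FIRST_NAMES = ["Lydia", "Omar", "Mei", "Jonas", "Priya"]
PLANS = ["free", "pro", "enterprise"]


def generate_records(count: int, seed: int = 0) -> list:
    """Generate toy records; a non-zero seed makes the output repeatable."""
    # Mirror the README's convention: seed 0 means "pick a random seed".
    rng = random.Random(seed if seed != 0 else None)
    return [
        {
            "id": i + 1,
            "first_name": rng.choice(FIRST_NAMES),
            "plan": rng.choices(PLANS, weights=[0.7, 0.2, 0.1])[0],
            "monthly_spend": round(rng.uniform(0, 500), 2),
        }
        for i in range(count)
    ]


run_a = generate_records(5, seed=42)
run_b = generate_records(5, seed=42)
print(run_a == run_b)  # True: same seed, same config
```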

Need more features?

If you need additional field types, presets, or output formats, file an issue or get in touch. We actively maintain this actor and welcome feature requests.

Synthetic Data Generator

web.harvester/synthetic-data-generator

Generate realistic fake data for testing and development. Create profiles, addresses, companies, and transactions using Faker. 50+ locales, deterministic mode, custom schemas.

Web Harvester

Synthetic E-Commerce Data Generator

jungle_synthesizer/synthetic-ecommerce-data-generator

Generate realistic e-commerce test data with interconnected products, customers, orders, and reviews. Features referential integrity, realistic distributions, temporal coherence, industry presets, and deterministic seed mode.

BowTiedRaccoon

Ai Synthetic Data Generator

ruv/ai-synthetic-data-generator

Generate unlimited, high-quality synthetic data for training AI models, testing systems, and building robust agentic applications

Reuven Cohen

Synthetic Financial Data Generator

jungle_synthesizer/synthetic-financial-data-generator

Generate realistic synthetic financial transaction data with category-aware amounts, temporal spending patterns, running balances, and configurable fraud labels for ML training and fintech testing

BowTiedRaccoon

SyntheticFlow API - LLM-Powered Contextual Data Generator

fresh_cliff/syntheticflow-api---llm-powered-contextual-data-generator

Generate AI-powered synthetic data with LLM intelligence for business contexts. Create realistic customer profiles, documents, market data for AI agents. Privacy-compliant, multimodal, trend-aware synthetic data generation.

Brennan Crawford

Mock Data Generator — Realistic test data

perryay/mock-data-generator

Instantly populate your dev environment with realistic test data. Generate users, companies, products, addresses, emails, phone numbers — or define custom schemas with your own fields. Output JSON or CSV with up to 1,000 records at once. Perfect for prototyping, DB seeding, and API testing.

Perry AY

LatAm Fintech Synthetic Data Generator

active_yardstick/latam-synth

Generate privacy-safe synthetic users, savings goals & transactions calibrated on 506K real records from a production LatAm savings app (2015–2024). Multimodal amounts, real seasonality, reproducible by seed

Joel Mendoza

Output & Dataset Schema Creator

zuzka/output-dataset-schema-creator

Generate JSON schemas for output and dataset on your Actor using AI. Perfect for testing new actors.

Zuzka Pelechová

Synthea - Create Synthetic FHIR Compliant Health Records

johnvc/Synthea-Medical-Record-Generator-API

Create realistic synthetic patient healthcare data without privacy concerns. Generates FHIR R4 bundles, CSV files, and comprehensive patient records with demographics, conditions, medications, and procedures. Ideal for EHR testing, healthcare development, medical research, and machine learning.

John

E-commerce Product Matching Tool

tri_angle/e-commerce-product-matching-tool

Match products across e-commerce datasets with E-Commerce Product Matching Tool. Use it with E-commerce Scraping Tool datasets to automatically find identical and similar products and power price monitoring or catalog comparison.