Pricing

from $0.01 / 1,000 results

Synthea - Create Synthetic FHIR Compliant Health Records

Create realistic synthetic patient healthcare data without privacy concerns. Generates FHIR R4 bundles, CSV files, and comprehensive patient records with demographics, conditions, medications, and procedures. Ideal for EHR testing, healthcare development, medical research, and machine learning.

Rating: 5.0 (3)

Developer: John

Last modified: 2 months ago

🏥 Synthea Synthetic Patient Data Generator

Generate realistic synthetic patient healthcare data using Synthea. Create FHIR bundles, CSV files, and metadata for synthetic patients with demographics, conditions, medications, procedures, and more. Perfect for healthcare research, EHR testing, medical training, and healthcare application development.

💡 What is Synthea Synthetic Patient Data Generator?

Synthea Synthetic Patient Data Generator is a powerful Apify Actor that generates realistic, synthetic patient healthcare data using the open-source Synthea™ synthetic patient population simulator. This Actor transforms complex healthcare data generation into a simple, scalable API that produces comprehensive patient records in industry-standard formats.

Whether you're building healthcare applications, testing EHR systems, conducting medical research, training machine learning models, or developing healthcare analytics tools, you'll get realistic, privacy-compliant patient data without the legal and ethical concerns of using real patient information ⚡.

✅ FHIR-Compliant: Generates data in FHIR (Fast Healthcare Interoperability Resources) format, the industry standard for healthcare data exchange

✅ Comprehensive Data: Includes patient demographics, conditions, medications, procedures, observations, encounters, and more

✅ Privacy-Safe: Synthetic data eliminates HIPAA concerns and privacy risks associated with real patient data

✅ Reproducible: Use seeds to generate the same patient populations for consistent testing and development

📦 What Data Can You Extract?

🏷️ Data Type	📋 Description
👤 Patient Demographics	Age, gender, race, ethnicity, address, and other demographic information
🏥 FHIR Bundles	Complete FHIR R4 bundles containing all patient healthcare data in standardized format
💊 Medications	Prescribed medications with dosages, frequencies, and administration details
🩺 Conditions	Diagnosed medical conditions with onset dates, severity, and clinical status
🔬 Procedures	Medical procedures performed with dates, codes, and outcomes
📊 Observations	Lab results, vital signs, and other clinical observations
🏥 Encounters	Healthcare encounters including hospitalizations, office visits, and emergency visits
📋 CSV Files	Tabular data exports for easy analysis in spreadsheet applications
📄 Metadata	Generation metadata including patient counts, execution parameters, and file listings
🔗 US Core IG Support	Optional US Core R4 Implementation Guide profiles for enhanced interoperability

This structured synthetic patient dataset can be exported for EHR testing, healthcare application development, medical research, machine learning training, and healthcare analytics workflows.

⚙️ Key Features

✨ Realistic Synthetic Data — Generates medically accurate patient records with realistic relationships between conditions, medications, and procedures

🔬 FHIR R4 Compliant — Produces industry-standard FHIR bundles ready for integration with FHIR-compliant systems

🎯 Advanced Filtering — Filter patients by gender, age range, location (state/city), and other demographic characteristics

🌍 Location-Based Generation — Generate patients from specific US states and cities for geographic analysis

📊 Multiple Output Formats — Get data in FHIR JSON bundles, CSV files, and structured metadata

🔄 Reproducible Results — Use seeds to generate identical patient populations for consistent testing

💰 Pay-Per-Patient Pricing — Transparent pricing: setup fee + per-patient charge (only pay for patients you generate)

🛡️ Production-Ready — Built-in error handling, validation, and graceful failure modes

📦 Structured Output — Clean JSON output with complete metadata and file organization

⚙️ Customizable Configuration — Support for custom Synthea configuration files and modules

🇺🇸 US Core IG Support — Optional US Core R4 Implementation Guide profiles for enhanced FHIR compliance

💾 Key-Value Store Storage — All FHIR bundles and CSV data are stored in Key-Value Store with references in the dataset, ensuring compliance with size limits while preserving all data

📖 Usage Examples

Example 1: Basic Patient Generation

Generate a single synthetic patient with default settings.

{
  "population_size": 1
}

Output: One patient record with complete FHIR bundle, CSV files, and metadata.

Example 2: Generate Multiple Patients with Location

Generate 10 patients from Massachusetts.

{
  "population_size": 10,
  "state": "Massachusetts"
}

Output: 10 patient records from Massachusetts with location-specific healthcare data.

Example 3: Filter by Gender and Age Range

Generate 5 female patients aged 30-40 years.

{
  "population_size": 5,
  "gender": "F",
  "age_range": "30-40"
}

Output: 5 female patients aged 30-40 with age- and gender-appropriate conditions and medications.

Example 4: Reproducible Generation with Seed

Generate the same 3 patients every time using a seed for reproducibility.

{
  "population_size": 3,
  "seed": 12345
}

Output: Identical 3 patient records every time this seed is used, perfect for testing and development.

Example 5: Location-Specific Generation (State and City)

Generate patients from a specific city in California.

{
  "population_size": 5,
  "state": "California",
  "city": "San Francisco"
}

Output: 5 patient records from San Francisco, California with location-specific healthcare encounters.

Example 6: Custom Reference Date

Generate patients with a specific reference date for historical scenarios.

{
  "population_size": 10,
  "reference_date": "20200101",
  "state": "New York"
}

Output: 10 patients from New York with healthcare data generated relative to January 1, 2020.

Example 7: US Core Implementation Guide

Generate FHIR bundles using US Core R4 Implementation Guide profiles.

{
  "population_size": 5,
  "exporter_fhir_use_us_core_ig": true,
  "state": "Texas"
}

Output: 5 patients from Texas with FHIR bundles conforming to US Core R4 Implementation Guide profiles.

Example 8: Comprehensive Configuration

Use multiple filters and options together for precise patient generation.

{
  "population_size": 20,
  "seed": 67890,
  "gender": "M",
  "age_range": "25-50",
  "state": "Florida",
  "city": "Miami",
  "exporter_fhir_use_us_core_ig": true,
  "output_file": "miami_patients.json"
}

Output: 20 male patients aged 25-50 from Miami, Florida with US Core IG-compliant FHIR bundles, saved to a local file.

Example 9: Large Population Generation (Recommended Approach)

For large populations (>100 patients), generate in smaller batches to avoid disk space limitations.

Batch 1:

{
  "population_size": 250,
  "seed": 1000,
  "state": "California"
}

Batch 2:

{
  "population_size": 250,
  "seed": 2000,
  "state": "California"
}

Output: 500 total patients from California, generated in two batches of 250. Each batch is automatically cleaned up after storage, preventing disk space issues.

💡 Tip: Give each batch a different seed so that batches do not repeat the same patients. Fixed, predictable seeds (for example 1000, 2000, 3000) keep the combined population reproducible.
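The batching approach can be scripted. The following is a minimal sketch, assuming a hypothetical helper `plan_batches` (not part of the Actor) that splits a target population into run inputs of at most 250 patients, each with its own seed. Only the dictionary keys come from the Actor's input schema.

```python
def plan_batches(total, batch_size=250, base_seed=1000, seed_step=1000, **common):
    """Split `total` patients into Actor inputs of at most `batch_size` each.

    Hypothetical helper: only the keys of the returned dicts
    (population_size, seed, state, ...) come from the Actor's input schema.
    """
    if total < 1 or batch_size < 1:
        raise ValueError("total and batch_size must be at least 1")
    batches = []
    remaining = total
    index = 0
    while remaining > 0:
        size = min(batch_size, remaining)
        # A distinct seed per batch keeps batches from repeating patients,
        # while fixed seeds keep the whole plan reproducible.
        batches.append({
            "population_size": size,
            "seed": base_seed + index * seed_step,
            **common,
        })
        remaining -= size
        index += 1
    return batches


inputs = plan_batches(500, state="California")
for run_input in inputs:
    print(run_input)
```

With the defaults, 500 patients yield the two inputs shown in Batch 1 and Batch 2. Each dictionary can be submitted as the input of a separate run.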

🔍 Input Parameters

Parameter	Type	Required	Default	Description
`population_size`	`integer`	✅ Yes	`1`	Number of synthetic patients to generate. Must be at least 1. Each patient record is charged separately. ⚠️ For large populations (>100 patients): Consider generating in smaller batches (200-300 patients per run) to avoid disk space limitations. Files are automatically cleaned up after storage in Key-Value Store.
`seed`	`integer`	❌ No	-	Random seed for reproducible patient generation. If provided, the same seed will generate the same patients. Useful for testing and consistent data generation.
`reference_date`	`string`	❌ No	Today's date	Reference date for patient generation in `YYYYMMDD` format (e.g., `"20240101"`). If not provided, uses today's date. All patient timelines are generated relative to this date.
`clinician_seed`	`integer`	❌ No	-	Seed for clinician assignment to patients. If provided, ensures consistent clinician assignments across runs. Useful for maintaining consistent provider-patient relationships.
`gender`	`string`	❌ No	-	Filter patients by gender. Only patients of the specified gender will be generated. See Gender Options below.
`age_range`	`string`	❌ No	-	Filter patients by age range in format `"minAge-maxAge"` (e.g., `"30-40"`). Only patients within this age range will be generated. Ages must be integers.
`state`	`string`	❌ No	-	US state name for patient location (e.g., `"Massachusetts"`, `"California"`). If specified, only patients from this state will be generated. Use full state names, not abbreviations.
`city`	`string`	❌ No	-	City name for patient location. Requires `state` to be specified. If provided, only patients from this city will be generated.
`config_path`	`string`	❌ No	-	Path to a local Synthea configuration file. If not provided, uses default Synthea configuration. Advanced users can customize Synthea behavior with custom config files.
`modules_dir`	`string`	❌ No	-	Path to a local Synthea modules directory. If not provided, uses default Synthea modules. Advanced users can use custom modules for specialized patient generation scenarios.
`exporter_fhir_use_us_core_ig`	`boolean`	❌ No	`false`	If enabled (`true`), generates FHIR bundles using US Core R4 Implementation Guide profiles. Enhances FHIR compliance for US healthcare systems. See US Core IG Options below.
`output_dir`	`string`	❌ No	Default location	Base directory where Synthea will create its output. Synthea creates an `output/` subdirectory here. If not provided, uses default location.
`output_file`	`string`	❌ No	-	Optional filename to save results as JSON file locally. If not provided, results are only pushed to Apify dataset. Useful for local development and testing.
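The constraints in the table can be checked before a run is started. The sketch below is a hypothetical client-side check (`validate_input` is not part of the Actor, which performs its own validation). The rule that the minimum age must not exceed the maximum is an assumption, not something the table states.

```python
import re
from datetime import datetime


def validate_input(run_input):
    """Return a list of problems found in an Actor input dict (empty if none)."""
    errors = []

    size = run_input.get("population_size")
    if not isinstance(size, int) or isinstance(size, bool) or size < 1:
        errors.append("population_size must be an integer of at least 1")

    gender = run_input.get("gender")
    if gender is not None and gender not in ("M", "F"):
        errors.append('gender must be "M" or "F"')

    age_range = run_input.get("age_range")
    if age_range is not None:
        match = re.fullmatch(r"(\d+)-(\d+)", str(age_range))
        if not match:
            errors.append('age_range must look like "30-40"')
        elif int(match.group(1)) > int(match.group(2)):
            # Assumption: a reversed range is treated as an error
            errors.append("age_range minimum must not exceed maximum")

    if run_input.get("city") and not run_input.get("state"):
        errors.append("city requires state")

    ref = run_input.get("reference_date")
    if ref is not None:
        ref = str(ref)
        valid = bool(re.fullmatch(r"\d{8}", ref))
        if valid:
            try:
                datetime.strptime(ref, "%Y%m%d")
            except ValueError:
                valid = False
        if not valid:
            errors.append("reference_date must be YYYYMMDD")

    return errors


print(validate_input({"population_size": 5, "gender": "F", "age_range": "30-40"}))
print(validate_input({"population_size": 1, "city": "Miami"}))
```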

⚠️ Limitations & Important Notes

Disk Space Considerations

Large Population Generation: When generating large populations (>100 patients), disk space may become a limiting factor. Each patient generates FHIR JSON files that can be 0.5-1 MB in size, so 1000 patients can require roughly 500 MB to 1 GB of temporary disk space during generation.

Solutions:

Generate in batches: For large populations, split into multiple runs of 200-300 patients each
Automatic cleanup: All output files are automatically deleted from disk after being stored in Apify's Key-Value Store, preventing accumulation
Monitor space: If you encounter "No space left on device" errors, reduce the population_size and run multiple smaller batches

Note: Files are temporarily stored on disk during generation, then automatically cleaned up after successful storage in Key-Value Store. This ensures all data is preserved while managing disk space efficiently.
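As a rough worked example of the disk space arithmetic, the sketch below multiplies the patient count by the 0.5-1 MB per-patient range quoted above. `estimate_fhir_disk_mb` is a hypothetical helper, and the estimate covers FHIR JSON only; CSV and metadata files add to it.

```python
def estimate_fhir_disk_mb(patients, min_mb_per_patient=0.5, max_mb_per_patient=1.0):
    """Return (low, high) temporary disk usage in MB for FHIR output alone."""
    return patients * min_mb_per_patient, patients * max_mb_per_patient


for n in (100, 250, 1000):
    low, high = estimate_fhir_disk_mb(n)
    print(f"{n} patients: about {low:.0f}-{high:.0f} MB of FHIR JSON")
```

At 250 patients per run the estimate stays at or below about 250 MB, which is consistent with the recommended batch size of 200-300 patients.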

📊 Enumerated Value Options

Gender Options

The gender parameter accepts the following values:

Value	Description	When to Use
`"M"`	Male	Generate only male patients. Useful for gender-specific research, testing, or analysis.
`"F"`	Female	Generate only female patients. Useful for gender-specific research, testing, or analysis.

Note: If gender is not specified, patients of all genders will be generated according to natural distribution.

US Core Implementation Guide Options

The exporter_fhir_use_us_core_ig parameter accepts the following values:

Value	Description	When to Use
`true`	Enable US Core R4 IG	Use when you need FHIR bundles that conform to US Core R4 Implementation Guide profiles. Required for integration with many US healthcare systems and EHR platforms.
`false`	Standard FHIR R4	Use standard FHIR R4 format (default). Suitable for most use cases and international healthcare systems.

Note: US Core IG profiles provide additional constraints and extensions specific to US healthcare data standards, enhancing interoperability with US-based systems.

📤 Output Format

Dataset Structure

Each generation run returns a structured JSON object with the following structure:

{
  "generation_timestamp": "2025-01-15T14:30:00.123456",
  "patient_count": 5,
  "execution_metadata": {
    "population_size": 5,
    "seed": 12345,
    "reference_date": "20240101",
    "clinician_seed": null,
    "gender": "F",
    "age_range": "30-40",
    "state": "Massachusetts",
    "city": null,
    "base_directory": "/path/to/output",
    "output_directory": "/path/to/output/output",
    "execution_timestamp": "2025-01-15T14:25:00.000000",
    "command": "java -jar synthea-with-dependencies.jar -s 12345 -p 5 -g F -a 30-40 Massachusetts"
  },
  "generated_files": {
    "fhir": [
      "/path/to/output/output/fhir/Patient_12345.json",
      "/path/to/output/output/fhir/Patient_67890.json"
    ],
    "csv": [
      "/path/to/output/output/csv/patients.csv",
      "/path/to/output/output/csv/conditions.csv",
      "/path/to/output/output/csv/medications.csv"
    ],
    "metadata": [
      "/path/to/output/output/metadata/metadata.json"
    ]
  },
  "data": {
    "fhir_bundles": [
      {
        "file": "Patient_12345.json",
        "path": "/path/to/output/output/fhir/Patient_12345.json",
        "key_value_store_key": "fhir-bundle-Patient_12345.json",
        "size_mb": 5.2,
        "stored_in_kv_store": true,
        "format": "json"
      },
      {
        "file": "Patient_67890.json",
        "path": "/path/to/output/output/fhir/Patient_67890.json",
        "key_value_store_key": "fhir-bundle-Patient_67890.json",
        "size_mb": 12.5,
        "stored_in_kv_store": true,
        "format": "json"
      }
    ],
    "csv_data": {
      "patients.csv": {
        "file": "patients.csv",
        "path": "/path/to/output/output/csv/patients.csv",
        "key_value_store_key": "csv-patients",
        "size_mb": 2.1,
        "stored_in_kv_store": true,
        "row_count": 100
      }
    },
    "metadata": {
      "metadata.json": {
        "file": "metadata.json",
        "path": "/path/to/output/output/metadata/metadata.json",
        "data": {
          "patientCount": 5
        }
      }
    }
  },
  "summary": {
    "total_fhir_files": 5,
    "total_csv_files": 10,
    "total_metadata_files": 1,
    "patient_count": 5
  }
}

Output Fields

Top-Level Fields

generation_timestamp: ISO 8601 timestamp when the patient generation was completed
patient_count: Number of patient records generated (used for billing)
execution_metadata: Complete metadata about the Synthea execution including all parameters used
generated_files: Lists of generated files organized by type (FHIR, CSV, metadata)
data: References to the generated FHIR bundles and CSV files in Key-Value Store (file name, key, size), plus the parsed metadata file
summary: Summary statistics of the generation run
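The `command` value in `execution_metadata` suggests how the input parameters map onto Synthea's command-line arguments. The sketch below illustrates one possible mapping, assuming Synthea's documented flags (`-s`, `-cs`, `-p`, `-r`, `-g`, `-a`, `-c`, `-d`, and `--config.key=value` overrides) and a hypothetical `build_synthea_command` helper. The Actor's actual implementation is not shown on this page and may differ.

```python
def build_synthea_command(run_input, jar="synthea-with-dependencies.jar"):
    """Translate an Actor-style input dict into a Synthea argument list."""
    cmd = ["java", "-jar", jar]

    # (input key, Synthea flag) pairs, in the order used by the sample command
    flag_map = [
        ("seed", "-s"),
        ("clinician_seed", "-cs"),
        ("population_size", "-p"),
        ("reference_date", "-r"),
        ("gender", "-g"),
        ("age_range", "-a"),
        ("config_path", "-c"),
        ("modules_dir", "-d"),
    ]
    for key, flag in flag_map:
        value = run_input.get(key)
        if value is not None:
            cmd += [flag, str(value)]

    if run_input.get("exporter_fhir_use_us_core_ig"):
        cmd.append("--exporter.fhir.use_us_core_ig=true")

    # State and optional city are positional arguments at the end
    if run_input.get("state"):
        cmd.append(run_input["state"])
        if run_input.get("city"):
            cmd.append(run_input["city"])
    return cmd


example = build_synthea_command({
    "population_size": 5,
    "seed": 12345,
    "gender": "F",
    "age_range": "30-40",
    "state": "Massachusetts",
})
print(" ".join(example))
```

Returning a list rather than a string keeps multi-word states such as "New York" intact as a single argument when the list is passed to `subprocess.run`.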

FHIR Bundle Structure

All FHIR bundles are stored in Key-Value Store with references in the dataset:

{
  "file": "Patient_12345.json",
  "path": "/path/to/output/output/fhir/Patient_12345.json",
  "key_value_store_key": "fhir-bundle-Patient_12345.json",
  "size_mb": 5.2,
  "stored_in_kv_store": true,
  "format": "json"
}

Each FHIR bundle retrieved from Key-Value Store contains:

resourceType: Always "Bundle" for FHIR bundles
type: Bundle type (typically "collection")
entry: Array of FHIR resources including:
- Patient: Demographics and basic information
- Condition: Diagnosed medical conditions
- MedicationRequest: Prescribed medications
- Procedure: Medical procedures performed
- Observation: Lab results and vital signs
- Encounter: Healthcare encounters (visits, hospitalizations)
- Organization: Healthcare organizations (hospitals, clinics)
- Practitioner: Healthcare providers
- And more healthcare resources
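Once a bundle has been loaded as a Python dictionary, its `entry` array can be summarized by resource type. The sketch below uses a tiny hand-written bundle for illustration only, not actual Actor output; `count_resource_types` is a hypothetical helper.

```python
from collections import Counter


def count_resource_types(bundle):
    """Count the resources in a FHIR Bundle dict by their resourceType."""
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("not a FHIR Bundle")
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


# Tiny hand-written bundle for illustration only (not Actor output)
sample_bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {"resource": {"resourceType": "Condition", "id": "c1"}},
        {"resource": {"resourceType": "Condition", "id": "c2"}},
    ],
}
counts = count_resource_types(sample_bundle)
print(dict(counts))
```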

Retrieving FHIR Bundles from Key-Value Store

All FHIR bundles are stored in Key-Value Store. Retrieve the full data using the key_value_store_key:

Using Apify SDK (Python):

import asyncio

from apify import Actor


async def main() -> None:
    async with Actor:
        # Read the run result from the default dataset
        dataset = await Actor.open_dataset()
        page = await dataset.get_data()
        result = page.items[0]

        # For each FHIR bundle (all are stored in Key-Value Store)
        for bundle in result['data']['fhir_bundles']:
            # Retrieve from Key-Value Store
            full_bundle_data = await Actor.get_value(bundle['key_value_store_key'])
            print(f"Retrieved bundle: {full_bundle_data}")

        # For each CSV file (all are stored in Key-Value Store)
        for filename, csv_ref in result['data']['csv_data'].items():
            # Retrieve from Key-Value Store
            full_csv_data = await Actor.get_value(csv_ref['key_value_store_key'])
            print(f"Retrieved CSV {filename}: {full_csv_data}")


asyncio.run(main())

Using Apify API:

# Get the key_value_store_key from dataset item
KEY="fhir-bundle-Patient_67890.json"
STORE_ID="your-store-id"

# Retrieve from Key-Value Store
curl "https://api.apify.com/v2/key-value-stores/${STORE_ID}/records/${KEY}" \
  -H "Authorization: Bearer ${APIFY_TOKEN}"

Note: All FHIR bundles and CSV data are stored in Key-Value Store to ensure compliance with Apify's 9MB dataset item limit. The dataset contains only references and metadata, while the full data is accessible via Key-Value Store.
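Outside an Actor run, the same references can be resolved with the `apify-client` Python package. The sketch below assumes the client's `dataset(...).list_items()` and `key_value_store(...).get_record(...)` methods, a run object that exposes `defaultDatasetId` and `defaultKeyValueStoreId`, and that the bundles live in the run's default Key-Value Store. `fetch_bundles` is a hypothetical helper, and the Actor ID is left as a placeholder.

```python
def fetch_bundles(client, run):
    """Return {file name: bundle} for every FHIR bundle referenced by a run.

    `client` is expected to behave like apify_client.ApifyClient and `run`
    like the run object returned once the Actor has finished.
    """
    items = client.dataset(run["defaultDatasetId"]).list_items().items
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    bundles = {}
    for item in items:
        for ref in item["data"]["fhir_bundles"]:
            record = store.get_record(ref["key_value_store_key"])
            if record is not None:  # a missing key yields None
                bundles[ref["file"]] = record["value"]
    return bundles


# Example wiring (requires `pip install apify-client` and a valid token):
# from apify_client import ApifyClient
# client = ApifyClient("<your-token>")
# run = client.actor("<actor-id>").call(run_input={"population_size": 1})
# bundles = fetch_bundles(client, run)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object in tests, without network access.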

CSV Data Structure

All CSV files are stored in Key-Value Store with references in the dataset:

{
  "patients.csv": {
    "file": "patients.csv",
    "path": "/path/to/output/output/csv/patients.csv",
    "key_value_store_key": "csv-patients",
    "size_mb": 2.1,
    "stored_in_kv_store": true,
    "row_count": 100
  }
}

CSV files are organized by data type:

patients.csv: Basic patient demographics
conditions.csv: Patient conditions and diagnoses
medications.csv: Prescribed medications
procedures.csv: Medical procedures
observations.csv: Lab results and observations
encounters.csv: Healthcare encounters
And more specialized CSV files

Each CSV file entry includes:

file: Filename
path: Full file path
key_value_store_key: Key to retrieve full CSV data from Key-Value Store
row_count: Number of rows in the file
size_mb: Size of the CSV data in megabytes
stored_in_kv_store: Always true

To retrieve CSV data, use the same method as FHIR bundles - get the key_value_store_key and fetch from Key-Value Store.
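After a CSV record has been fetched, it can be turned into rows with Python's standard `csv` module. The sketch below assumes the stored value is raw CSV text (as `str` or `bytes`), which this page does not confirm. `parse_csv_record` is a hypothetical helper and the sample columns are made up for illustration.

```python
import csv
import io


def parse_csv_record(value):
    """Parse CSV content fetched from Key-Value Store into a list of dicts."""
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    return list(csv.DictReader(io.StringIO(value)))


# Made-up two-row sample; real column names come from Synthea's CSV exporter
sample = "Id,GENDER\np1,F\np2,F\n"
rows = parse_csv_record(sample)
print(len(rows), rows[0]["GENDER"])
```

Comparing `len(rows)` with the `row_count` field of the dataset reference is a simple way to confirm that a file was retrieved completely.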

💰 Pricing

This Actor uses a pay-per-event pricing model with transparent pricing:

Setup Fee: per Actor run (one-time charge for instance setup and initialization)
Patient Record: per patient record generated (charged for each synthetic patient record returned in the results)

You only pay for patients you actually receive, making it cost-effective for both small-scale testing and large-scale data generation. The Actor performs a pre-run credit check to ensure you have sufficient funds. If insufficient funds are detected, the Actor will exit gracefully with an error message.

Note: System files (hospitals, practitioners) are included at no additional charge beyond the setup fee.
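The two charges combine into a simple formula: total = runs × setup fee + patients × per-patient price. The sketch below uses placeholder prices purely for illustration, since the actual amounts are not listed here. `estimate_cost` is a hypothetical helper, and it assumes every requested patient is returned.

```python
def estimate_cost(total_patients, runs, setup_fee, price_per_patient):
    """Total charge for generating `total_patients` across `runs` Actor runs."""
    if runs < 1 or total_patients < 0:
        raise ValueError("runs must be at least 1 and total_patients non-negative")
    return runs * setup_fee + total_patients * price_per_patient


# Placeholder prices for illustration only; check the Actor's pricing details
SETUP_FEE = 0.25
PRICE_PER_PATIENT = 0.5

single_run = estimate_cost(500, runs=1, setup_fee=SETUP_FEE, price_per_patient=PRICE_PER_PATIENT)
two_batches = estimate_cost(500, runs=2, setup_fee=SETUP_FEE, price_per_patient=PRICE_PER_PATIENT)
print(f"Extra cost of batching into two runs: {two_batches - single_run}")
```

Under this model, splitting a population into batches adds one setup fee per extra run while the per-patient portion stays the same.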

🎯 Use Cases

EHR Testing & Development: Generate realistic patient data for testing Electronic Health Record systems without privacy concerns
Healthcare Application Development: Create test datasets for healthcare applications, mobile health apps, and telemedicine platforms
Medical Research: Use synthetic data for research when real patient data is unavailable or requires extensive IRB approval
Machine Learning Training: Train healthcare AI models on realistic synthetic data without privacy restrictions
Healthcare Analytics: Analyze patient populations, treatment patterns, and healthcare outcomes using synthetic data
FHIR System Integration: Test FHIR-compliant systems and APIs with realistic patient bundles
Medical Education: Create patient scenarios for medical training and education programs
Healthcare Data Modeling: Model healthcare workflows, care pathways, and treatment protocols
Compliance Testing: Test healthcare systems for HIPAA compliance and data privacy requirements
Population Health Analysis: Study population health trends, disease prevalence, and healthcare utilization patterns
Clinical Decision Support: Develop and test clinical decision support systems with realistic patient scenarios
Healthcare Interoperability: Test healthcare data exchange and interoperability between systems

🚀 Ready to Generate Synthetic Patient Data?

Start using Synthea Synthetic Patient Data Generator today and create realistic, privacy-compliant patient healthcare data in minutes! Whether you're building healthcare applications, testing EHR systems, conducting medical research, or training AI models, you'll have comprehensive FHIR-compliant patient data ready for immediate use.

Made with ❤️

Transform your healthcare development and research with reliable, production-ready synthetic patient data generation. Generate realistic patient records without privacy concerns or legal restrictions.

Last Updated: 2025.12.08
