Synthea - Create Synthetic FHIR Compliant Health Records

Create realistic synthetic patient healthcare data without privacy concerns. Generates FHIR R4 bundles, CSV files, and comprehensive patient records with demographics, conditions, medications, and procedures. Ideal for EHR testing, healthcare development, medical research, and machine learning.

Developer: John (maintained by Community)

🏥 Synthea Synthetic Patient Data Generator

Generate realistic synthetic patient healthcare data using Synthea. Create FHIR bundles, CSV files, and metadata for synthetic patients with demographics, conditions, medications, procedures, and more. Perfect for healthcare research, EHR testing, medical training, and healthcare application development.

💡 What is Synthea Synthetic Patient Data Generator?

Synthea Synthetic Patient Data Generator is a powerful Apify Actor that generates realistic, synthetic patient healthcare data using the open-source Synthea™ synthetic patient population simulator. This Actor transforms complex healthcare data generation into a simple, scalable API that produces comprehensive patient records in industry-standard formats.

Whether you're building healthcare applications, testing EHR systems, conducting medical research, training machine learning models, or developing healthcare analytics tools, you'll get realistic, privacy-compliant patient data without the legal and ethical concerns of using real patient information ⚡.

  • FHIR-Compliant: Generates data in FHIR (Fast Healthcare Interoperability Resources) format, the industry standard for healthcare data exchange
  • Comprehensive Data: Includes patient demographics, conditions, medications, procedures, observations, encounters, and more
  • Privacy-Safe: Synthetic data eliminates HIPAA concerns and privacy risks associated with real patient data
  • Reproducible: Use seeds to generate the same patient populations for consistent testing and development


📦 What Data Can You Extract?

| 🏷️ Data Type | 📋 Description |
| --- | --- |
| 👤 Patient Demographics | Age, gender, race, ethnicity, address, and other demographic information |
| 🏥 FHIR Bundles | Complete FHIR R4 bundles containing all patient healthcare data in standardized format |
| 💊 Medications | Prescribed medications with dosages, frequencies, and administration details |
| 🩺 Conditions | Diagnosed medical conditions with onset dates, severity, and clinical status |
| 🔬 Procedures | Medical procedures performed with dates, codes, and outcomes |
| 📊 Observations | Lab results, vital signs, and other clinical observations |
| 🏥 Encounters | Healthcare encounters including hospitalizations, office visits, and emergency visits |
| 📋 CSV Files | Tabular data exports for easy analysis in spreadsheet applications |
| 📄 Metadata | Generation metadata including patient counts, execution parameters, and file listings |
| 🔗 US Core IG Support | Optional US Core R4 Implementation Guide profiles for enhanced interoperability |

This structured synthetic patient dataset can be exported for EHR testing, healthcare application development, medical research, machine learning training, and healthcare analytics workflows.


⚙️ Key Features

Realistic Synthetic Data — Generates medically accurate patient records with realistic relationships between conditions, medications, and procedures

🔬 FHIR R4 Compliant — Produces industry-standard FHIR bundles ready for integration with FHIR-compliant systems

🎯 Advanced Filtering — Filter patients by gender, age range, location (state/city), and other demographic characteristics

🌍 Location-Based Generation — Generate patients from specific US states and cities for geographic analysis

📊 Multiple Output Formats — Get data in FHIR JSON bundles, CSV files, and structured metadata

🔄 Reproducible Results — Use seeds to generate identical patient populations for consistent testing

💰 Pay-Per-Patient Pricing — Transparent pricing: setup fee + per-patient charge (only pay for patients you generate)

🛡️ Production-Ready — Built-in error handling, validation, and graceful failure modes

📦 Structured Output — Clean JSON output with complete metadata and file organization

⚙️ Customizable Configuration — Support for custom Synthea configuration files and modules

🇺🇸 US Core IG Support — Optional US Core R4 Implementation Guide profiles for enhanced FHIR compliance

💾 Key-Value Store Storage — All FHIR bundles and CSV data are stored in Key-Value Store with references in the dataset, ensuring compliance with size limits while preserving all data


📖 Usage Examples

Example 1: Basic Patient Generation

Generate a single synthetic patient with default settings.

{
  "population_size": 1
}

Output: One patient record with complete FHIR bundle, CSV files, and metadata.
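
One way to submit this input programmatically is through Apify's run-Actor API endpoint. The sketch below only builds the HTTP request with the Python standard library and sends nothing. `ACTOR_ID` is a placeholder (the real ID is not given in this document), and `build_run_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "username~actor-name"  # placeholder, not the real Actor ID


def build_run_request(run_input: dict, token: str,
                      actor_id: str = ACTOR_ID) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an Actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request({"population_size": 1}, token="<APIFY_TOKEN>")
# To actually start the run: urllib.request.urlopen(request)
```

The same helper works for every example input in this section; only the dictionary passed as `run_input` changes.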


Example 2: Generate Multiple Patients with Location

Generate 10 patients from Massachusetts.

{
  "population_size": 10,
  "state": "Massachusetts"
}

Output: 10 patient records from Massachusetts with location-specific healthcare data.


Example 3: Filter by Gender and Age Range

Generate 5 female patients aged 30-40 years.

{
  "population_size": 5,
  "gender": "F",
  "age_range": "30-40"
}

Output: 5 female patients aged 30-40 with age- and gender-appropriate conditions and medications.


Example 4: Reproducible Generation with Seed

Generate the same 3 patients every time using a seed for reproducibility.

{
  "population_size": 3,
  "seed": 12345
}

Output: Identical 3 patient records every time this seed is used, perfect for testing and development.


Example 5: Location-Specific Generation (State and City)

Generate patients from a specific city in California.

{
  "population_size": 5,
  "state": "California",
  "city": "San Francisco"
}

Output: 5 patient records from San Francisco, California with location-specific healthcare encounters.


Example 6: Custom Reference Date

Generate patients with a specific reference date for historical scenarios.

{
  "population_size": 10,
  "reference_date": "20200101",
  "state": "New York"
}

Output: 10 patients from New York with healthcare data generated relative to January 1, 2020.


Example 7: US Core Implementation Guide

Generate FHIR bundles using US Core R4 Implementation Guide profiles.

{
  "population_size": 5,
  "exporter_fhir_use_us_core_ig": true,
  "state": "Texas"
}

Output: 5 patients from Texas with FHIR bundles conforming to US Core R4 Implementation Guide profiles.


Example 8: Comprehensive Configuration

Use multiple filters and options together for precise patient generation.

{
  "population_size": 20,
  "seed": 67890,
  "gender": "M",
  "age_range": "25-50",
  "state": "Florida",
  "city": "Miami",
  "exporter_fhir_use_us_core_ig": true,
  "output_file": "miami_patients.json"
}

Output: 20 male patients aged 25-50 from Miami, Florida with US Core IG-compliant FHIR bundles, saved to a local file.


Example 9: Batch Generation for Large Populations

For large populations (>100 patients), generate in smaller batches to avoid disk space limitations.

Batch 1:

{
  "population_size": 250,
  "seed": 1000,
  "state": "California"
}

Batch 2:

{
  "population_size": 250,
  "seed": 2000,
  "state": "California"
}

Output: 500 total patients from California, generated in two batches of 250. Each batch is automatically cleaned up after storage, preventing disk space issues.

💡 Tip: Give each batch its own seed so the batches do not repeat the same patients. Choosing the seeds in a fixed pattern (for example 1000, 2000, ...) keeps the whole set reproducible.
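
The batching advice above can be sketched as a small helper that splits a target population into run inputs, each with its own seed. `plan_batches` and its parameters are hypothetical names for illustration; only the `population_size`, `seed`, and `state` input fields come from this document.

```python
from typing import List, Optional


def plan_batches(total: int, batch_size: int, base_seed: int,
                 state: Optional[str] = None) -> List[dict]:
    """Split `total` patients into run inputs of at most `batch_size` each.

    Each batch gets its own seed (base_seed, base_seed + 1, ...), so the
    batches differ from one another while the plan as a whole is repeatable.
    """
    if total < 1 or batch_size < 1:
        raise ValueError("total and batch_size must be at least 1")
    inputs = []
    remaining = total
    index = 0
    while remaining > 0:
        size = min(batch_size, remaining)
        run_input = {"population_size": size, "seed": base_seed + index}
        if state is not None:
            run_input["state"] = state
        inputs.append(run_input)
        remaining -= size
        index += 1
    return inputs


batches = plan_batches(total=500, batch_size=250, base_seed=1000, state="California")
```

Each dictionary in `batches` is a complete input for one Actor run.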


🔍 Input Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| population_size | integer | ✅ Yes | 1 | Number of synthetic patients to generate. Must be at least 1. Each patient record is charged separately. ⚠️ For large populations (>100 patients): Consider generating in smaller batches (200-300 patients per run) to avoid disk space limitations. Files are automatically cleaned up after storage in Key-Value Store. |
| seed | integer | ❌ No | - | Random seed for reproducible patient generation. If provided, the same seed will generate the same patients. Useful for testing and consistent data generation. |
| reference_date | string | ❌ No | Today's date | Reference date for patient generation in YYYYMMDD format (e.g., "20240101"). If not provided, uses today's date. All patient timelines are generated relative to this date. |
| clinician_seed | integer | ❌ No | - | Seed for clinician assignment to patients. If provided, ensures consistent clinician assignments across runs. Useful for maintaining consistent provider-patient relationships. |
| gender | string | ❌ No | - | Filter patients by gender. Only patients of the specified gender will be generated. See Gender Options below. |
| age_range | string | ❌ No | - | Filter patients by age range in format "minAge-maxAge" (e.g., "30-40"). Only patients within this age range will be generated. Ages must be integers. |
| state | string | ❌ No | - | US state name for patient location (e.g., "Massachusetts", "California"). If specified, only patients from this state will be generated. Use full state names, not abbreviations. |
| city | string | ❌ No | - | City name for patient location. Requires state to be specified. If provided, only patients from this city will be generated. |
| config_path | string | ❌ No | - | Path to a local Synthea configuration file. If not provided, uses default Synthea configuration. Advanced users can customize Synthea behavior with custom config files. |
| modules_dir | string | ❌ No | - | Path to a local Synthea modules directory. If not provided, uses default Synthea modules. Advanced users can use custom modules for specialized patient generation scenarios. |
| exporter_fhir_use_us_core_ig | boolean | ❌ No | false | If enabled (true), generates FHIR bundles using US Core R4 Implementation Guide profiles. Enhances FHIR compliance for US healthcare systems. See US Core IG Options below. |
| output_dir | string | ❌ No | Default location | Base directory where Synthea will create its output. Synthea creates an output/ subdirectory here. If not provided, uses default location. |
| output_file | string | ❌ No | - | Optional filename to save results as JSON file locally. If not provided, results are only pushed to Apify dataset. Useful for local development and testing. |
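
The rules in the parameter table can be checked before a run is started. The following is a minimal client-side sketch, not the Actor's own validation: `validate_input` is a hypothetical helper, and the check that the minimum age does not exceed the maximum is an assumption not stated in the table.

```python
import re
from datetime import datetime


def validate_input(run_input: dict) -> list:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []

    size = run_input.get("population_size")
    if not isinstance(size, int) or isinstance(size, bool) or size < 1:
        problems.append("population_size must be an integer of at least 1")

    gender = run_input.get("gender")
    if gender is not None and gender not in ("M", "F"):
        problems.append('gender must be "M" or "F"')

    age_range = run_input.get("age_range")
    if age_range is not None:
        match = re.fullmatch(r"(\d+)-(\d+)", age_range) if isinstance(age_range, str) else None
        if match is None:
            problems.append('age_range must look like "30-40"')
        elif int(match.group(1)) > int(match.group(2)):
            # Assumption: a reversed range is treated as an error.
            problems.append("age_range minimum must not exceed its maximum")

    if run_input.get("city") and not run_input.get("state"):
        problems.append("city requires state")

    ref = run_input.get("reference_date")
    if ref is not None:
        ok = isinstance(ref, str) and len(ref) == 8 and ref.isdigit()
        if ok:
            try:
                datetime.strptime(ref, "%Y%m%d")
            except ValueError:
                ok = False
        if not ok:
            problems.append("reference_date must be in YYYYMMDD format")

    return problems
```

For example, `validate_input({"population_size": 5, "city": "Miami"})` reports that `city` requires `state`.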

⚠️ Limitations & Important Notes

Disk Space Considerations

Large Population Generation: When generating large populations (>100 patients), disk space may become a limiting factor. Each patient generates FHIR JSON files that can be 0.5-1 MB in size, so 1000 patients can require roughly 500 MB to 1 GB of temporary disk space during generation.

Solutions:

  • Generate in batches: For large populations, split into multiple runs of 200-300 patients each
  • Automatic cleanup: All output files are automatically deleted from disk after being stored in Apify's Key-Value Store, preventing accumulation
  • Monitor space: If you encounter "No space left on device" errors, reduce the population_size and run multiple smaller batches

Note: Files are temporarily stored on disk during generation, then automatically cleaned up after successful storage in Key-Value Store. This ensures all data is preserved while managing disk space efficiently.
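
To decide on a batch size, the per-patient figure quoted above (0.5-1 MB of FHIR JSON) can be turned into a rough range. This is only back-of-the-envelope arithmetic: `estimate_fhir_disk_mb` is a hypothetical helper, and it ignores CSV and metadata files.

```python
def estimate_fhir_disk_mb(population_size: int,
                          min_mb_per_patient: float = 0.5,
                          max_mb_per_patient: float = 1.0) -> tuple:
    """Return a (low, high) estimate in MB of temporary FHIR output on disk."""
    if population_size < 1:
        raise ValueError("population_size must be at least 1")
    return (population_size * min_mb_per_patient,
            population_size * max_mb_per_patient)


low, high = estimate_fhir_disk_mb(250)
print(f"250 patients: roughly {low:.0f}-{high:.0f} MB of temporary FHIR output")
```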


📊 Enumerated Value Options

Gender Options

The gender parameter accepts the following values:

| Value | Description | When to Use |
| --- | --- | --- |
| "M" | Male | Generate only male patients. Useful for gender-specific research, testing, or analysis. |
| "F" | Female | Generate only female patients. Useful for gender-specific research, testing, or analysis. |

Note: If gender is not specified, patients of all genders will be generated according to natural distribution.


US Core Implementation Guide Options

The exporter_fhir_use_us_core_ig parameter accepts the following values:

| Value | Description | When to Use |
| --- | --- | --- |
| true | Enable US Core R4 IG | Use when you need FHIR bundles that conform to US Core R4 Implementation Guide profiles. Required for integration with many US healthcare systems and EHR platforms. |
| false | Standard FHIR R4 | Use standard FHIR R4 format (default). Suitable for most use cases and international healthcare systems. |

Note: US Core IG profiles provide additional constraints and extensions specific to US healthcare data standards, enhancing interoperability with US-based systems.
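
One way to confirm that a retrieved bundle was generated with US Core enabled is to look for US Core profile URLs in each resource's `meta.profile`. The sketch below assumes the generated resources declare their profiles there. The sample bundle is hand-written for illustration and is not real Actor output.

```python
US_CORE_PREFIX = "http://hl7.org/fhir/us/core/StructureDefinition/"


def us_core_profiles(bundle: dict) -> list:
    """Collect the distinct US Core profile URLs declared in a FHIR bundle."""
    profiles = set()
    for entry in bundle.get("entry", []):
        meta = entry.get("resource", {}).get("meta", {})
        for url in meta.get("profile", []):
            if url.startswith(US_CORE_PREFIX):
                profiles.add(url)
    return sorted(profiles)


# Hand-written sample, not real output from the Actor.
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient",
                      "meta": {"profile": [US_CORE_PREFIX + "us-core-patient"]}}},
        {"resource": {"resourceType": "Organization"}},
    ],
}
found = us_core_profiles(sample_bundle)
```

An empty result suggests the bundle was generated in standard FHIR R4 mode.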


📤 Output Format

Dataset Structure

Each generation run returns a structured JSON object with the following structure:

{
  "generation_timestamp": "2025-01-15T14:30:00.123456",
  "patient_count": 5,
  "execution_metadata": {
    "population_size": 5,
    "seed": 12345,
    "reference_date": "20240101",
    "clinician_seed": null,
    "gender": "F",
    "age_range": "30-40",
    "state": "Massachusetts",
    "city": null,
    "base_directory": "/path/to/output",
    "output_directory": "/path/to/output/output",
    "execution_timestamp": "2025-01-15T14:25:00.000000",
    "command": "java -jar synthea-with-dependencies.jar -s 12345 -p 5 -g F -a 30-40 Massachusetts"
  },
  "generated_files": {
    "fhir": [
      "/path/to/output/output/fhir/Patient_12345.json",
      "/path/to/output/output/fhir/Patient_67890.json"
    ],
    "csv": [
      "/path/to/output/output/csv/patients.csv",
      "/path/to/output/output/csv/conditions.csv",
      "/path/to/output/output/csv/medications.csv"
    ],
    "metadata": [
      "/path/to/output/output/metadata/metadata.json"
    ]
  },
  "data": {
    "fhir_bundles": [
      {
        "file": "Patient_12345.json",
        "path": "/path/to/output/output/fhir/Patient_12345.json",
        "key_value_store_key": "fhir-bundle-Patient_12345.json",
        "size_mb": 5.2,
        "stored_in_kv_store": true,
        "format": "json"
      },
      {
        "file": "Patient_67890.json",
        "path": "/path/to/output/output/fhir/Patient_67890.json",
        "key_value_store_key": "fhir-bundle-Patient_67890.json",
        "size_mb": 12.5,
        "stored_in_kv_store": true,
        "format": "json"
      }
    ],
    "csv_data": {
      "patients.csv": {
        "file": "patients.csv",
        "path": "/path/to/output/output/csv/patients.csv",
        "key_value_store_key": "csv-patients",
        "size_mb": 2.1,
        "stored_in_kv_store": true,
        "row_count": 100
      }
    },
    "metadata": {
      "metadata.json": {
        "file": "metadata.json",
        "path": "/path/to/output/output/metadata/metadata.json",
        "data": {
          "patientCount": 5
        }
      }
    }
  },
  "summary": {
    "total_fhir_files": 5,
    "total_csv_files": 10,
    "total_metadata_files": 1,
    "patient_count": 5
  }
}

Output Fields

Top-Level Fields

  • generation_timestamp: ISO 8601 timestamp when the patient generation was completed
  • patient_count: Number of patient records generated (used for billing)
  • execution_metadata: Complete metadata about the Synthea execution including all parameters used
  • generated_files: Lists of generated files organized by type (FHIR, CSV, metadata)
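  • data: References and metadata for each generated file (FHIR bundles, CSV files, metadata); the full FHIR and CSV contents are retrieved from the Key-Value Store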
  • summary: Summary statistics of the generation run
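
Given the structure above, the Key-Value Store keys for a run can be collected from a dataset item with a few lines of Python. This is a sketch against the documented field names; `collect_store_keys` is a hypothetical helper and `sample_item` is trimmed from the example output.

```python
def collect_store_keys(item: dict) -> dict:
    """Gather the Key-Value Store keys referenced by one dataset item."""
    data = item.get("data", {})
    fhir_keys = [
        bundle["key_value_store_key"]
        for bundle in data.get("fhir_bundles", [])
        if bundle.get("stored_in_kv_store")
    ]
    csv_keys = {
        name: ref["key_value_store_key"]
        for name, ref in data.get("csv_data", {}).items()
        if ref.get("stored_in_kv_store")
    }
    return {"fhir": fhir_keys, "csv": csv_keys}


# Trimmed copy of the example output shown above.
sample_item = {
    "patient_count": 5,
    "data": {
        "fhir_bundles": [
            {"file": "Patient_12345.json",
             "key_value_store_key": "fhir-bundle-Patient_12345.json",
             "stored_in_kv_store": True},
        ],
        "csv_data": {
            "patients.csv": {"key_value_store_key": "csv-patients",
                             "stored_in_kv_store": True},
        },
    },
}
keys = collect_store_keys(sample_item)
```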

FHIR Bundle Structure

All FHIR bundles are stored in Key-Value Store with references in the dataset:

{
  "file": "Patient_12345.json",
  "path": "/path/to/output/output/fhir/Patient_12345.json",
  "key_value_store_key": "fhir-bundle-Patient_12345.json",
  "size_mb": 5.2,
  "stored_in_kv_store": true,
  "format": "json"
}

Each FHIR bundle retrieved from the Key-Value Store contains:

  • resourceType: Always "Bundle" for FHIR bundles
  • type: Bundle type (typically "collection")
  • entry: Array of FHIR resources including:
    • Patient: Demographics and basic information
    • Condition: Diagnosed medical conditions
    • MedicationRequest: Prescribed medications
    • Procedure: Medical procedures performed
    • Observation: Lab results and vital signs
    • Encounter: Healthcare encounters (visits, hospitalizations)
    • Organization: Healthcare organizations (hospitals, clinics)
    • Practitioner: Healthcare providers
    • And more healthcare resources
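
A quick way to inspect a retrieved bundle is to count its entries by `resourceType`. The sketch below relies only on the standard FHIR bundle layout (`entry[].resource.resourceType`); the sample bundle is hand-written and far smaller than real output.

```python
from collections import Counter


def count_resource_types(bundle: dict) -> Counter:
    """Count the resources in a FHIR bundle by their resourceType."""
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


# Hand-written sample for illustration only.
sample_bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Patient"}},
        {"resource": {"resourceType": "Encounter"}},
        {"resource": {"resourceType": "Encounter"}},
        {"resource": {"resourceType": "Condition"}},
    ],
}
counts = count_resource_types(sample_bundle)
```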

Retrieving FHIR Bundles from Key-Value Store

All FHIR bundles are stored in Key-Value Store. Retrieve the full data using the key_value_store_key:

Using Apify SDK (Python):

import asyncio
from apify import Actor

async def main():
    async with Actor:
        # Read the result item from the default dataset.
        # This assumes the code runs in the same storage context as the generation run.
        dataset = await Actor.open_dataset()
        page = await dataset.get_data()
        dataset_item = page.items[0]
        # For each FHIR bundle (all are stored in Key-Value Store)
        for bundle in dataset_item['data']['fhir_bundles']:
            # Retrieve from Key-Value Store
            full_bundle_data = await Actor.get_value(bundle['key_value_store_key'])
            print(f"Retrieved bundle: {full_bundle_data}")
        # For each CSV file (all are stored in Key-Value Store)
        for filename, csv_ref in dataset_item['data']['csv_data'].items():
            # Retrieve from Key-Value Store
            full_csv_data = await Actor.get_value(csv_ref['key_value_store_key'])
            print(f"Retrieved CSV {filename}: {full_csv_data}")

asyncio.run(main())

Using Apify API:

# Get the key_value_store_key from dataset item
KEY="fhir-bundle-Patient_67890.json"
STORE_ID="your-store-id"
# Retrieve from Key-Value Store
curl "https://api.apify.com/v2/key-value-stores/${STORE_ID}/records/${KEY}" \
-H "Authorization: Bearer ${APIFY_TOKEN}"

Note: All FHIR bundles and CSV data are stored in Key-Value Store to ensure compliance with Apify's 9MB dataset item limit. The dataset contains only references and metadata, while the full data is accessible via Key-Value Store.
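
The same record endpoint can be called from Python without any third-party package. This is a minimal sketch using the standard library; `record_url` and `fetch_json_record` are hypothetical helper names, and the store ID, key, and token are the placeholders from the curl example.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def record_url(store_id: str, key: str) -> str:
    """Build the record URL, percent-encoding the key so it is safe in a path."""
    encoded_key = urllib.parse.quote(key, safe="")
    return f"{API_BASE}/key-value-stores/{store_id}/records/{encoded_key}"


def fetch_json_record(store_id: str, key: str, token: str) -> dict:
    """Download one JSON record (for example a FHIR bundle) and parse it."""
    request = urllib.request.Request(
        record_url(store_id, key),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


url = record_url("your-store-id", "fhir-bundle-Patient_67890.json")
# bundle = fetch_json_record("your-store-id", "fhir-bundle-Patient_67890.json", "<APIFY_TOKEN>")
```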

CSV Data Structure

All CSV files are stored in Key-Value Store with references in the dataset:

{
  "patients.csv": {
    "file": "patients.csv",
    "path": "/path/to/output/output/csv/patients.csv",
    "key_value_store_key": "csv-patients",
    "size_mb": 2.1,
    "stored_in_kv_store": true,
    "row_count": 100
  }
}

CSV files are organized by data type:

  • patients.csv: Basic patient demographics
  • conditions.csv: Patient conditions and diagnoses
  • medications.csv: Prescribed medications
  • procedures.csv: Medical procedures
  • observations.csv: Lab results and observations
  • encounters.csv: Healthcare encounters
  • And more specialized CSV files

Each CSV file entry includes:

  • file: Filename
  • path: Full file path
  • key_value_store_key: Key to retrieve full CSV data from Key-Value Store
  • row_count: Number of rows in the file
  • size_mb: Size of the CSV data in megabytes
  • stored_in_kv_store: Always true

To retrieve CSV data, use the same method as FHIR bundles - get the key_value_store_key and fetch from Key-Value Store.
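
Once the CSV text has been fetched, it can be parsed with Python's `csv` module and checked against the `row_count` in its dataset reference. The sketch assumes the record is returned as plain CSV text and that `row_count` excludes the header row. The column names in the sample are illustrative, not the Actor's documented schema.

```python
import csv
import io


def parse_csv_text(csv_text: str) -> list:
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Illustrative data; real files have different columns and many more rows.
sample_text = "Id,GENDER,CITY\np-1,F,Boston\np-2,M,Salem\n"
sample_ref = {"file": "patients.csv", "key_value_store_key": "csv-patients", "row_count": 2}

rows = parse_csv_text(sample_text)
# Assumption: row_count counts data rows only, not the header.
row_count_matches = len(rows) == sample_ref["row_count"]
```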


💰 Pricing

This Actor uses a pay-per-event pricing model with transparent pricing:

  • Setup Fee: per Actor run (one-time charge for instance setup and initialization)
  • Patient Record: per patient record generated (charged for each synthetic patient record returned in the results)

You only pay for patients you actually receive, making it cost-effective for both small-scale testing and large-scale data generation. The Actor performs a pre-run credit check to ensure you have sufficient funds. If insufficient funds are detected, the Actor will exit gracefully with an error message.

Note: System files (hospitals, practitioners) are included at no additional charge beyond the setup fee.
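
Because the setup fee is charged per run and the patient fee per record, the total cost of a job can be estimated as below. The prices in this sketch are placeholders chosen only to make the arithmetic visible, since the real amounts are not stated in this document. `estimate_total_cost` is a hypothetical helper.

```python
def estimate_total_cost(patient_count: int, runs: int,
                        setup_fee: float, price_per_patient: float) -> float:
    """Total = one setup fee per run + one charge per generated patient."""
    if patient_count < 0 or runs < 1:
        raise ValueError("patient_count must be >= 0 and runs must be >= 1")
    return runs * setup_fee + patient_count * price_per_patient


# Placeholder prices for illustration only; check the Actor's pricing for real values.
EXAMPLE_SETUP_FEE = 0.5
EXAMPLE_PRICE_PER_PATIENT = 0.25

single_run = estimate_total_cost(500, 1, EXAMPLE_SETUP_FEE, EXAMPLE_PRICE_PER_PATIENT)
two_batches = estimate_total_cost(500, 2, EXAMPLE_SETUP_FEE, EXAMPLE_PRICE_PER_PATIENT)
```

Splitting a job into batches adds one setup fee per extra run, while the per-patient portion stays the same.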


🎯 Use Cases

  • EHR Testing & Development: Generate realistic patient data for testing Electronic Health Record systems without privacy concerns

  • Healthcare Application Development: Create test datasets for healthcare applications, mobile health apps, and telemedicine platforms

  • Medical Research: Use synthetic data for research when real patient data is unavailable or requires extensive IRB approval

  • Machine Learning Training: Train healthcare AI models on realistic synthetic data without privacy restrictions

  • Healthcare Analytics: Analyze patient populations, treatment patterns, and healthcare outcomes using synthetic data

  • FHIR System Integration: Test FHIR-compliant systems and APIs with realistic patient bundles

  • Medical Education: Create patient scenarios for medical training and education programs

  • Healthcare Data Modeling: Model healthcare workflows, care pathways, and treatment protocols

  • Compliance Testing: Test healthcare systems for HIPAA compliance and data privacy requirements

  • Population Health Analysis: Study population health trends, disease prevalence, and healthcare utilization patterns

  • Clinical Decision Support: Develop and test clinical decision support systems with realistic patient scenarios

  • Healthcare Interoperability: Test healthcare data exchange and interoperability between systems


🚀 Ready to Generate Synthetic Patient Data?

Start using Synthea Synthetic Patient Data Generator today and create realistic, privacy-compliant patient healthcare data in minutes! Whether you're building healthcare applications, testing EHR systems, conducting medical research, or training AI models, you'll have comprehensive FHIR-compliant patient data ready for immediate use.

Made with ❤️

Transform your healthcare development and research with reliable, production-ready synthetic patient data generation. Generate realistic patient records without privacy concerns or legal restrictions.


Last Updated: 2025.12.08