Synthetic Financial Data Generator

Generate realistic synthetic financial transaction data with category-aware amounts, temporal spending patterns, running balances, and configurable fraud labels for ML training and fintech testing

Pricing: Pay per event
Rating: 0.0 (0)
Developer: BowTiedRaccoon (maintained by Community)
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user
Last modified: 4 days ago

Generate realistic synthetic financial transaction data for ML training, fintech testing, and data pipeline development. Produces bank-statement-quality records with category-aware amounts, temporal spending patterns, running balances, and configurable fraud labels.

What it does

This actor generates synthetic financial transactions that mimic real banking data. No web scraping is involved -- all data is computed locally using statistical models.

Each transaction includes:

  • Account details -- holder name, account type (checking, savings, credit, investment), account ID
  • Transaction data -- amount, date, category, merchant name, merchant category code (MCC), description
  • Running balance -- accurate per-account balance tracking across all transactions
  • Fraud labels (optional) -- binary fraud flag, fraud type classification, anomaly score

Categories and amount distributions

Transactions are distributed across 12 categories (ten spending categories plus salary and transfers), each with a realistic amount range:

| Category | Range | Distribution |
|---|---|---|
| Groceries | $15 -- $250 | Log-normal (mean $65) |
| Rent | $800 -- $3,500 | Normal (mean $1,500) |
| Salary | $2,000 -- $8,000 | Normal (mean $4,500) |
| Dining | $8 -- $120 | Log-normal (mean $35) |
| Coffee | $3 -- $9 | Normal (mean $5.50) |
| Shopping | $10 -- $500 | Log-normal (mean $75) |
| Transport | $2 -- $100 | Log-normal (mean $25) |
| Utilities | $40 -- $350 | Normal (mean $150) |
| Entertainment | $5 -- $80 | Log-normal (mean $25) |
| Healthcare | $15 -- $600 | Log-normal (mean $120) |
| Subscriptions | $5 -- $50 | Normal (mean $15) |
| Transfers | $50 -- $2,000 | Log-normal (mean $500) |
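
The actor's internal sampling code is not published here, so the following is only a minimal sketch of how range-bounded amounts like those in the table could be drawn. The spread parameters (SIGMA_LOG, NORMAL_CV), the clamping step, and the function name sample_amount are assumptions made for illustration; only the ranges, means, and distribution families come from the table.

```python
import math
import random

# (low, high, mean, distribution) copied from the table for a few categories.
CATEGORIES = {
    "groceries": (15.0, 250.0, 65.0, "lognormal"),
    "rent": (800.0, 3500.0, 1500.0, "normal"),
    "coffee": (3.0, 9.0, 5.50, "normal"),
}

SIGMA_LOG = 0.5   # assumed log-space spread; not documented by the actor
NORMAL_CV = 0.2   # assumed standard deviation as a fraction of the mean


def sample_amount(category: str, rng: random.Random) -> float:
    """Draw one amount for a category and keep it inside the listed range."""
    low, high, mean, dist = CATEGORIES[category]
    if dist == "lognormal":
        # Pick mu so the untruncated log-normal has the listed mean.
        mu = math.log(mean) - SIGMA_LOG ** 2 / 2
        value = rng.lognormvariate(mu, SIGMA_LOG)
    else:
        value = rng.gauss(mean, NORMAL_CV * mean)
    # Clamp into the documented range (an assumption; resampling is another option).
    return round(min(max(value, low), high), 2)


rng = random.Random(42)
samples = [sample_amount("groceries", rng) for _ in range(5)]
print(samples)
```

Clamping piles a little probability mass on the range edges; rejection sampling would avoid that at the cost of a loop.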

Temporal patterns

  • Weekday/weekend bias -- coffee and transport spike on weekdays; dining and entertainment spike on weekends
  • Recurring transactions -- salary deposits (1st and 15th), rent (1st), utilities (15th), subscriptions (variable day)
  • Seasonal multipliers -- spending increases in November (1.15x) and December (1.30x), dips in January (0.85x)
  • Time-of-day realism -- coffee purchases at 6-11 AM, dining at 11 AM-10 PM, salary at 8 AM
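
As a rough illustration of the seasonal and recurring rules listed above, the sketch below applies the quoted monthly multipliers and looks up which fixed-day recurring categories fall on a given date. The helper names are hypothetical, the multiplier for unlisted months is assumed to be 1.0, and subscriptions are left out because their day is variable.

```python
from datetime import date

# Multipliers quoted in the list above; all other months are assumed to be 1.0.
SEASONAL = {11: 1.15, 12: 1.30, 1: 0.85}

# Days of the month for the fixed-day recurring categories named above.
RECURRING_DAYS = {"salary": (1, 15), "rent": (1,), "utilities": (15,)}


def seasonal_amount(base: float, when: date) -> float:
    """Scale a base spending amount by the month's seasonal multiplier."""
    return round(base * SEASONAL.get(when.month, 1.0), 2)


def recurring_on(when: date) -> list[str]:
    """Return the recurring categories scheduled for this calendar day."""
    return [cat for cat, days in RECURRING_DAYS.items() if when.day in days]


print(seasonal_amount(100.0, date(2025, 12, 10)))
print(recurring_on(date(2025, 10, 1)))
```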

Fraud injection

When enabled, a configurable percentage of transactions are flagged as fraudulent with:

  • Fraud types: card_stolen, account_takeover, card_not_present, synthetic_identity
  • Anomaly pattern: fraudulent amounts are 2-8x the normal category maximum
  • Fraud score: 0.7-1.0 for fraudulent transactions, 0.0-0.3 for legitimate ones
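
The rules above can be sketched as follows. This is an illustrative reading of the documented ranges, not the actor's actual code: the function name label_transaction, the uniform draws, and the use of unsigned amounts are all assumptions.

```python
import random

FRAUD_TYPES = ["card_stolen", "account_takeover", "card_not_present", "synthetic_identity"]


def label_transaction(category_max: float, normal_amount: float,
                      fraud_rate_pct: float, rng: random.Random) -> dict:
    """Return amount and fraud fields for one transaction (unsigned amounts)."""
    is_fraud = rng.random() < fraud_rate_pct / 100.0
    if is_fraud:
        return {
            # Fraudulent amounts are 2-8x the category maximum.
            "amount": round(category_max * rng.uniform(2.0, 8.0), 2),
            "is_fraudulent": True,
            "fraud_type": rng.choice(FRAUD_TYPES),
            "fraud_score": round(rng.uniform(0.7, 1.0), 2),
        }
    return {
        "amount": normal_amount,
        "is_fraudulent": False,
        "fraud_type": None,
        "fraud_score": round(rng.uniform(0.0, 0.3), 2),
    }


rng = random.Random(7)
print(label_transaction(category_max=250.0, normal_amount=65.42, fraud_rate_pct=2, rng=rng))
```

Because the two score ranges do not overlap, a classifier can separate the classes from fraud_score alone, so that field is probably best treated as a label-like signal and left out of training features.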

Input

| Field | Type | Default | Description |
|---|---|---|---|
| maxItems | integer | 100 | Number of transactions to generate |
| numAccounts | integer | 5 | Number of unique financial accounts |
| currency | string | USD | Currency code (USD, EUR, GBP, JPY, CAD, AUD) |
| dateRangeMonths | integer | 6 | Months of history to generate |
| fraudRate | number | 2 | Percentage of fraudulent transactions (0-100) |
| includeFraudLabels | boolean | true | Include fraud detection fields in output |
| seed | integer | 0 | Random seed for reproducible output |
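
For callers who assemble the run input programmatically, a small client-side helper along these lines can catch typos before a run is started. The field names, defaults, and allowed currencies are taken from the table; the helper build_input and its checks are hypothetical and are not part of the actor.

```python
import json

# Defaults and allowed values copied from the input table.
DEFAULTS = {
    "maxItems": 100,
    "numAccounts": 5,
    "currency": "USD",
    "dateRangeMonths": 6,
    "fraudRate": 2,
    "includeFraudLabels": True,
    "seed": 0,
}
CURRENCIES = {"USD", "EUR", "GBP", "JPY", "CAD", "AUD"}


def build_input(**overrides) -> dict:
    """Merge overrides onto the documented defaults and run basic sanity checks."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input = {**DEFAULTS, **overrides}
    if run_input["currency"] not in CURRENCIES:
        raise ValueError(f"unsupported currency: {run_input['currency']}")
    if not 0 <= run_input["fraudRate"] <= 100:
        raise ValueError("fraudRate must be between 0 and 100")
    return run_input


run_input = build_input(maxItems=1000, currency="EUR", seed=42)
print(json.dumps(run_input, indent=2))
```

The resulting dictionary serializes to the JSON object the actor expects as input.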

Output

Each transaction record contains:

{
  "transaction_id": "397b9202-8ace-4fc4-9fa2-464893c3bc34",
  "account_id": "ACCT-0001",
  "account_holder": "Brenda Upton",
  "account_type": "checking",
  "currency": "USD",
  "date": "2025-10-03T09:25:27.000Z",
  "amount": -65.42,
  "type": "debit",
  "category": "groceries",
  "merchant_name": "Whole Foods",
  "merchant_category_code": "5411",
  "balance_after": 4231.58,
  "is_recurring": false,
  "description": "Whole Foods - groceries purchase",
  "is_fraudulent": false,
  "fraud_type": null,
  "fraud_score": 0.12
}

When includeFraudLabels is false, the is_fraudulent, fraud_type, and fraud_score fields are omitted.
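
One way to sanity-check a downloaded dataset is to confirm that the running balance is consistent within each account. The sketch below assumes, based on the sample record, that amount is signed (debits negative) and that date values are ISO 8601 UTC strings that sort chronologically; the function check_balances and the sample rows other than the documented record are made up for illustration.

```python
from collections import defaultdict


def check_balances(records: list[dict], tolerance: float = 0.005) -> list[str]:
    """Return transaction IDs whose balance_after disagrees with the previous row."""
    by_account = defaultdict(list)
    for rec in records:
        by_account[rec["account_id"]].append(rec)

    problems = []
    for recs in by_account.values():
        recs.sort(key=lambda r: r["date"])  # same-format ISO strings sort by time
        for prev, cur in zip(recs, recs[1:]):
            expected = prev["balance_after"] + cur["amount"]
            if abs(expected - cur["balance_after"]) > tolerance:
                problems.append(cur["transaction_id"])
    return problems


records = [
    {"transaction_id": "t1", "account_id": "ACCT-0001",
     "date": "2025-10-01T08:00:00.000Z", "amount": 4500.00, "balance_after": 4297.00},
    {"transaction_id": "t2", "account_id": "ACCT-0001",
     "date": "2025-10-03T09:25:27.000Z", "amount": -65.42, "balance_after": 4231.58},
]
print(check_balances(records))
```

The tolerance absorbs floating-point rounding; for strict accounting checks, converting amounts to decimal.Decimal first would be the safer choice.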

Use cases

  • ML model training -- fraud detection, transaction categorization, anomaly detection
  • Fintech testing -- payment processing pipelines, accounting software, budgeting apps
  • Data pipeline development -- ETL workflows, data warehouse testing, API mocking
  • Demo data -- realistic financial dashboards and reports

Reproducibility

Set the seed parameter to any positive integer to get identical output across runs that use the same input values. This is useful for:

  • Consistent test fixtures
  • Reproducible ML training datasets
  • Deterministic integration tests
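
To verify in a test that two seeded runs really produced the same dataset, the records can be reduced to a single hash and compared. This is a consumer-side sketch; the fingerprint helper is hypothetical, and it assumes every field, including transaction_id, is reproduced under a fixed seed, as "identical output" suggests.

```python
import hashlib
import json


def fingerprint(records: list[dict]) -> str:
    """Hash a list of records in a way that ignores key order within each record."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


run_a = [{"account_id": "ACCT-0001", "amount": -65.42}]
run_b = [{"amount": -65.42, "account_id": "ACCT-0001"}]  # same data, different key order
print(fingerprint(run_a) == fingerprint(run_b))
```

Storing the fingerprint of a known-good run alongside the test fixture makes later drift in the generated data easy to detect.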

Performance

  • Sub-second generation for 1,000 transactions
  • 256 MB memory sufficient for up to 50,000 transactions
  • No network requests -- pure computation