Synthetic Financial Data Generator
Pricing: Pay per event
Generate realistic synthetic financial transaction data with category-aware amounts, temporal spending patterns, running balances, and configurable fraud labels for ML training and fintech testing.
Developer: BowTiedRaccoon
Generate realistic synthetic financial transaction data for ML training, fintech testing, and data pipeline development. Produces bank-statement-quality records with category-aware amounts, temporal spending patterns, running balances, and configurable fraud labels.
What it does
This actor generates synthetic financial transactions that mimic real banking data. No web scraping is involved -- all data is computed locally using statistical models.
Each transaction includes:
- Account details -- holder name, account type (checking, savings, credit, investment), account ID
- Transaction data -- amount, date, category, merchant name, MCC code, description
- Running balance -- per-account balance after each transaction (balance_after), tracked consistently across the whole history
- Fraud labels (optional) -- binary fraud flag, fraud type classification, anomaly score
Categories and amount distributions
Transactions are distributed across 12 categories (spending, plus salary income and transfers), each with a realistic amount range:
| Category | Amount range | Distribution |
|---|---|---|
| Groceries | $15 -- $250 | Log-normal (mean $65) |
| Rent | $800 -- $3,500 | Normal (mean $1,500) |
| Salary | $2,000 -- $8,000 | Normal (mean $4,500) |
| Dining | $8 -- $120 | Log-normal (mean $35) |
| Coffee | $3 -- $9 | Normal (mean $5.50) |
| Shopping | $10 -- $500 | Log-normal (mean $75) |
| Transport | $2 -- $100 | Log-normal (mean $25) |
| Utilities | $40 -- $350 | Normal (mean $150) |
| Entertainment | $5 -- $80 | Log-normal (mean $25) |
| Healthcare | $15 -- $600 | Log-normal (mean $120) |
| Subscriptions | $5 -- $50 | Normal (mean $15) |
| Transfers | $50 -- $2,000 | Log-normal (mean $500) |
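The table gives each category's range and mean but not its spread. The sketch below shows one way to draw a category-aware amount from those parameters. It assumes dispersion values (the `spread` entries are invented, not published) and assumes out-of-range draws are clipped to the documented bounds. For log-normal rows, it picks the underlying `mu` so that the distribution's arithmetic mean matches the table. `CATEGORY_PARAMS` and `sample_amount` are hypothetical names, not the actor's code.

```python
import math
import random

# Hypothetical parameters for three rows of the table above. Ranges and means
# come from the README; the "spread" values (sigma for log-normal, standard
# deviation for normal) are assumptions.
CATEGORY_PARAMS = {
    "groceries": {"dist": "lognormal", "mean": 65.0, "low": 15.0, "high": 250.0, "spread": 0.6},
    "rent": {"dist": "normal", "mean": 1500.0, "low": 800.0, "high": 3500.0, "spread": 400.0},
    "coffee": {"dist": "normal", "mean": 5.50, "low": 3.0, "high": 9.0, "spread": 1.2},
}


def sample_amount(category: str, rng: random.Random) -> float:
    """Draw one amount for a category, clipped to its documented range."""
    p = CATEGORY_PARAMS[category]
    if p["dist"] == "lognormal":
        sigma = p["spread"]
        # E[X] = exp(mu + sigma^2 / 2), so mu = ln(mean) - sigma^2 / 2
        # makes the log-normal's arithmetic mean equal the table's mean.
        mu = math.log(p["mean"]) - sigma ** 2 / 2
        value = rng.lognormvariate(mu, sigma)
    else:
        value = rng.gauss(p["mean"], p["spread"])
    # Clip into [low, high]. Resampling until in range is an equally valid
    # choice and avoids piling probability mass onto the bounds.
    value = min(max(value, p["low"]), p["high"])
    return round(value, 2)


rng = random.Random(42)
print([sample_amount("groceries", rng) for _ in range(3)])
```

Clipping slightly shifts the empirical mean away from the nominal one. With a moderate sigma, only a small fraction of draws hits either bound.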
Temporal patterns
- Weekday/weekend bias -- coffee and transport spike on weekdays; dining and entertainment spike on weekends
- Recurring transactions -- salary deposits (1st and 15th), rent (1st), utilities (15th), subscriptions (variable day)
- Seasonal multipliers -- spending increases in November (1.15x) and December (1.30x) and dips in January (0.85x)
- Time-of-day realism -- coffee purchases at 6-11 AM, dining at 11 AM-10 PM, salary at 8 AM
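The temporal rules above can be sketched as a few small lookup functions. The following parts come from the list above: the seasonal multipliers, the recurring-transaction days, and salary's 8 AM time. The following parts are assumptions: the weekend weight magnitudes (the README gives only the direction of the bias), the times for rent, utilities and subscriptions, and passing the subscription day in as a parameter. All function names are hypothetical.

```python
import datetime as dt

# Seasonal multipliers from the README; every other month is assumed to be 1.0.
SEASONAL_MULTIPLIER = {11: 1.15, 12: 1.30, 1: 0.85}

# Assumed weekend weights: below 1.0 means the category is rarer on weekends
# (so it "spikes" on weekdays); above 1.0 means it is more common on weekends.
WEEKEND_WEIGHT = {"coffee": 0.5, "transport": 0.6, "dining": 1.6, "entertainment": 1.8}


def seasonal_multiplier(day: dt.date) -> float:
    """Spending scale factor for the month that `day` falls in."""
    return SEASONAL_MULTIPLIER.get(day.month, 1.0)


def category_weight(category: str, day: dt.date) -> float:
    """Relative likelihood of drawing `category` on `day` (1.0 = neutral)."""
    is_weekend = day.weekday() >= 5  # Monday=0 ... Saturday=5, Sunday=6
    return WEEKEND_WEIGHT.get(category, 1.0) if is_weekend else 1.0


def recurring_events(year: int, month: int, subscription_day: int) -> list[tuple[dt.datetime, str]]:
    """Fixed-schedule transactions for one month, in chronological order.

    subscription_day should be 1-28 so it is valid in every month.
    Times other than salary's 8 AM are assumptions.
    """
    return sorted([
        (dt.datetime(year, month, 1, 8, 0), "salary"),
        (dt.datetime(year, month, 15, 8, 0), "salary"),
        (dt.datetime(year, month, 1, 9, 0), "rent"),
        (dt.datetime(year, month, 15, 10, 0), "utilities"),
        (dt.datetime(year, month, subscription_day, 12, 0), "subscriptions"),
    ])
```

For non-recurring transactions, the per-day weights could feed `random.Random.choices(categories, weights=...)` to pick a category. The seasonal multiplier could then scale the sampled amount or the number of transactions that day. The README does not say which of the two the actor scales.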
Fraud injection
When enabled, the percentage of transactions set by fraudRate is flagged as fraudulent, with:
- Fraud types: card_stolen, account_takeover, card_not_present, synthetic_identity
- Anomaly pattern: fraudulent amounts are 2-8x the normal category maximum
- Fraud score: 0.7-1.0 for fraudulent transactions, 0.0-0.3 for legitimate ones
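The fraud rules above translate into a per-transaction labeling step. This sketch assumes each transaction is flagged independently with probability fraudRate / 100 (the actor may instead flag an exact count). It also assumes the fraud type is chosen uniformly at random. `label_transaction` and `FraudLabel` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Optional

FRAUD_TYPES = ["card_stolen", "account_takeover", "card_not_present", "synthetic_identity"]


@dataclass
class FraudLabel:
    is_fraudulent: bool
    fraud_type: Optional[str]
    fraud_score: float
    amount: float  # possibly inflated amount, same sign convention as the input


def label_transaction(amount: float, category_max: float,
                      fraud_rate_pct: float, rng: random.Random) -> FraudLabel:
    # fraudRate is a percentage (0-100), so 2 means a 2% chance per transaction.
    if rng.random() < fraud_rate_pct / 100.0:
        # Fraudulent amounts are 2-8x the category's normal maximum;
        # keep debits negative and credits positive.
        sign = -1.0 if amount < 0 else 1.0
        fraud_amount = sign * round(category_max * rng.uniform(2.0, 8.0), 2)
        return FraudLabel(True, rng.choice(FRAUD_TYPES),
                          round(rng.uniform(0.7, 1.0), 2), fraud_amount)
    # Legitimate transactions keep their amount and get a low score.
    return FraudLabel(False, None, round(rng.uniform(0.0, 0.3), 2), amount)
```

Because the two score bands (0.7-1.0 and 0.0-0.3) do not overlap, fraud_score alone separates the classes perfectly. For realistic model evaluation, you will likely want to drop it from the features and train on the other fields.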
Input
| Field | Type | Default | Description |
|---|---|---|---|
| maxItems | integer | 100 | Number of transactions to generate |
| numAccounts | integer | 5 | Number of unique financial accounts |
| currency | string | USD | Currency code (USD, EUR, GBP, JPY, CAD, AUD) |
| dateRangeMonths | integer | 6 | Months of history to generate |
| fraudRate | number | 2 | Percentage of transactions labeled fraudulent, 0-100 (2 = 2%) |
| includeFraudLabels | boolean | true | Include the fraud label fields in the output |
| seed | integer | 0 | Random seed; set a positive integer for reproducible output |
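When you build an input object programmatically, it helps to start from the documented defaults and catch obvious mistakes before starting a run. Two checks below come from the table: the currency list and the 0-100 bound on fraudRate. The positive-integer checks on the other fields are assumptions. `build_run_input` is a hypothetical client-side helper, not part of the actor.

```python
from typing import Any

# Defaults copied from the Input table above.
DEFAULT_INPUT: dict[str, Any] = {
    "maxItems": 100,
    "numAccounts": 5,
    "currency": "USD",
    "dateRangeMonths": 6,
    "fraudRate": 2,
    "includeFraudLabels": True,
    "seed": 0,
}
SUPPORTED_CURRENCIES = {"USD", "EUR", "GBP", "JPY", "CAD", "AUD"}


def build_run_input(**overrides: Any) -> dict[str, Any]:
    """Merge overrides into the documented defaults and sanity-check them."""
    unknown = set(overrides) - set(DEFAULT_INPUT)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input = {**DEFAULT_INPUT, **overrides}
    if run_input["currency"] not in SUPPORTED_CURRENCIES:
        raise ValueError(f"unsupported currency: {run_input['currency']}")
    if not 0 <= run_input["fraudRate"] <= 100:
        raise ValueError("fraudRate is a percentage and must be within 0-100")
    # Assumed lower bound: these counts must be at least 1.
    for field in ("maxItems", "numAccounts", "dateRangeMonths"):
        if run_input[field] < 1:
            raise ValueError(f"{field} must be a positive integer")
    return run_input


print(build_run_input(maxItems=1000, currency="EUR", seed=42))
```

The resulting dict has the shape of the JSON input object an Apify actor accepts. It could be passed, for example, as the `run_input` argument of the Apify Python client's `ActorClient.call()`.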
Output
Each transaction record contains:
```json
{
  "transaction_id": "397b9202-8ace-4fc4-9fa2-464893c3bc34",
  "account_id": "ACCT-0001",
  "account_holder": "Brenda Upton",
  "account_type": "checking",
  "currency": "USD",
  "date": "2025-10-03T09:25:27.000Z",
  "amount": -65.42,
  "type": "debit",
  "category": "groceries",
  "merchant_name": "Whole Foods",
  "merchant_category_code": "5411",
  "balance_after": 4231.58,
  "is_recurring": false,
  "description": "Whole Foods - groceries purchase",
  "is_fraudulent": false,
  "fraud_type": null,
  "fraud_score": 0.12
}
```
When includeFraudLabels is false, the is_fraudulent, fraud_type, and fraud_score fields are omitted.
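Two checks are handy on the consumer side. The first verifies that balance_after is consistent within each account. The second produces an unlabeled copy of a labeled dataset, matching what includeFraudLabels = false would output. The sketch assumes each record's balance_after equals the previous record's balance_after (same account, date order) plus its signed amount. Both function names are hypothetical.

```python
from typing import Any, Iterable

FRAUD_FIELDS = ("is_fraudulent", "fraud_type", "fraud_score")


def check_running_balances(records: Iterable[dict[str, Any]],
                           tolerance: float = 0.005) -> list[str]:
    """Return transaction_ids whose balance_after does not follow from the prior one.

    The first record of each account has no predecessor, so it only seeds
    the running balance. ISO-8601 dates in one format sort correctly as strings.
    """
    last_balance: dict[str, float] = {}
    mismatches: list[str] = []
    for rec in sorted(records, key=lambda r: r["date"]):
        acct = rec["account_id"]
        if acct in last_balance:
            expected = last_balance[acct] + rec["amount"]
            if abs(expected - rec["balance_after"]) > tolerance:
                mismatches.append(rec["transaction_id"])
        # Continue from the reported balance so one error is flagged only once.
        last_balance[acct] = rec["balance_after"]
    return mismatches


def strip_fraud_labels(record: dict[str, Any]) -> dict[str, Any]:
    """Copy of a record without the fraud fields (like includeFraudLabels=false)."""
    return {k: v for k, v in record.items() if k not in FRAUD_FIELDS}
```

Stripping labels locally lets one seeded run supply both a labeled training set and a blind evaluation set with identical transactions.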
Use cases
- ML model training -- fraud detection, transaction categorization, anomaly detection
- Fintech testing -- payment processing pipelines, accounting software, budgeting apps
- Data pipeline development -- ETL workflows, data warehouse testing, API mocking
- Demo data -- realistic financial dashboards and reports
Reproducibility
Set the seed parameter to any positive integer to get identical output across runs that use the same input. This is useful for:
- Consistent test fixtures
- Reproducible ML training datasets
- Deterministic integration tests
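Reproducibility of this kind usually comes from drawing every random value from one generator seeded once. The sketch below illustrates the idea with Python's `random.Random`. Treating seed = 0 (the default) as "unseeded" is an assumption drawn from the advice to use a positive integer; the actor's exact rule is not documented. `make_rng` and `sample_ids` are hypothetical names.

```python
import random


def make_rng(seed: int) -> random.Random:
    """Deterministic generator for seed > 0; fresh OS entropy otherwise (assumed rule)."""
    return random.Random(seed) if seed > 0 else random.Random()


def sample_ids(seed: int, n: int) -> list[int]:
    # Every draw comes from one local generator, never the global random
    # module state, so unrelated code cannot perturb the sequence.
    rng = make_rng(seed)
    return [rng.randrange(10_000) for _ in range(n)]


print(sample_ids(42, 5) == sample_ids(42, 5))  # True: same seed, same sequence
```

Using a dedicated generator instance, rather than reseeding the module-level one, keeps a seeded run isolated from any other code that also consumes random numbers.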
Performance
- Sub-second generation for 1,000 transactions
- 256 MB memory sufficient for up to 50,000 transactions
- No network requests -- pure computation