Synthetic E-Commerce Data Generator

Pricing: Pay per event

Generate realistic e-commerce test data with interconnected products, customers, orders, and reviews. Features referential integrity, realistic distributions, temporal coherence, industry presets, and deterministic seed mode.

Rating: 0.0 (0 ratings)

Developer: BowTiedRaccoon. Maintained by Community.

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 4 days ago

Generate realistic e-commerce test datasets with four interconnected entity types: products, customers, orders, and reviews. All entities maintain referential integrity: every order references product and customer IDs generated in the same run, and every review references an existing product and customer. Timestamps are temporally coherent: orders are placed after the customer registers, shipments follow order placement, and deliveries follow shipment.
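These guarantees can be verified on a downloaded dataset. The Python sketch below is not part of the actor; it assumes unified output (every record carries entityType), maxItems set to 0 so no records were trimmed, and that shipped_at and delivered_at may be empty for orders that never shipped. The function names check_dataset and parse_ts are hypothetical.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2025-06-22T10:21:51.493Z'."""
    # fromisoformat() accepts a trailing 'Z' only from Python 3.11 on,
    # so normalize it to an explicit UTC offset for older interpreters.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_dataset(records):
    """Return a list of integrity problems found in a unified-mode dataset."""
    product_ids = {r["product_id"] for r in records if r["entityType"] == "product"}
    customers = {r["customer_id"]: r for r in records if r["entityType"] == "customer"}
    problems = []

    for r in records:
        if r["entityType"] == "order":
            oid = r["order_id"]
            customer = customers.get(r["order_customer_id"])
            if customer is None:
                problems.append(f"{oid}: unknown customer {r['order_customer_id']}")
            elif parse_ts(r["ordered_at"]) < parse_ts(customer["customer_created_at"]):
                problems.append(f"{oid}: placed before customer registration")
            # product_ids is a comma-separated string, e.g. "PROD-00007, PROD-00013"
            for pid in (p.strip() for p in r["product_ids"].split(",")):
                if pid not in product_ids:
                    problems.append(f"{oid}: unknown product {pid}")
            # Assumption: shipped_at / delivered_at may be null for unshipped orders,
            # so only the timestamps that are present are compared.
            keys = ("ordered_at", "shipped_at", "delivered_at")
            stamps = [parse_ts(r[k]) for k in keys if r.get(k)]
            if stamps != sorted(stamps):
                problems.append(f"{oid}: timestamps out of order")
        elif r["entityType"] == "review":
            rid = r["review_id"]
            if r["review_product_id"] not in product_ids:
                problems.append(f"{rid}: unknown product {r['review_product_id']}")
            if r["review_customer_id"] not in customers:
                problems.append(f"{rid}: unknown customer {r['review_customer_id']}")
    return problems
```

An empty list means every reference resolves and every timestamp chain is in order.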

Features

  • Four entity types with cross-references: products, customers, orders, reviews
  • Referential integrity — every order and review links to real product and customer IDs generated in the same run
  • Realistic statistical distributions: log-normal product prices, review ratings concentrated at the high end of the scale (average ~4.2), and weighted order statuses (70% delivered)
  • Temporal coherence — shipped_at always follows ordered_at, delivered_at follows shipped_at, orders are placed after customer registration
  • Five industry presets with tailored categories, brand names, and price ranges
  • Deterministic seed mode for reproducible datasets
  • Five locale options for names, addresses, and phone numbers
  • No network calls, no proxy needed — pure CPU data generation
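The distribution and seed features above can be illustrated with Python's standard random module. This sketch is not the actor's code. The median price, the spread, the rating weights, and the status names shipped and processing are assumptions, chosen only so the averages match the figures quoted above (ratings averaging about 4.2, 70% delivered). A seeded random.Random instance shows the idea behind seed mode: the same seed always yields the same samples.

```python
import math
import random

# Hypothetical parameters: the actor does not document its exact values.
# They are picked so the averages match the figures quoted above.
RATING_WEIGHTS = {1: 0.05, 2: 0.05, 3: 0.08, 4: 0.27, 5: 0.55}  # mean rating 4.22
STATUS_WEIGHTS = {"delivered": 0.70, "shipped": 0.15, "processing": 0.10, "cancelled": 0.05}


def sample_price(rng, median=40.0, sigma=0.9, low=5.0, high=500.0):
    """Log-normal price clamped to a preset's range (defaults: general preset, $5-$500)."""
    # lognormvariate(mu, sigma) has median exp(mu), so mu = log(median).
    raw = rng.lognormvariate(math.log(median), sigma)
    return round(min(max(raw, low), high), 2)


def sample_rating(rng):
    """Star rating from 1 to 5, with most of the mass on 4 and 5 stars."""
    return rng.choices(list(RATING_WEIGHTS), weights=list(RATING_WEIGHTS.values()))[0]


def sample_status(rng):
    """Order status drawn with the weights above (70% delivered)."""
    return rng.choices(list(STATUS_WEIGHTS), weights=list(STATUS_WEIGHTS.values()))[0]
```

Usage: create rng = random.Random(42) and call the samplers in a loop; over a few thousand draws the mean rating settles near 4.2 and about 70% of statuses are delivered. Prices stay right-skewed: most fall near the median, with a long tail toward the upper bound.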

Who Uses Synthetic E-Commerce Data and Why

  • E-commerce developers — populate Shopify, WooCommerce, or Magento staging environments with realistic test data before launch
  • Data engineers — validate ETL pipelines with known-schema e-commerce records that include edge cases (zero-order customers, cancelled orders, one-star reviews)
  • Analytics teams — build and demo dashboards with realistic order volumes, customer segments, and product catalogs without exposing production data
  • QA engineers — stress-test order processing systems with thousands of orders referencing real product inventories and customer accounts
  • Bootcamp instructors — provide students with clean, well-structured datasets for SQL exercises, pandas workshops, and data visualization projects

How It Works

  1. You configure how many products, customers, orders, and reviews to generate, pick an industry preset and locale, and optionally set a random seed.
  2. The generator creates products first (with industry-specific categories, brands, and log-normal price distributions), then customers (with segment-weighted lifetime values), then orders (referencing real products and customers, with calculated totals and temporal timestamps), then reviews (with rating-appropriate text templates and referential links).
  3. In unified mode, all entities go to one dataset with an entityType field. In separate mode, products go to the dataset and other entities are saved as JSON in the key-value store.
  4. The maxItems cap is applied after generation to limit total output size.
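The generation order in these steps can be sketched as follows. This is a simplified Python illustration, not the actor's source. It only creates ID and timestamp fields, every order is shipped and delivered, the 2024-01-01 start date and day offsets are arbitrary, and trimming the concatenated list from the end is an assumed reading of how the maxItems cap is applied. Because each entity type draws its references from lists built before it, referential integrity holds by construction.

```python
import random
from datetime import datetime, timedelta, timezone


def generate(num_products, num_customers, num_orders, num_reviews, max_items=0, seed=None):
    """Build products, then customers, then orders, then reviews (unified mode)."""
    rng = random.Random(seed)  # seed=None -> different data on every run
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary epoch (assumption)
    # Timestamps are kept as datetime objects here for brevity.

    # 1. Products come first, so later entities can reference their IDs.
    products = [{"entityType": "product", "product_id": f"PROD-{i:05d}"}
                for i in range(1, num_products + 1)]

    # 2. Customers get a registration timestamp.
    customers = [{"entityType": "customer", "customer_id": f"CUST-{i:05d}",
                  "customer_created_at": start + timedelta(days=rng.uniform(0, 365))}
                 for i in range(1, num_customers + 1)]

    # 3. Orders reference existing customers and products, are placed after the
    #    customer registered, and are shipped and delivered in that order.
    orders = []
    for i in range(1, num_orders + 1):
        customer = rng.choice(customers)
        picked = rng.sample(products, k=min(len(products), rng.randint(1, 3)))
        ordered_at = customer["customer_created_at"] + timedelta(days=rng.uniform(1, 400))
        shipped_at = ordered_at + timedelta(days=rng.randint(1, 5))
        orders.append({
            "entityType": "order",
            "order_id": f"ORD-{i:05d}",
            "order_customer_id": customer["customer_id"],
            "product_ids": ", ".join(p["product_id"] for p in picked),
            "ordered_at": ordered_at,
            "shipped_at": shipped_at,
            "delivered_at": shipped_at + timedelta(days=rng.randint(1, 7)),
        })

    # 4. Reviews reference existing products and customers.
    reviews = [{"entityType": "review", "review_id": f"REV-{i:05d}",
                "review_product_id": rng.choice(products)["product_id"],
                "review_customer_id": rng.choice(customers)["customer_id"]}
               for i in range(1, num_reviews + 1)]

    records = products + customers + orders + reviews
    # The cap is applied after generation; 0 means no limit.
    return records[:max_items] if max_items > 0 else records
```

With the default counts (20 + 30 + 50 + 40 = 140 records) and a cap of 100, this sketch keeps the products, customers, and orders and drops the reviews. Since references only point to earlier entity types, a trimmed prefix still has no dangling IDs.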

Input

Default run (140 records generated, 100 kept by the maxItems cap)

{
"numProducts": 20,
"numCustomers": 30,
"numOrders": 50,
"numReviews": 40,
"maxItems": 100,
"industry": "general",
"locale": "en",
"outputFormat": "unified"
}

Electronics dataset with deterministic seed

{
"numProducts": 50,
"numCustomers": 100,
"numOrders": 200,
"numReviews": 150,
"maxItems": 0,
"industry": "electronics",
"seed": 42,
"outputFormat": "unified"
}

Fashion products in the dataset, other entities in the key-value store (separate mode)

{
"numProducts": 100,
"numCustomers": 50,
"numOrders": 80,
"numReviews": 60,
"maxItems": 0,
"industry": "fashion",
"outputFormat": "separate"
}

Input Reference

  • numProducts (integer, default 20): Number of product records to generate (1–10,000)
  • numCustomers (integer, default 30): Number of customer records to generate (1–50,000)
  • numOrders (integer, default 50): Number of order records to generate (0–100,000)
  • numReviews (integer, default 40): Number of review records to generate (0–100,000)
  • maxItems (integer, default 100): Maximum total records across all entity types. Set to 0 for no limit
  • industry (string, default general): Industry preset: general, fashion, electronics, grocery, home_goods
  • locale (string, default en): Locale for names and addresses: en, de, fr, ja, es
  • seed (integer, default null): Random seed for deterministic output. Omit for random data on each run
  • outputFormat (string, default unified): unified puts all entities in one dataset; separate puts only products in the dataset and saves the rest to the key-value store
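The reference above can be read as a small validation schema. The Python sketch below applies the documented defaults, ranges, and allowed values; it illustrates the reference only and is not the actor's own input handling. The name resolve_input is hypothetical.

```python
DEFAULTS = {
    "numProducts": 20, "numCustomers": 30, "numOrders": 50, "numReviews": 40,
    "maxItems": 100, "industry": "general", "locale": "en",
    "seed": None, "outputFormat": "unified",
}
# Inclusive ranges taken from the Input Reference.
RANGES = {
    "numProducts": (1, 10_000), "numCustomers": (1, 50_000),
    "numOrders": (0, 100_000), "numReviews": (0, 100_000),
}
CHOICES = {
    "industry": {"general", "fashion", "electronics", "grocery", "home_goods"},
    "locale": {"en", "de", "fr", "ja", "es"},
    "outputFormat": {"unified", "separate"},
}


def is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def resolve_input(raw):
    """Merge user input over the defaults and reject out-of-range values."""
    cfg = {**DEFAULTS, **raw}
    for key, (low, high) in RANGES.items():
        if not is_int(cfg[key]) or not low <= cfg[key] <= high:
            raise ValueError(f"{key} must be an integer in {low}..{high}")
    if not is_int(cfg["maxItems"]) or cfg["maxItems"] < 0:
        raise ValueError("maxItems must be a non-negative integer (0 = no limit)")
    if cfg["seed"] is not None and not is_int(cfg["seed"]):
        raise ValueError("seed must be an integer or omitted")
    for key, allowed in CHOICES.items():
        if cfg[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    return cfg
```

For example, the electronics input shown earlier resolves with locale filled in as en, while {"numProducts": 0} or {"industry": "toys"} raises ValueError.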

Output

Product record

{
"entityType": "product",
"product_id": "PROD-00001",
"product_name": "Premium Laptops X7K",
"sku": "SKU-RSJ7NHY5",
"brand": "NovaTech",
"category": "Electronics",
"subcategory": "Laptops",
"price": 54.56,
"cost": 34.63,
"weight_kg": 5.68,
"rating_avg": 4.2,
"review_count": 140,
"in_stock": true,
"created_at": "2024-03-23T03:33:06.557Z"
}

Customer record

{
"entityType": "customer",
"customer_id": "CUST-00001",
"first_name": "Bonita",
"last_name": "Tremblay",
"email": "bonita.tremblay@hotmail.com",
"phone": "(983) 829-9005",
"address": "5836 E Main Street",
"city": "Flagstaff",
"state": "VT",
"zip": "75793-8196",
"country": "US",
"customer_created_at": "2024-03-28T15:21:19.313Z",
"lifetime_value": 2450.75,
"order_count": 12,
"segment": "returning"
}

Order record

{
"entityType": "order",
"order_id": "ORD-00001",
"order_customer_id": "CUST-00017",
"product_ids": "PROD-00007, PROD-00013, PROD-00002",
"quantities": "1, 2, 1",
"subtotal": 326.41,
"tax": 28.97,
"shipping": 0,
"total": 355.38,
"order_status": "delivered",
"ordered_at": "2025-06-22T10:21:51.493Z",
"shipped_at": "2025-06-26T10:21:51.493Z",
"delivered_at": "2025-06-28T10:21:51.493Z"
}
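Because product_ids and quantities are comma-separated strings rather than arrays, consumers have to split them before joining orders to products. The Python sketch below does that with hypothetical helper names. It also checks that subtotal + tax + shipping equals total, which holds for the sample record above (326.41 + 28.97 + 0 = 355.38). Decimal is used so the comparison is not thrown off by binary floating-point rounding.

```python
from decimal import Decimal


def line_items(order):
    """Pair each product ID with its quantity, e.g. [("PROD-00007", 1), ...]."""
    ids = [s.strip() for s in order["product_ids"].split(",")]
    qtys = [int(s) for s in order["quantities"].split(",")]  # int() ignores surrounding spaces
    if len(ids) != len(qtys):
        raise ValueError(f"{order['order_id']}: {len(ids)} products but {len(qtys)} quantities")
    return list(zip(ids, qtys))


def total_is_consistent(order):
    """True if subtotal + tax + shipping equals total exactly, to the cent."""
    # str() first so Decimal sees 326.41, not the float's binary expansion.
    parts = sum((Decimal(str(order[k])) for k in ("subtotal", "tax", "shipping")), Decimal("0"))
    return parts == Decimal(str(order["total"]))
```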

Review record

{
"entityType": "review",
"review_id": "REV-00001",
"review_product_id": "PROD-00003",
"review_customer_id": "CUST-00012",
"review_rating": 5,
"review_title": "Love it!",
"review_body": "Absolutely love this Premium Tablets A3M! The build quality is outstanding. Would definitely buy again.",
"helpful_count": 7,
"verified_purchase": true,
"reviewed_at": "2025-08-15T14:30:22.100Z"
}
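In unified mode the four record types share one dataset, so a common first step is to split it by entityType and aggregate reviews per product. The Python sketch below does that with hypothetical helper names. The averages it computes come only from the review records in the dataset; the description does not say whether a product's own rating_avg and review_count fields are derived from those reviews, so the two may differ.

```python
from collections import defaultdict


def split_by_entity(records):
    """Group unified-mode records into {"product": [...], "customer": [...], ...}."""
    tables = defaultdict(list)
    for record in records:
        tables[record["entityType"]].append(record)
    return dict(tables)


def review_stats(records):
    """Map product ID -> (mean review_rating, number of reviews) over review records."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for review in split_by_entity(records).get("review", []):
        totals[review["review_product_id"]] += review["review_rating"]
        counts[review["review_product_id"]] += 1
    return {pid: (round(totals[pid] / counts[pid], 2), counts[pid]) for pid in counts}
```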

Industry Presets

  • general: categories Electronics, Clothing, Home & Kitchen, Sports, Books; price range $5–$500; example brands Apex, NovaTech, Zenith
  • fashion: categories Women's Clothing, Men's Clothing, Shoes, Accessories, Sportswear; price range $15–$800; example brands Luxe & Co, Urban Thread, Maison Noir
  • electronics: categories Computers, Mobile, Audio, Smart Home, Gaming; price range $10–$2,500; example brands TechVault, PixelForge, Quantum
  • grocery: categories Fresh Produce, Dairy & Eggs, Bakery, Beverages, Pantry; price range $1–$50; example brands Green Valley, Harvest Moon, Farm Fresh
  • home_goods: categories Furniture, Decor, Kitchen, Bedding, Garden; price range $8–$1,200; example brands HomeStead, Craftwell, Willow & Oak

Performance

This actor generates data in-memory with no network calls. Approximate run times:

  • 100 records: < 1 second
  • 1,000 records: 1–2 seconds
  • 10,000 records: 5–10 seconds
  • 100,000 records: 30–60 seconds

Memory usage stays under 256 MB for datasets up to 100,000 records.

Need More Features?

If you need additional entity types (inventory, shipping carriers, promotions), custom field mappings, or integration with specific e-commerce platforms, file an issue or get in touch. We are always open to extending the generator to suit your needs.