Generation Engines
The platform provides five generation engines covering statistical synthesis, physics-based simulation, constrained generation, and industrial security. Engine selection is fully automatic: the platform routes every job to the appropriate engine based on the sealed job specification and user intent.
Engine Overview
| Engine | Compute | Trigger | Use Case |
|---|---|---|---|
| Mock Engine | CPU | Description-only requests with no training data | Instant synthetic data from natural language descriptions |
| Synthesize (AI) | CPU/GPU | Training data uploaded or referenced | High-fidelity tabular synthesis via our trained generative model |
| Constrained Synthesis | CPU/GPU | Hard domain constraints specified in the sealed job spec | Constraint-safe sampling — violations impossible by construction |
| Virtual SCADA | CPU | SCADA/OT telemetry or plant simulation requested | Physics-honest industrial OT telemetry with six live protocols |
| ICS Security | CPU | Attack dataset or cyber-range scenario requested | MITRE ATT&CK ICS attack datasets with ground-truth labels |
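The triggers in the table above can be read as a simple decision procedure. The sketch below is illustrative only: the `JobSpec` fields, the `select_engine` function, and the precedence order between triggers are all assumptions, since the actual sealed job spec schema and routing logic are not described here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Engine(Enum):
    MOCK = "Mock Engine"
    SYNTHESIZE = "Synthesize (AI)"
    CONSTRAINED = "Constrained Synthesis"
    VIRTUAL_SCADA = "Virtual SCADA"
    ICS_SECURITY = "ICS Security"


@dataclass
class JobSpec:
    # Hypothetical fields standing in for the sealed job spec.
    description: str = ""
    has_training_data: bool = False
    hard_constraints: list = field(default_factory=list)
    wants_scada_telemetry: bool = False
    wants_attack_dataset: bool = False


def select_engine(spec: JobSpec) -> Engine:
    """Map the table's triggers to an engine.

    The order of the checks is an assumed precedence, not documented behaviour.
    """
    if spec.wants_attack_dataset:
        return Engine.ICS_SECURITY
    if spec.wants_scada_telemetry:
        return Engine.VIRTUAL_SCADA
    if spec.hard_constraints:
        return Engine.CONSTRAINED
    if spec.has_training_data:
        return Engine.SYNTHESIZE
    # Description-only request with no training data.
    return Engine.MOCK
```

In this sketch a description-only request falls through every check and lands on the Mock Engine, which mirrors the first row of the table.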
ℹAutomatic engine selection
Mock Engine
Instant data generation from natural language descriptions alone. No training data required. The Mock Engine uses a deterministic, seed-reproducible row generator seeded from the sealed job spec, with correlation-preserving lift that enforces cross-column correlations, foreign key integrity, and nullable patterns.
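To make "seed-reproducible" concrete, here is a minimal sketch of the idea, not the Mock Engine's actual implementation. It assumes a job spec that can be serialised to JSON, derives a seed by hashing it, and then generates child rows with a target correlation between two columns, foreign keys drawn from a parent key range, and a nullable column. The spec keys, column names, and the `seed_from_spec` and `generate_rows` helpers are hypothetical.

```python
import hashlib
import json
import math
import random


def seed_from_spec(spec: dict) -> int:
    """Derive a stable integer seed from a job spec (assumed scheme)."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def generate_rows(spec: dict, n: int) -> list:
    """Generate n order rows; the same spec always yields the same rows."""
    rng = random.Random(seed_from_spec(spec))
    rho = spec["correlation"]          # target correlation between two columns
    null_rate = spec["null_rate"]      # share of rows with a missing coupon
    parent_keys = range(1, spec["n_customers"] + 1)

    rows = []
    for order_id in range(1, n + 1):
        # Build y from x plus independent noise so that corr(x, y) is rho.
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * noise

        coupon = None if rng.random() < null_rate else f"C{rng.randrange(1000):03d}"
        rows.append({
            "order_id": order_id,
            # Foreign key: always one of the parent table's keys.
            "customer_id": rng.choice(parent_keys),
            "basket_size": x,
            "spend": y,
            "coupon_code": coupon,
        })
    return rows


spec = {"correlation": 0.8, "null_rate": 0.1, "n_customers": 50}
rows_a = generate_rows(spec, 2000)
rows_b = generate_rows(spec, 2000)
```

Because the seed depends only on the spec contents, `rows_a` and `rows_b` are identical, while changing any spec value produces a different seed and therefore a different dataset.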
✦When Mock Engine activates
Synthesize (trained AI engine)
The primary trained synthesis engine. Our generative model achieves industry-leading accuracy on public tabular benchmarks. It learns the joint distribution from uploaded training data — marginals, correlations, and tails — and generates statistically faithful synthetic records.
Path Selection
| Path | Status | Description |
|---|---|---|
| Trained AI | Primary | Distribution-aware training with multi-stage refinement for faithful joint structure |
| Statistical fallback | Fallback | Classical statistical synthesis for datasets under roughly one hundred rows |
| Auto-Selector | Active | Routes between the trained and statistical paths based on dataset size and shape |
Performance
Generates hundreds of thousands of rows in seconds on CPU, with fidelity measured per run and reported in the utility artefact of the evidence bundle.
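The fidelity metric used in the utility artefact is not specified here. As one illustration of what a per-run fidelity measurement can look like, the sketch below scores each numeric column by one minus the two-sample Kolmogorov-Smirnov statistic between real and synthetic values. The choice of metric and the `ks_statistic` and `marginal_fidelity` names are assumptions, and this checks marginals only, not correlations or tails.

```python
def ks_statistic(a: list, b: list) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    gap = 0.0
    while i < na and j < nb:
        v = min(a[i], b[j])
        # Advance both samples past every value <= v, then compare the CDFs.
        while i < na and a[i] <= v:
            i += 1
        while j < nb and b[j] <= v:
            j += 1
        gap = max(gap, abs(i / na - j / nb))
    return gap


def marginal_fidelity(real: dict, synthetic: dict) -> dict:
    """Per-column score in [0, 1]; 1.0 means identical empirical marginals."""
    return {
        column: 1.0 - ks_statistic(real[column], synthetic[column])
        for column in real
    }


report = marginal_fidelity(
    {"age": [1, 2, 3, 4]},
    {"age": [3, 4, 5, 6]},
)
```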
Constrained Synthesis
Constraint-safe generation for tabular data that must obey hard domain rules. The platform automatically routes to this engine when the sealed job spec includes domain constraints that require guaranteed satisfaction.
- Violations impossible by construction: the sampler respects constraints at every step
- Constraint types: box bounds, rate limits, monotonicity, equality/inequality sum constraints
- Purpose-built for financial, clinical, and actuarial data that must obey business rules
- Cross-column physics relationships (e.g., pressure–temperature coupling) supported
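"Impossible by construction" means the sampler only ever draws from the feasible set, rather than generating freely and rejecting bad rows afterwards. The sketch below illustrates that idea for three of the listed constraint types (box bounds, rate limits with monotonicity, and an equality sum constraint). It is a toy under stated assumptions, not the engine's sampler; the function names and parameters are hypothetical.

```python
import random


def sample_monotone_series(rng, n, lo, hi, max_step):
    """Non-decreasing series inside [lo, hi] whose step never exceeds max_step."""
    value = rng.uniform(lo, hi)
    series = [value]
    for _ in range(n - 1):
        # The increment is drawn from [0, headroom], so every step is
        # non-negative, within the rate limit, and cannot cross the upper bound.
        headroom = min(max_step, hi - value)
        value = min(hi, value + rng.uniform(0.0, headroom))
        series.append(value)
    return series


def sample_split(rng, total, k):
    """k non-negative parts that sum to total (up to floating-point rounding)."""
    weights = [rng.expovariate(1.0) for _ in range(k)]
    scale = total / sum(weights)
    return [w * scale for w in weights]


rng = random.Random(7)
meter_readings = sample_monotone_series(rng, n=50, lo=0.0, hi=10.0, max_step=0.5)
allocation = sample_split(rng, total=100.0, k=4)
```

Here `meter_readings` could stand for a cumulative counter that never decreases, and `allocation` for line items that must add up to a fixed total. Neither function needs a validation pass because no draw can leave the feasible region.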
⚠When Constrained Synthesis activates
Virtual SCADA Simulator
Physics-honest OT telemetry simulation with six live industrial protocol stacks. A layered composition architecture covers a deep library of plant templates across the full breadth of critical-infrastructure sectors.
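As a rough picture of what "physics-honest" telemetry implies, the sketch below simulates a single gravity-drained tank using a mass balance and emits timestamped readings. This is an assumed toy model for illustration, not one of the platform's plant templates; the tag names, parameters, and `simulate_tank` function are hypothetical, and no protocol layer is shown.

```python
import math


def simulate_tank(q_in, area, k_out, h0, dt, steps):
    """Euler integration of dh/dt = (q_in - k_out * sqrt(h)) / area.

    q_in:  inflow in m^3/s
    area:  tank cross-section in m^2
    k_out: outlet coefficient, so outflow = k_out * sqrt(level)
    """
    level = h0
    telemetry = []
    for i in range(steps):
        outflow = k_out * math.sqrt(max(level, 0.0))
        # Record the state at time t before stepping forward.
        telemetry.append({
            "t_s": i * dt,
            "level_m": level,
            "inflow_m3s": q_in,
            "outflow_m3s": outflow,
        })
        # Mass balance: the level changes only through net flow.
        level = max(0.0, level + dt * (q_in - outflow) / area)
    return telemetry


readings = simulate_tank(q_in=0.02, area=2.0, k_out=0.02, h0=0.25, dt=1.0, steps=5000)
```

With these parameters the level settles where inflow equals outflow, at `(q_in / k_out) ** 2 = 1.0` m, so every emitted reading stays consistent with the others instead of being drawn independently.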
See Virtual SCADA for full protocol and plant documentation.
ICS Security Simulator
Ground-truth labelled ICS attack datasets mapped to MITRE ATT&CK ICS techniques. Generates reconnaissance, credential access, lateral movement, privilege escalation, command and control, and impact traffic with full causality chains. Can wrap a live Virtual SCADA simulation for integrated attack scenarios.
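One way to picture a ground-truth labelled record with a causality chain is a row that carries its own label and a pointer to the event that caused it. The schema below is a hypothetical sketch, not the simulator's output format, and the technique IDs are deliberate placeholders rather than real MITRE ATT&CK ICS identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelledEvent:
    event_id: int
    timestamp_s: float
    category: str                 # e.g. "reconnaissance", "impact", or "benign"
    technique_id: Optional[str]   # placeholder IDs only in this sketch
    is_attack: bool               # ground-truth label
    caused_by: Optional[int]      # event_id of the direct cause, if any


def causality_chain(events, event_id):
    """Walk caused_by links back to the root and return the chain root-first."""
    by_id = {e.event_id: e for e in events}
    chain, seen = [], set()
    current = by_id.get(event_id)
    while current is not None and current.event_id not in seen:
        seen.add(current.event_id)  # guard against accidental cycles
        chain.append(current)
        current = by_id.get(current.caused_by) if current.caused_by is not None else None
    return list(reversed(chain))


events = [
    LabelledEvent(1, 0.0, "benign", None, False, None),
    LabelledEvent(2, 12.5, "reconnaissance", "T-PLACEHOLDER-1", True, None),
    LabelledEvent(3, 40.0, "lateral movement", "T-PLACEHOLDER-2", True, 2),
    LabelledEvent(4, 95.0, "impact", "T-PLACEHOLDER-3", True, 3),
]
chain = causality_chain(events, 4)
```

Following `caused_by` from the impact event recovers the reconnaissance and lateral movement steps that led to it, which is the kind of lineage a detection model can be evaluated against.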
See ICS Security for attack categories and technique mappings.
Supporting Components
| Component | Role |
|---|---|
| AI Orchestrator | Translates natural language into a sealed job spec and routes to the correct engine |
| Agentic Data Scientist | Autonomous multi-step pipelines spanning any combination of engines |
See AI Orchestrator and Agentic Data Scientist for full documentation.