Trading Systems Are Data Systems

Posted on Sun 07 June 2026 | Part 6 of Building Real Trading Systems | 18 min read


A trading system is often described through its most visible path: receive market data, generate a signal, submit an order. But production trading infrastructure is larger than the execution path.

A real system also captures market data, tracks state, records decisions, reconciles fills, updates positions, computes risk, monitors health, produces reports, and supports post-trade analysis. Each of those responsibilities depends on data moving through the system with enough structure and context to be trusted later.

The execution layer decides what to do in real time. The data layer makes it possible to understand what happened afterward.

A trading system that only optimizes for speed can still be operationally fragile. Fast execution can create edge. But without data capture, the system becomes harder to debug, improve, and trust under stress.

A production trading system must be able to answer basic questions after the fact:

  • What data was available at decision time?
  • What condition caused the trade?
  • What inputs produced the signal?
  • Did live behavior match research assumptions?
  • Did the system behave correctly under stress?
  • Can the trading session be reconstructed?

Market Data Must Be Captured as the System Saw It

Market data appears straightforward: prices, quantities, and timestamps streaming from exchanges and vendors. The hard part is that the stream is not a clean historical table. It is an operational input delivered over unreliable systems.

Feeds drop. Reconnects create gaps. Vendors disagree on timestamps. Messages can be duplicated, delayed, corrected, or received in an order different from the one the strategy expected.

This matters because a trading system acts on the messages it receives in real time, not on the final historical dataset.

High-fidelity replay depends on preserving the inputs the live system had at decision time: the messages received, their arrival order, their source, their timestamps, gaps, corrections, and any sequencing information available.

If those details are lost, replay stops testing live behavior. A strategy can look correct on cleaned historical data while behaving differently in production because the production system made decisions from delayed, missing, duplicated, or reordered events.
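
As a minimal sketch of capture "as the system saw it", the recorder below stores each message with its source, source sequence number, both timestamps, and local arrival order, and labels anomalies instead of repairing them. The class names, field names, and anomaly labels are hypothetical, and it assumes the feed provides a per-source sequence number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapturedMessage:
    source: str              # feed or vendor label (hypothetical)
    seq: int                 # sequence number assigned by the source
    exchange_ts_ns: int      # timestamp reported by the source
    recv_ts_ns: int          # local receive timestamp
    arrival_index: int       # position in local arrival order
    anomaly: Optional[str]   # "gap", "duplicate", "out_of_order", or None
    payload: bytes           # raw bytes, stored unmodified


class FeedRecorder:
    """Append-only capture that annotates anomalies rather than fixing them."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.log: list[CapturedMessage] = []
        self._seen: set[int] = set()
        self._max_seq: Optional[int] = None

    def record(self, seq: int, exchange_ts_ns: int, recv_ts_ns: int,
               payload: bytes) -> CapturedMessage:
        if seq in self._seen:
            anomaly = "duplicate"
        elif self._max_seq is not None and seq < self._max_seq:
            anomaly = "out_of_order"
        elif self._max_seq is not None and seq > self._max_seq + 1:
            anomaly = "gap"
        else:
            anomaly = None
        self._seen.add(seq)
        if self._max_seq is None or seq > self._max_seq:
            self._max_seq = seq
        msg = CapturedMessage(self.source, seq, exchange_ts_ns, recv_ts_ns,
                              len(self.log), anomaly, payload)
        self.log.append(msg)
        return msg


recorder = FeedRecorder("vendor_a")
for seq in (1, 2, 4, 3, 4):  # a gap, a late message, then a duplicate
    recorder.record(seq, exchange_ts_ns=seq * 1_000,
                    recv_ts_ns=seq * 1_000 + 250, payload=b"raw")
anomalies = [m.anomaly for m in recorder.log]
```

The log keeps all five messages in arrival order, so a replay can feed the strategy the same gap, late message, and duplicate that the live process received.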


Deterministic Transformations Preserve System Meaning

A trading system rarely acts on raw events directly. It acts on transformed representations of those events.

Ticks become bars, trades become VWAP, order book updates become imbalance measures, fills become slippage, orders become execution-quality metrics, and positions become exposure.

These transformations define how the system interprets market activity, execution behavior, portfolio state, and operational health.

That interpretation must be deterministic: given the same inputs, the same transformation version should produce the same outputs. Otherwise, research, backtesting, replay, monitoring, and reporting can drift apart while appearing to use the same data.

Replayability depends on preserving not only the events, but also the transformations that gave those events meaning.
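
One way to make that concrete is to tag every derived value with the transformation version and a fingerprint of its inputs. The sketch below does this for VWAP. The version string, function names, and output layout are illustrative assumptions, and `Decimal` is used so that results do not depend on floating-point summation order.

```python
import hashlib
from decimal import Decimal

TRANSFORM_VERSION = "vwap/1"  # hypothetical version label


def vwap(trades):
    """Volume-weighted average price over (price, quantity) pairs."""
    notional = sum((p * q for p, q in trades), Decimal(0))
    volume = sum((q for _, q in trades), Decimal(0))
    if volume == 0:
        raise ValueError("VWAP is undefined for zero total volume")
    return notional / volume


def input_fingerprint(trades) -> str:
    """Hash the inputs in order so the output can be tied to them."""
    digest = hashlib.sha256()
    for price, qty in trades:
        digest.update(f"{price}|{qty}\n".encode())
    return digest.hexdigest()


def run_transform(trades):
    trades = list(trades)
    return {
        "transform": TRANSFORM_VERSION,
        "input_sha256": input_fingerprint(trades),
        "value": vwap(trades),
    }


trades = [
    (Decimal("100.00"), Decimal("2")),
    (Decimal("101.00"), Decimal("1")),
    (Decimal("99.50"), Decimal("1")),
]
first = run_transform(trades)
second = run_transform(trades)
```

Running the transform twice on the same trades yields identical records. If two systems report different VWAPs, the `transform` and `input_sha256` fields show whether the inputs or the logic diverged.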


State Is Data Too

Market data is not the only data required by a trading system. The system also maintains internal state such as strategy state, open orders, pending cancels, fills, positions, account state, risk state, session state, configuration, and operator actions.

Together, these values define the context for processing each event. The same market update can produce different behavior depending on the system's current believed state.

A strategy can fail because its internal state is incorrect. It may maintain local positions that differ from broker state, restart without the context required to interpret the next fill, or compute risk from stale positions.

State bugs are dangerous because the system can keep running while being wrong. It may send orders, write logs, and emit normal-looking metrics, but its local state no longer matches broker/exchange reality. Nothing necessarily crashes. The system simply starts making decisions from a false view of the world.

State transitions should therefore be recorded as data. The system should record not only current state, but also how the state changed: the triggering event, prior state, new state, and component that executed the update. This record explains both the resulting state and the sequence of updates that produced it.

Recorded transition history enables reconstruction. A position snapshot in isolation conveys only the current value. The same snapshot, accompanied by the fills, cancellations, and risk decisions that produced it, provides the context required to explain the position and identify divergence.

Trading systems are state machines that operate on uncertain external events. Without recorded state transitions, the system cannot fully explain its behavior.
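
A minimal sketch of recording transitions as data, assuming a single integer position per symbol. The `Transition` fields mirror the ones listed above (triggering event, prior state, new state, component). All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    seq: int          # order of the transition within the log
    component: str    # component that executed the update
    trigger: str      # identifier of the triggering event, e.g. a fill id
    prior_qty: int
    new_qty: int


class PositionState:
    def __init__(self, symbol: str) -> None:
        self.symbol = symbol
        self.qty = 0
        self.transitions: list[Transition] = []

    def apply_fill(self, fill_id: str, signed_qty: int, component: str) -> None:
        prior = self.qty
        self.qty = prior + signed_qty
        self.transitions.append(
            Transition(len(self.transitions), component, fill_id, prior, self.qty)
        )


def rebuild(transitions) -> int:
    """Recompute the position from the log and verify it is contiguous."""
    qty = 0
    for t in transitions:
        if t.prior_qty != qty:
            raise ValueError(
                f"transition {t.seq} expects prior {t.prior_qty}, log implies {qty}"
            )
        qty = t.new_qty
    return qty


position = PositionState("XYZ")
position.apply_fill("fill-1", +100, "fill_handler")
position.apply_fill("fill-2", -40, "fill_handler")
position.apply_fill("fill-3", +10, "fill_handler")
rebuilt_qty = rebuild(position.transitions)
```

If the rebuilt quantity disagrees with the broker's reported position, the log narrows the investigation to specific transitions instead of a single unexplained number.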


Identifiers Connect Events Across the System

Reliable reconstruction also depends on stable identifiers. Market events, strategy decisions, order intents, broker orders, fills, accounts, strategy instances, and replay runs need identifiers that survive across services and storage layers. Without them, the system may retain the events but still fail to reconstruct the causal chain from data to decision to order to outcome.
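
The causal chain can be sketched as events that each carry their own identifier plus the identifier of the event that caused them. The `caused_by` field and the ID formats below are assumptions for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    event_id: str
    kind: str
    caused_by: Optional[str]  # identifier of the upstream event, if any


def causal_chain(events, event_id):
    """Walk from an outcome back to the originating market event."""
    by_id = {e.event_id: e for e in events}
    chain = []
    seen = set()
    current = event_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        if current not in by_id:
            raise KeyError(f"missing event {current}; the chain is broken")
        seen.add(current)
        event = by_id[current]
        chain.append(event)
        current = event.caused_by
    return chain


events = [
    Event("md-1", "market_data", None),
    Event("dec-1", "decision", "md-1"),
    Event("ord-1", "order_intent", "dec-1"),
    Event("brk-9", "broker_order", "ord-1"),
    Event("fill-3", "fill", "brk-9"),
]
chain_kinds = [e.kind for e in causal_chain(events, "fill-3")]
```

A missing link raises instead of returning a partial chain, which matches the failure mode described above: the events exist, but the path between them cannot be reconstructed.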


Trading Data Needs a Lifecycle

The lifecycle question is how long each category of data must remain available, at what fidelity, and for which use cases. Current positions, recent fills, observability metrics, historical ticks, reconstructed sessions, and research datasets do not have the same latency, durability, or cost requirements.

  • Hot data includes current positions, open orders, recent fills, risk state, and latest market snapshots. This data resides near the execution path and must support low-latency reads and updates.
  • Warm data includes recent sessions, metrics, logs, current-month fills, and short-term observability data. It supports debugging, dashboards, monitoring, and post-trade review and must remain queryable. Elevated warm-data access latency degrades the operational feedback loop.
  • Cold data includes historical tick archives, old order events, daily reports, model features, reconstructed sessions, and long-term research datasets. This data is not commonly accessed during normal operation, but it must remain retrievable for replay, audits, model validation, strategy research, and reprocessing of derived datasets.

These access patterns require tiered retention. Hot data should reside close to execution. Warm data should reside in queryable stores optimized for recent analytical queries. Cold data should reside in durable, cost-efficient storage.

The data lifecycle also determines how data moves through the system. Hot data belongs in streaming paths for live ingestion, state updates, risk checks, monitoring, and alerts. Warm and cold data usually belong in batch-oriented workflows.

For systems that need reproducible research, replay, and long-term validation, historical reprocessing becomes a core capability. Schemas change, calculations change, risk metrics are added, transformation bugs are found, and strategies need longer backtest windows. The system must be able to rebuild derived data from retained raw events.

That requires cold data stored in a layout suited to large sequential reads, clear lineage from raw inputs to derived outputs, and rerunnable transformations that do not duplicate, overwrite, or mix results from different versions.
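
A minimal sketch of a rerunnable rebuild, assuming derived data is written to a path keyed by dataset, transformation version, and date. The directory layout, function names, sample date, and JSON output are illustrative choices. A production system would more likely write a columnar format to object storage.

```python
import json
import os
import tempfile
from pathlib import Path


def output_path(root: Path, dataset: str, version: str, day: str) -> Path:
    return root / dataset / f"version={version}" / f"date={day}" / "part.json"


def rebuild_partition(root, dataset, version, day, raw_events, transform) -> bool:
    """Write one derived partition. Returns False if it already exists."""
    target = output_path(root, dataset, version, day)
    if target.exists():
        return False  # rerun is a no-op: no duplicates, no overwrite
    target.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "lineage": {"dataset": dataset, "version": version, "date": day,
                    "raw_event_count": len(raw_events)},
        "rows": transform(raw_events),
    }
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(record, sort_keys=True))
    os.replace(tmp, target)  # publish only after the file is fully written
    return True


def notional_v1(events):
    return [e["px"] * e["qty"] for e in events]


def notional_v2(events):
    # Hypothetical revised definition: skip zero-quantity events.
    return [e["px"] * e["qty"] for e in events if e["qty"] != 0]


with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    raw = [{"px": 10, "qty": 1}, {"px": 12, "qty": 3}, {"px": 11, "qty": 0}]
    first = rebuild_partition(root, "notional", "v1", "2026-06-05", raw, notional_v1)
    second = rebuild_partition(root, "notional", "v1", "2026-06-05", raw, notional_v1)
    third = rebuild_partition(root, "notional", "v2", "2026-06-05", raw, notional_v2)
    written = sorted(p.relative_to(root).as_posix() for p in root.rglob("part.json"))
```

Rerunning version `v1` changes nothing, while `v2` lands in its own directory. Results from the two versions are never mixed, and both can be compared side by side.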


Storage Should Follow Access Patterns

The storage question is how lifecycle requirements map to concrete systems, query patterns, and cost tradeoffs. At sufficient scale, trading data workloads usually outgrow a single storage engine.

| Query | Data characteristics | Candidate system |
| --- | --- | --- |
| What does the strategy require at execution time? | Small, hot, mutable runtime state | In-memory structures, Redis, service-local cache |
| What is the system's current view of orders and positions? | Transactional, operationally consistent state | Postgres, MySQL |
| What events occurred during this session? | Recent, append-heavy, time-indexed events | ClickHouse, TimescaleDB, QuestDB, time-series stores |
| What market data history must be scanned for research? | Large, immutable, columnar-scan-optimized | Object storage + Parquet |
| What raw records must be retained for audit and replay? | Cold, append-only, write-once archives | Object storage, compressed raw files, Parquet, archived logs |
| What can be queried locally for ad hoc analysis? | Local analytical extracts, single-node scale | DuckDB, Polars, Pandas |
| What must be recomputed at distributed scale? | Batch-derived features, aggregate metrics, periodic reports | Spark, Ray, Databricks-class batch engines |
| What do dashboards and governed reports query? | Curated, schema-versioned, access-controlled outputs | Warehouse/lakehouse systems, Snowflake-class engines |

Storage design extends beyond engine selection. Partitioning strategy (by timestamp, instrument identifier, account) determines whether queries execute as targeted partition reads or degrade to full scans.

Catalog and metadata systems carry equal weight: as data spans multiple engines and tiers, a catalog that tracks dataset existence, physical location, schema version, lineage, and ownership is a prerequisite for operational reliability and research reproducibility.
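
To illustrate the partitioning point, the sketch below assumes data is laid out by date and instrument using `key=value` path segments. The key format, instrument names, and dates are hypothetical. It shows how a query over two days of one instrument resolves to two partitions rather than the whole dataset.

```python
from datetime import date, timedelta


def partition_key(day: date, instrument: str) -> str:
    return f"date={day.isoformat()}/instrument={instrument}"


def partitions_for_query(start: date, end: date, instruments) -> list[str]:
    """List only the partitions a date-range, instrument-filtered query must read."""
    keys = []
    day = start
    while day <= end:
        for instrument in sorted(instruments):
            keys.append(partition_key(day, instrument))
        day += timedelta(days=1)
    return keys


# A hypothetical month of data for 50 instruments.
all_instruments = [f"SYM{i:02d}" for i in range(50)]
all_partitions = partitions_for_query(date(2026, 6, 1), date(2026, 6, 30),
                                      all_instruments)

# A targeted query: one instrument, two days.
needed = partitions_for_query(date(2026, 6, 4), date(2026, 6, 5), ["SYM07"])
```

Here the targeted query touches 2 of 1,500 partitions. A query that filters on a column outside the partition key, such as account in this layout, would have to read every partition.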


Replayability Connects Research and Production

Captured events, deterministic transformations, recorded state transitions, stable identifiers, and retained historical data create the foundation for replay: the ability to reprocess historical events through specific system components under controlled conditions.

In a trading system, replay means historical data can pass through the same decision and state-transition logic used in production, so live behavior can be compared against research behavior.

Replayability also tests the consequences of earlier data-layer decisions. If the system preserved the inputs available at decision time, it can reconstruct past behavior at useful fidelity. If not, replay becomes approximation.

Replay is only as useful as the fidelity of the captured inputs, state, configuration, transformation versions, and external assumptions it can reproduce.

Without replayability, research and production drift apart. Research often operates on curated datasets with simplified assumptions about timing, availability, and execution. Production includes delayed messages, missing fields, retries, partial fills, stale state, process restarts, and rejected orders. Curated datasets rarely represent these operational edge cases. Replayability narrows that gap by forcing production-like event streams through the same logic used for research and live operation.
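
The gap can be shown with a toy example. The strategy rule, threshold, and event values below are invented for illustration. The same decision function is run over the stream as captured (with one delayed message) and over a cleaned, time-sorted version of it.

```python
THRESHOLD = 1.0  # hypothetical price move that triggers a trade


def decide(last_price, price):
    """Toy rule: sell after a rise of THRESHOLD, buy after an equal fall."""
    if last_price is None:
        return "HOLD"
    move = price - last_price
    if move >= THRESHOLD:
        return "SELL"
    if move <= -THRESHOLD:
        return "BUY"
    return "HOLD"


def run(events):
    """Apply the decision logic to (exchange_ts, price) events in the order given."""
    decisions = []
    last_price = None
    for _ts, price in events:
        decisions.append(decide(last_price, price))
        last_price = price
    return decisions


# Arrival order as captured: the message with exchange_ts=2 arrived late.
captured = [(1, 100.0), (3, 100.2), (2, 101.5)]

# Decisions as logged by the live process (hypothetical production record).
live_decisions = ["HOLD", "HOLD", "SELL"]

replayed = run(captured)         # replay of what the system actually saw
cleaned = run(sorted(captured))  # research view: events sorted by exchange time
```

Replaying the captured stream reproduces the live decisions, while the cleaned dataset produces a different sequence (`HOLD`, `SELL`, `BUY`). Neither result is a bug. They answer different questions, and only the first describes what production did.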


Observability Is a Data Product

Trading system observability is a derived data product. It is not limited to logs, dashboards, alerts, or traces.

Many of the most useful trading metrics cannot be computed correctly from a single component in isolation: feed latency, order latency, fill quality, rejection rate, stale-data exposure, missing-data windows, strategy health, risk-limit utilization, per-strategy drawdown, and inter-strategy correlation. They require interpreting events across market-data handlers, strategy processes, order routers, execution venues, risk engines, and portfolio-accounting systems.

Each metric needs a formal definition. What qualifies as a rejected order? Which timestamps define order latency? Are acknowledgements, partial fills, cancels, and replaces included? How is stale data detected? How are positions attributed to strategies? Which events are included in drawdown calculations, and which are excluded?

Without these definitions, observability drifts toward presentation. A dashboard may show elevated latency, but incident analysis needs to trace that value back to source events. It may show unexpected exposure, but investigation requires the fills, position updates, strategy IDs, account IDs, risk checks, and routing decisions that produced it.

Observability data therefore needs the same controls as any other trading data product: reliable timestamps, stable identifiers, explicit schemas, reproducible transformations, and clear ownership of semantics.
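
As a sketch of what a formal metric definition might look like, the function below defines order latency as the time from `submit` to the first `ack` for the same order ID and reports orders without an acknowledgement separately. This is one possible definition among several. The event types and field names are assumptions.

```python
def order_latencies_ns(events):
    """Order latency = first ack timestamp minus submit timestamp, per order_id.

    Rejects, fills, and later acks are deliberately excluded from this metric.
    Returns (latencies by order_id, sorted list of submitted-but-unacked ids).
    """
    submitted = {}
    latencies = {}
    for event in sorted(events, key=lambda e: e["ts_ns"]):
        order_id = event["order_id"]
        if event["type"] == "submit":
            submitted[order_id] = event["ts_ns"]
        elif (event["type"] == "ack" and order_id in submitted
              and order_id not in latencies):
            latencies[order_id] = event["ts_ns"] - submitted[order_id]
    unacked = sorted(set(submitted) - set(latencies))
    return latencies, unacked


events = [
    {"order_id": "o1", "type": "submit", "ts_ns": 1_000},
    {"order_id": "o1", "type": "ack", "ts_ns": 1_450},
    {"order_id": "o2", "type": "submit", "ts_ns": 2_000},
    {"order_id": "o2", "type": "reject", "ts_ns": 2_300},
    {"order_id": "o3", "type": "submit", "ts_ns": 3_000},
    {"order_id": "o3", "type": "ack", "ts_ns": 3_900},
    {"order_id": "o3", "type": "fill", "ts_ns": 4_000},
]
latencies, unacked = order_latencies_ns(events)
```

Because the metric is computed from identified events, a dashboard value can be traced back to the specific `submit` and `ack` records that produced it. The rejected order `o2` is visible as unacknowledged rather than silently dropped.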

The dashboard is only the final surface. The product is the event model underneath it.


A trading system needs low-latency execution, but speed is only one requirement. The system also needs memory.

It must preserve the events that define its behavior: market data, strategy decisions, risk checks, orders, broker responses, fills, rejects, state transitions, and outcomes. These records make it possible to reconstruct decisions, investigate incidents, measure performance, compare live behavior against research, and understand whether the system behaved as intended.

Execution infrastructure determines what the system can do in real time. Data infrastructure determines what can be understood after the fact.

Without execution, trades cannot be placed. Without retained data, behavior cannot be reconstructed, evaluated, or trusted. A trading system is therefore both an execution system and a data system.

Note: AI tools are used for drafting and editing. All technical reasoning, system design, and conclusions are human-driven.