Quant Infrastructure #5 - Order Executor
A guide to robust order tracking in a trading infrastructure.
In the previous article of the main series, we looked at robustly tracking our trading inventory and built an Inventory component for our Quant Infrastructure.
In this article, we look at tracking and managing orders and build an OrderExecutor for this purpose. Orders require a different approach from the Inventory because we actively control them as opposed to passively taking information from the exchange.
The technique we will show is commonly known as orders-in-flight. The implementation we will build will be simple and robust to errors, both theoretical and those encountered in practice during real trading.
We will address common problems such as duplicate orders and ghost orders and ensure that our order tracking is consistent with our inventory — which we covered in the previous article from the Inventory component’s side.
For consistency, I assume the Inventory component is similar to what we described in the previous part of the series.
Today’s article is free to read. As per usual, the complete source code is attached at the end of the article for paying subscribers.
We will use straightforward Rust to allow readers using other languages to follow and adapt the code to their language of choice relatively easily if they wish.
The article is exchange-agnostic and the techniques shown work on most if not all centralized exchanges (CEXes), such as Binance, Bybit, etc. I have not tested them with any DEXes, though I suspect they work there just as well.
The Challenge
Let's define precisely what we're doing and find out what problems we may have on the way to doing it.
We want to be able to place and manage limit orders on the exchange in a way that will be easy for us to reason about and robust to breakage and tracking mistakes. Most exchanges will have two, sometimes three API requests responsible for modifying orders: place, cancel, and (sometimes) modify. We will be updated about the status of our orders via the exchange’s event feed.
Unlike inventory, orders are active, i.e. we control them. There are two problems to solve that arise from this.
During the window between when we send a request via the API and when we are updated about its effect on the order via the event feed, our infrastructure must be able to tell that it has already made a request, lest it make another, identical one.
The opposite problem may also occur: say we send a request to place an order and receive a successful response, so we record the order as placed. However, for one reason or another, the order never reaches the matching engine and, as a result, is never executed. Because we already recorded it, we never place another one, though we should.
We will look at how this may happen in a moment — for now the reader should simply know that it happens often enough to require us to address it if we want our system to be fully automated and run continuously.
The above two problems have to do with tracking consistency; our other concern is robustness to errors.
As was the case with the Inventory component shown in the previous article, our tracking mechanism should be able to self-heal, that is, were the tracked state to become invalid for one reason or another, it should be able to automatically recover.
We will write a simple mechanism using orders-in-flight that addresses the above problems. We will address the base case, which is placing, cancelling, and modifying orders.
This three-function interface can be clumsy and bug-prone in trading code so we will also look at an example of a more ergonomic way towards the end of the article.
Originally, I had planned this to be part of the article but had to constrain the scope due to length, so we will only take a brief look. We may cover it in a future article, however.
As usual, our focus is on approximating the problem closely, simplicity, and robustness.
For market orders, we can often obtain the result in the same API call used to place the order. It may be tempting to rely on this, as it makes it possible to ignore the event feed and many of the above problems altogether.
However, it leads to other problems down the line: (1) it is likely to make the system incompatible with limit orders by design; (2) it forces some kind of synchronization during order requests. I won't dwell on it other than to say that I've done it myself, and that it is a tempting but bad idea.
The Exchange
Let's look at the environment we operate in. Much of what is relevant here is the same as in the Inventory article so I will only repeat it here in brief along with some new considerations.
The Exchange runs asynchronously to our infrastructure and we communicate with it via two channels: the API and the event feed.
Both channels involve a degree of latency and are usually reliable but occasionally aren't. This is by way of significant delays, missed events, out-of-order events, etc. I wrote about it at length in the Inventory article and won't repeat it here.
What is new with respect to orders is that the exchange's API and its matching engine are usually separate systems with a communication boundary between them — much like the one between us and the API — that may also experience problems.
We can illustrate the entire configuration like this:
The implications for us are that:
It is impossible to know with certainty the precise state of our order at all times due to the exchange running in parallel with us.
The communications channels may yield faulty information (delayed, skipped events, etc.) which may lead to our tracking being wrong.
In particular, this applies to both the event feed, where it’s obvious, and the API, where, for example, our request succeeds but we fail to read the response, perhaps due to a network error.
Now that we know roughly what problems we may encounter, let's look at how to address them.
Remember, our goal is to have information that makes it possible to trade reliably — rather than perfect, instantaneous information, which is impossible due to latency.
Implementation
In essence, we will model our orders as ongoing processes rather than static objects, which we may have been tempted to do initially (at least I remember thinking this way my first time). This technique is frequently called orders-in-flight.
It turns out that, to trade reliably, it is sufficient to know the expected and last-observed state of the order.
The expected state is in part created by us i.e. when we place, modify, and cancel an order we expect to observe the change soon via the event feed.
First, some prerequisites. Our order updates arrive via the event feed and look like the following; they are instantaneous snapshots of order states:
/// An order sent by the exchange's event feed.
pub struct Order {
pub symbol: String,
pub exch_id: String,
pub link_id: String,
pub side: Side,
pub ty: OrderType,
pub price: Option<Decimal>,
pub qty: Decimal,
pub status: OrderStatus,
pub tif: TimeInForce,
pub reduce_only: bool,
pub filled_qty: Option<Decimal>,
pub last_qty: Option<Decimal>,
}
Since I'm writing this article to be exchange-agnostic, this is just a generic type.
We'll define our order-in-flight like so:
#[derive(Debug)]
pub struct OrderInFlight {
pub symbol: String,
pub link_id: String,
pub side: Side,
pub ty: OrderType,
pub tif: TimeInForce,
pub reduce_only: bool,
pub expected: ExpectedOrderState,
pub observed: Option<ObservedOrderState>,
pub mismatch_since: Option<i64>,
}
#[derive(Debug)]
pub struct ExpectedOrderState {
pub price: Option<Decimal>,
pub qty: Decimal,
pub status: ExpectedOrderStatus,
}
#[derive(Debug, PartialEq, Eq)]
pub enum ExpectedOrderStatus {
Alive, // Order is running on the exchange.
Final, // Order has completed.
Pending, // Order has pending changes that have not reached the matching engine.
}
#[derive(Debug)]
pub struct ObservedOrderState {
pub price: Option<Decimal>,
pub qty: Decimal,
pub status: OrderStatus,
}
The expected and observed order states contain the variable parts of the order, i.e. the parts that can change either as a result of our own requests or of fills (or other events, in the case of order types more complex than simple limit orders). Apart from the status, the variable parts we can modify are usually just the price and quantity.
Next, we will store the orders in a map, much like we did positions and wallet assets in the Inventory article.
An important bit here is that we should key by an ID that we generate rather than one assigned by the exchange. Here I call this the link_id (terminology borrowed from Bybit). Most orders have two IDs, one assigned by us and one by the exchange. We use ours because we need to add to this map when we submit an order, before we observe it on the event feed, i.e. before we know the ID chosen by the exchange.
The map looks like this:
pub struct OrderExecutor {
orders: HashMap<String, OrderInFlight>
}
In the attached source code I have one more layer of indirection for symbols, i.e. there are two maps, one keyed by symbol, one by link_id. This is because the link_id can be unique globally or unique within a symbol, depending on the exchange. Here we use a single map for simplicity though this naturally breaks if two orders in two different symbols have the same IDs.
Alternatively, we could use a tuple key of (symbol, link_id).
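For illustration, the tuple-keyed variant might be sketched like this. A minimal stand-in for `OrderInFlight` is used so the snippet is self-contained; the real struct is the one defined above:

```rust
use std::collections::HashMap;

// Minimal stand-in for the full OrderInFlight struct shown earlier.
#[derive(Debug)]
pub struct OrderInFlight {
    pub symbol: String,
    pub link_id: String,
}

// Keying by (symbol, link_id) keeps orders distinct even on exchanges
// that only guarantee link_id uniqueness within a single symbol.
pub struct OrderExecutor {
    orders: HashMap<(String, String), OrderInFlight>,
}

impl OrderExecutor {
    pub fn new() -> Self {
        Self { orders: HashMap::new() }
    }

    pub fn insert(&mut self, order: OrderInFlight) {
        self.orders
            .insert((order.symbol.clone(), order.link_id.clone()), order);
    }

    pub fn get(&self, symbol: &str, link_id: &str) -> Option<&OrderInFlight> {
        self.orders.get(&(symbol.to_string(), link_id.to_string()))
    }
}
```

With this key, two orders in different symbols can safely share the same link_id.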
Next, we will fill this map as necessary, which is:
When we make any changes and expect to see them reflected.
When order information arrives from the exchange.
Let’s do so now, starting with placing orders:
pub fn place_order(&mut self, connector: &mut Connector, order: NewOrder) {
    let result = connector.place_order(&order);
    if is_success_or_indeterminate(&result) {
        self.orders.insert(
            order.link_id.clone(),
            OrderInFlight {
                symbol: order.symbol,
                link_id: order.link_id,
                expected: ExpectedOrderState {
                    price: order.price,
                    qty: order.qty,
                    status: ExpectedOrderStatus::Alive,
                },
                observed: None,
                ... // and so on
            },
        );
    }
}
Notice the is_success_or_indeterminate() call. As I wrote earlier, two communication links are at play here: the one between our infrastructure and the exchange API, and the one between the exchange API and the matching engine. If either breaks (e.g. we fail to read the response from the exchange, or receive a 500 error indicating that something went wrong on the API's side), then we may not assume that the order has not been placed. We instead assume it has been, and add it to the map even if this may not be what actually happened. We will soon add some self-healing capability that catches these cases and restores our tracking to the correct state. To my knowledge, this is the best we can do with the information at hand.
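One plausible shape for is_success_or_indeterminate() is sketched below. The `ConnectorError` type and its variant names are my assumption for illustration, not taken from the attached source:

```rust
// Hypothetical connector error type; names are illustrative and the
// attached source may define this differently.
#[derive(Debug)]
pub enum ConnectorError {
    // The exchange definitively rejected the request (bad parameters,
    // insufficient balance, etc.): the order was certainly not placed.
    Rejected(String),
    // We failed to read a response (timeout, connection reset) or got
    // a 5xx: the request may or may not have reached the matching engine.
    Indeterminate(String),
}

/// True unless the exchange definitively rejected the request. On
/// indeterminate failures we pessimistically assume the order went
/// through and let the self-healing check clean up if it did not.
pub fn is_success_or_indeterminate<T>(result: &Result<T, ConnectorError>) -> bool {
    !matches!(result, Err(ConnectorError::Rejected(_)))
}
```

The key design point is that only a definite rejection proves the order does not exist; everything else must be treated as "possibly placed".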
We can write our cancel and modify functions analogously:
pub fn cancel_order(&mut self, connector: &mut Connector, order: CancelOrder) {
    if connector.cancel_order(&order).is_ok() {
        if let Some(order_in_flight) = self.orders.get_mut(&order.link_id) {
            order_in_flight.expected.status = ExpectedOrderStatus::Final;
        }
    }
}
pub fn modify_order(&mut self, connector: &mut Connector, order: ModifyOrder) {
    if connector.modify_order(&order).is_ok() {
        if let Some(order_in_flight) = self.orders.get_mut(&order.link_id) {
            order_in_flight.expected = ExpectedOrderState {
                price: Some(order.price),
                qty: order.qty,
                status: ExpectedOrderStatus::Alive,
            };
        }
    }
}
The observant reader will notice that our functions handle errors internally and do not themselves return errors. I found it useful to not leak errors to the trading logic. The result of taking this approach everywhere is that there are surprisingly few places that ever need to worry about errors — basically only those which communicate with the exchange or any network directly.
Lastly, we can update the map for events from the data feed. We call the following function for each incoming event:
pub fn on_event(&mut self, event: &Event) {
match event {
Event::Order(order) => self.on_order_event(order),
_ => (),
}
}
fn on_order_event(&mut self, order: &Order) {
    if order.status.is_final() {
        self.orders.remove(&order.link_id);
    } else {
        self.orders
            .entry(order.link_id.clone())
            .and_modify(|order_in_flight| {
                order_in_flight.observed = Some(ObservedOrderState {
                    price: order.price,
                    qty: order.qty,
                    status: order.status,
                });
            })
            .or_insert_with(|| OrderInFlight {
                symbol: order.symbol.clone(),
                link_id: order.link_id.clone(),
                side: order.side,
                ty: order.ty,
                tif: order.tif,
                reduce_only: order.reduce_only,
                expected: order.into(),
                observed: Some(order.into()),
                mismatch_since: None,
            });
    }
}
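The helper conversions used above (is_final() and the OrderStatus-to-ExpectedOrderStatus mapping) might be sketched as follows. The exact set of statuses depends on the exchange, so treat the variants as placeholders:

```rust
// A simplified exchange-side order status; real exchanges have more
// variants (e.g. untriggered conditional orders).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OrderStatus {
    New,
    PartiallyFilled,
    Filled,
    Cancelled,
    Rejected,
}

impl OrderStatus {
    /// Final statuses mean the order can no longer change on the exchange.
    pub fn is_final(&self) -> bool {
        matches!(
            self,
            OrderStatus::Filled | OrderStatus::Cancelled | OrderStatus::Rejected
        )
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum ExpectedOrderStatus {
    Alive,
    Final,
    Pending,
}

impl From<OrderStatus> for ExpectedOrderStatus {
    fn from(status: OrderStatus) -> Self {
        // An observed status collapses to either Alive or Final;
        // Pending only ever originates from our own requests.
        if status.is_final() {
            ExpectedOrderStatus::Final
        } else {
            ExpectedOrderStatus::Alive
        }
    }
}
```

The `From<&Order>` conversions for `ExpectedOrderState` and `ObservedOrderState` would copy the price, quantity, and (mapped) status in the same way.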
This is as far as the tracking part goes. I wrote about inventory-orders consistency in the previous article, so I won't repeat it here other than to note that the Order event seen here is the same one used there.
We are now able to place, cancel, and modify orders and can track them in a consistent manner.
We have not, however, addressed robustness to errors yet.
Ensuring Robustness
To ensure robustness, we will check that orders match their expected states and should the two drift for too long, we will reset by cancelling all orders and clearing the map.
We should perform this check periodically rather than on every event so as not to hurt performance. This matters less live; in backtests, however, checking on every event can slow things down significantly. Readers will recognize this scheme from the Inventory article, where we used a similar one.
Our OrderExecutor will now include a flag to indicate that it is in a bad state and a last check timestamp to implement the check frequency:
pub struct OrderExecutor {
    orders: HashMap<String, OrderInFlight>,
    badflag: bool,
    last_check: i64,
}
The checking itself looks like this; the timestamp is passed in explicitly to accommodate simulated time in a backtest:
const TIMEOUT: i64 = 60000; // 1 minute
const CHECK_FREQ: i64 = 1000; // 1 second
pub fn on_event(&mut self, event: &Event, timestamp: i64) {
    ...
    if timestamp - self.last_check > CHECK_FREQ {
        self.check(timestamp);
        self.last_check = timestamp;
    }
}
fn check(&mut self, timestamp: i64) {
    let mut desync_since = timestamp;
    for order in self.orders.values_mut() {
        if let Some(order_desync_since) = order.mismatch_since {
            desync_since = desync_since.min(order_desync_since);
        } else if !order.check_expected_and_observed_match() {
            order.mismatch_since = Some(timestamp);
        }
    }
    if timestamp - desync_since > TIMEOUT {
        self.badflag = true;
    }
}
Earlier we split the expected order status into alive, final, and pending. This is necessary because we only care about the state of the order in the matching engine, not in the API layer. We use the pending status to indicate that a change has reached the API layer but not yet the matching engine.
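To make this concrete, here is one plausible semantics for the check_expected_and_observed_match() call used above, with simplified stand-in types (`f64` instead of `Decimal`, a bare `alive` flag instead of the full `OrderStatus`); the attached source may differ in details:

```rust
#[derive(Debug)]
pub enum ExpectedOrderStatus {
    Alive,
    Final,
    Pending,
}

// Simplified state types: f64 stands in for Decimal so the sketch
// compiles without external crates.
pub struct ExpectedOrderState {
    pub price: Option<f64>,
    pub qty: f64,
    pub status: ExpectedOrderStatus,
}

pub struct ObservedOrderState {
    pub price: Option<f64>,
    pub qty: f64,
    pub alive: bool, // stand-in for a full OrderStatus
}

pub struct OrderInFlight {
    pub expected: ExpectedOrderState,
    pub observed: Option<ObservedOrderState>,
}

impl OrderInFlight {
    /// True when the last-observed state is consistent with what we expect.
    pub fn check_expected_and_observed_match(&self) -> bool {
        match self.expected.status {
            // Pending changes have not reached the matching engine yet;
            // whatever we observe in the meantime is acceptable.
            ExpectedOrderStatus::Pending => true,
            // We expect the order gone; until the final event removes it
            // from the map this counts as a mismatch, which the timeout
            // catches if it persists.
            ExpectedOrderStatus::Final => false,
            ExpectedOrderStatus::Alive => match &self.observed {
                // Placed but never observed: a mismatch until the feed
                // confirms the placement.
                None => false,
                Some(obs) => {
                    obs.alive
                        && obs.price == self.expected.price
                        && obs.qty == self.expected.qty
                }
            },
        }
    }
}
```

Note that transient mismatches are expected and harmless; only a mismatch persisting beyond TIMEOUT trips the bad flag.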
To use this in our trading loop we will simply do the following along with other components:
let mut inventory = Inventory::new();
let mut order_executor = OrderExecutor::new();
let ... // etc.
// Trading loop.
while let Some(event) = next_event() {
inventory.on_event(&event);
order_executor.on_event(&event);
if order_executor.is_bad() {
// The orders are bad, cancel all of them and reset executor.
connector.cancel_all_orders();
order_executor = OrderExecutor::new();
}
...
}
With this, we're done. We have built a basic OrderExecutor component that can correctly track our orders, stay in sync with the inventory component (previous article) and is robust to errors.
It may be used directly, or wrapped to be made more ergonomic.
Further Work
Originally I had wanted this article to build on the implementation we just created and cover a more ergonomic way of managing orders since the above three-function interface can be clumsy to work with directly.
As I mentioned before, it’s not covered in the article due to length. To give it a quick look, however:
let mut quoter = Quoter::new();
loop {
let qty = target_inventory - current_inventory;
quoter.place_or_modify_quote("btc/ask", symbol, Ask, price, qty);
// and
quoter.remove_quote("btc/ask");
}
This can simplify order management in strategy code by moving it to immediate mode: the strategy declares its desired quotes each iteration rather than managing individual orders. It is in part a gimmick, however.
I leave it to the reader to implement this on top of the component we built today, if they wish.
Conclusion & Mini-Announcement
We will finish here.
Thank you for reading, I hope the article was useful to you.
Today’s article was free to read, paying subscribers can download the complete source code in the section below, as usual.
As for the announcement.
Recently, I've been thinking about changing how the series will progress.
The original idea was to cover components one by one in dedicated articles. By that logic, the next would be an InstrumentStore to track traded markets. However, it is basically a map much like what we've covered here already and I don’t want to commit a whole article to it as it would likely not be very interesting.
As such the next topic is at the moment undecided. Some of you have sent me things (either here or via Twitter DMs) which you wished were covered in the series. I will revisit them before picking the next topic (also feel free to submit more, my DMs are open).
Overall, expect more surprises and more interesting and higher-quality articles in the future.
Cheers!