Quant Infrastructure Series #2 - Exchange Client
A comprehensive guide to building a robust exchange client.
In today’s article, we build a robust API client for a CEX/hybrid exchange with the dual purpose of downloading historical data from it and — in the long term — trading. The method shown here is applicable across exchanges. We will use Binance Futures as an example.
This topic is a natural place to begin the series. As Quants, we need historical data to research and backtest ideas and exchanges expose large amounts of it via their APIs. Third-party clients often either do not exist or are subpar in one way or another.
The article is envisioned as a template/how-to guide for building an in-house client for exchanges that expose an HTTP API. This includes all common CEX and hybrid exchanges: Binance, Bybit, OKX, Kucoin, dYdX, etc.
For practical reasons, we seek to convey the idea rather than include every single line of code of the completed client.
The reader will be happy to learn that exchange APIs are very similar to one another: being able to integrate Binance Futures means we are also able to integrate Bybit Spot, and so on.
The article demonstrates how to write a client that is simple, understandable, robust to errors, and which we are able to trust, update, and fix when necessary. We think these properties are necessary if we want to use the client to trade real money, and if we value our time and want to stay maximally headache-free as Quants — at least on the infrastructure side.
We also lay the groundwork for the next article, which will present a performant storage system, the natural progression after a client used to download data.
In the long term, we plan to use the client to allow us to trade.
The complete source code in Rust will be available at the end of the article for paying subscribers.
For completeness, other ways to acquire data include:
Buy it from a vendor.
Record the exchange’s live feeds. This method is rarely useful outside of HFT, as the data needed usually spans years, so recording it is impractical for us.
Some exchanges will also make their data available for download outside the API. For example, here’s Binance’s public data catalogue released recently.
Disclaimer: Trading is a risky endeavour and I am not responsible for any losses you may incur implementing ideas learnt through these articles.
Disclaimer: This is not financial advice.
Overview
We will begin by implementing a single request to fetch basic information about the exchange: time, traded contracts, etc. In the article, we will call this our base-case client.
We will then expand this base-case client one step at a time to add missing functionality and make it robust to errors.
In total, we will implement:
The base case (i.e. make any simple request)
Authentication
Rate limiting
Retry logic
Error handling
As the reader will find out, exchanges expose many requests, but he will only need a few specific ones to download data and trade. In the article we limit ourselves to one request and show how to add more; we expect the reader to do so himself. This is not difficult, though it is repetitive and clerical.
For our exchange client, our main concern is to directly map the API requests that we’re interested in into callable functions.
Doing so directly rather than designing an ergonomic wrapper (as some might be tempted to do) will help keep our code straightforward and useful irrespective of how or for what we use it afterwards. By proxy, it also makes our work focused and shorter. In this case, boring tends to mean good.
In retrospect, the reader may view this as us applying the separation-of-concerns software design principle. The author — being a contrarian in this respect — thinks principles such as this tend to characterize good design after the software has been written rather than produce it.
We will conclude the general overview here and look at the aforementioned base case next.
Link to Binance Futures API documentation: here.
Implementation
To demonstrate what we have in mind specifically when we say client, let’s look at some usage code first. Our client will have the following interface:
// main.rs
let client = BinanceClient::new(BinanceCfg {
    auth: Some(BinanceAuth {
        api_key: ...,
        api_secret: ...,
    }),
    ..BinanceCfg::default()
})?;

let exch_info = client.get_exchange_info()?;
info!("Exchange info: {}", exch_info);

let account_info = client.get_account_info()?;
info!("Account info: {}", account_info);

let klines = client.get_klines(GetKlines {
    symbol: "BTCUSDT",
    interval: "1d", // 1 day
    start_time: Some(now() - days(90)),
    end_time: None,
    limit: 1000,
})?;
info!("Klines: {}", klines);
info!("Klines: {}", klines);
If the reader compares this to the API documentation linked earlier, he will find that we simply took API endpoints (/exchangeInfo, /account and /klines) and turned them into functions. This is basically what a client is.
Okay, let’s look at the implementation for our aforementioned base case.
We use the ureq crate to make HTTP requests. Most languages will have an analogous library, if the reader is writing Python, he may use the requests package from pip.
use std::time::Duration;

#[derive(Default)]
pub struct BinanceCfg {
    pub timeout: Option<Duration>, // Defaults to 10 seconds
    pub max_idle_connections: Option<usize>, // Defaults to 20
}

pub struct BinanceClient {
    agent: ureq::Agent,
}

impl BinanceClient {
    pub fn new(cfg: BinanceCfg) -> Self {
        let timeout = cfg.timeout.unwrap_or(Duration::from_secs(10));
        let max_idle_connections = cfg.max_idle_connections.unwrap_or(20);
        let agent = ureq::AgentBuilder::new()
            .timeout(timeout)
            .no_delay(true)
            .https_only(true)
            .max_idle_connections(max_idle_connections)
            .max_idle_connections_per_host(max_idle_connections)
            .build();
        Self { agent }
    }
}
There are several things to address in the above code:
The timeout is required because some requests take much longer than anticipated up to the point of hanging forever. This will happen in practice not just in theory so we make sure to prepare. It is important that the HTTP library we use implements the timeout on the underlying TCP socket directly. Most do and I mention this for completeness because I have lost money on hanging TCP sockets in the past.
The no_delay() function disables Nagle's algorithm on the underlying TCP connection. This should improve our latency a little at the cost of throughput. We prefer latency to throughput so we turn it off. For MFT, this is unlikely to alter our PNL and the reader should view it as a probable, low-effort optimization and is free to ignore it.
We also set max_idle_connections. Most HTTP libraries will maintain several TCP connections to the host (exchange) to avoid establishing a new TCP connection every time we make a request. We like this for two reasons:
Establishing a new connection is meaningfully slow as it requires several roundtrips to the host.
A single connection is limited to one request at a time.
In practice, I recommend we set the limit to equal the maximum number of request-making threads. As we don’t have any at the moment, we just use a ballpark, large-enough number.
A brief note on technology. The communication with the exchange proceeds along an onion-like layering of protocols:
IP ← TCP ← HTTP

Most of the time, we care only about the top HTTP level. Occasionally, such as with the timeout, inner layers of the onion can cause trouble for us. It is rare, but we should be aware of it.
Ok, we’ve finished this part. Our next task is to map the endpoints we care about to functions we can call in our programming language of choice.
Mapping an API Request - Base Case
For example, suppose we want to call GET /v1/exchangeInfo. We first look at what from the returned data interests us and define a type to hold it. In this case, the documentation states the response will be a JSON document that looks roughly like this:
{
    serverTime: number,
    assets: [...a list of assets...],
    symbols: [...a list of symbols...],
    timezone: string
}
We want to map this into our own type. If we use a dynamically typed language such as Python, we may be able to skip this. In Rust using the serde library for deserialization we write:
// types.rs
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct ExchangeInfo {
    pub server_time: i64,
    pub assets: Vec<Asset>,
    pub symbols: Vec<Symbol>,
    pub timezone: String,
    // ...more fields here
}

#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct Asset {
    pub asset: String,
    // ...more fields here
}

#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct Symbol {
    pub symbol: String,
    // ...more fields here
}
I skip most of the above structures for brevity as they are somewhat long and would extend this code block far more than reasonable.
Next, we write a function that makes our request and deserializes the response. We will modify it in a short moment to include retry logic and error parsing but for now, our base case will look something like this:
We use the anyhow crate for the Result type for convenience.
// client.rs
const BASE_URL: &str = "https://fapi.binance.com";

impl BinanceClient {
    pub fn get_exchange_info(&self) -> Result<ExchangeInfo> {
        let url = format!("{}/fapi/v1/exchangeInfo", BASE_URL);
        let req = self.agent.get(&url);
        let res = req.call()?;
        let res = res.into_json()?;
        Ok(res)
    }
}
At this point, we should be able to fetch exchange information from the API. The reader is encouraged to later use this as basis to integrate other endpoints, such as the one responsible for fetching historical candlesticks: GET /v1/klines.
To summarize, the process is:
Map the response data types to our programming language of choice.
Write the request function.
As we can see this is mostly clerical work. In dynamic languages such as Python step 1 may be unnecessary as there are no strong types.
Next, we extend our base-case client by rate limiting, retry logic, and authentication.
Rate-Limiting
Rate limits are how exchanges keep the quantity of requests bounded to keep their systems from being overwhelmed. These limits are usually high enough so as to never be reached when trading normally.
However, when downloading data, we want to do so as quickly as possible. Basically, we will spam requests at the exchange. Breaching the rate limits will result in the exchange banning us for a period of time. On Binance, the bans increase in length as we continue to breach the rate limit.
Bybit is more aggressive and, according to its docs, will ban us permanently on the first breach. This looks excessive, and the author has never been banned on Bybit himself, so he cannot confirm whether it is true. The reader should be careful, however.
Typically, the rate limit is set up as a maximum request weight per time interval, and as a maximum number of orders per time interval. We will ignore the latter for now and focus on the former.
The reader may have noticed in the response returned from get_exchange_info() that Binance’s futures API enforces three limits:
2400 request weight / minute
1200 orders / minute
300 orders / second
Should we breach one of those limits we will get a 429 Too Many Requests response on the first few violations, followed by 418 IP Banned if we continue to disobey.
A 429 requires us to wait until our rate-limit window resets (<= 1 minute) and is okay to receive occasionally, but a 418 bans our IP address for exponentially prolonged intervals. We must avoid the latter, and we should never receive it if our system works correctly.
We will approach handling rate limits as follows:
Find out what the exchange’s rate limits are (already done).
Simulate the rate limits locally and either hold back or fail requests that would breach it, depending on the need (wait when downloading; fail when trading).
As the simulated limit is not perfect, respect 429 Too Many Requests and 418 IP Banned responses when received from the exchange.
On Binance we might also find ourselves blocked by Cloudflare, as Binance uses it in front of its API. This is easy to recognize, as the response is simply a website telling you that you are going too fast. The solution is to simply wait 10 seconds and then resume downloading.
We handle this below as a byproduct of handling 429 Too Many Requests.
To simulate the rate limit locally we use a rate-limiting algorithm. There are several options. We will use a queue variant of the leaky bucket algorithm for simplicity. The implementation is as follows:
// leaky_bucket.rs
use anyhow::Result;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{sleep, spawn};
use std::time::Duration;

pub struct LeakyBucket {
    send: SyncSender<()>,
}

impl LeakyBucket {
    pub fn new(capacity: usize, refill_size: usize, refill_freq: Duration) -> Self {
        let (send, recv) = sync_channel(capacity);
        spawn(move || loop {
            sleep(refill_freq);
            if recv.recv().is_err() { break; } // Disconnected
            for _ in 0..refill_size - 1 { recv.try_recv().ok(); }
        });
        Self { send }
    }

    // We use this when we want to make a request now or fail
    // immediately.
    pub fn request(&self, weight: usize) -> Result<()> {
        for _ in 0..weight { self.send.try_send(())?; }
        Ok(())
    }

    // We use this when we want to wait for a request to become
    // permitted.
    pub fn wait(&self, weight: usize) -> Result<()> {
        for _ in 0..weight { self.send.send(())?; }
        Ok(())
    }
}
The way this works is that when the channel/queue reaches its maximum capacity (our rate limit), the call to send() blocks until there is space available in the channel. A refill thread running in parallel removes refill_size elements from the channel at the desired refill_freq frequency, giving us the desired throughput (equal to the rate limit).
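To make the mechanics concrete, here is a toy, single-threaded sketch of the bucket behaviour using only the standard library. The demo_bucket function is hypothetical, for illustration only; it simulates one refill tick inline instead of running the refill thread:

```rust
use std::sync::mpsc::sync_channel;

// Returns how many sends were permitted before the bucket filled,
// and whether a send succeeds again after one refill slot is drained.
pub fn demo_bucket(capacity: usize) -> (usize, bool) {
    let (send, recv) = sync_channel::<()>(capacity);
    // Spam requests until the simulated limit is hit.
    let mut sent = 0;
    while send.try_send(()).is_ok() {
        sent += 1;
    }
    // Simulate one tick of the refill thread: draining an element
    // frees one slot in the channel, permitting one more request.
    recv.try_recv().ok();
    let ok_after_refill = send.try_send(()).is_ok();
    (sent, ok_after_refill)
}
```

For example, demo_bucket(3) permits exactly 3 sends before further ones fail, and permits one more once a slot has been drained.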
Dear reader, I originally implemented the mechanism shown here in mid-2021 as I was figuring out how to do this part well. It has worked well from then to the present day (May 2023). As such, I never bothered to improve it.
While writing this article it occurred to me that there are improvements that could be made to better track the exchange’s rate limit. I have not spent the time to research them. I reassure the reader that even though the method shown here may not be optimal, it is (1) very simple and (2) adequate for solving the problem.
Next, let’s incorporate Binance’s 2400 request weight/minute limit by adding the leaky bucket to our BinanceClient and acquiring each request’s weight from the bucket before making the request. We can see in the documentation that the weight of this request is 1:
// client.rs
struct BinanceClient { limiter: LeakyBucket, ... }

impl BinanceClient {
    pub fn new() -> Self {
        Self {
            limiter: LeakyBucket::new(2400, 2400, Duration::from_secs(60)),
            ...
        }
    }

    pub fn get_exchange_info(&self) -> Result<ExchangeInfo> {
        let url = format!("{}/fapi/v1/exchangeInfo", BASE_URL);
        let req = self.agent.get(&url);
        // This call will fail if the simulated rate limit is exceeded.
        self.limiter.request(1)?;
        let res = req.call()?;
        let res = res.into_json()?;
        Ok(res)
    }
}
Great, now we have a local simulation of Binance’s rate limit.
To complete the rate-limiting code we still need to handle 429 Too Many Requests and 418 IP Banned responses. Let’s do that now.
Since this also has to do with rate limiting but is unrelated to the leaky bucket, we will make a RateLimit struct to wrap the above and use it instead of LeakyBucket:
// rate_limit.rs
use anyhow::Result;
use chrono::prelude::*;
use crate::leaky_bucket::LeakyBucket;
use std::{
    sync::atomic::{AtomicI64, Ordering},
    thread::sleep,
    time::Duration,
};

pub struct RateLimit {
    request_weight_1m: LeakyBucket,
    retry_after: AtomicI64,
}

impl RateLimit {
    pub fn new() -> Result<Self> {
        Ok(Self {
            request_weight_1m: LeakyBucket::new(2400, 2400, Duration::from_secs(60)),
            retry_after: AtomicI64::new(0),
        })
    }

    pub fn request(&self, weight: usize) -> Result<()> {
        let wait = self.get_retry_after() - Utc::now().timestamp_millis();
        if wait > 0 {
            sleep(Duration::from_millis(wait as u64));
        }
        self.request_weight_1m.wait(weight)?;
        Ok(())
    }

    pub fn set_retry_after(&self, value: i64) {
        self.retry_after.fetch_max(value, Ordering::AcqRel);
    }

    pub fn get_retry_after(&self) -> i64 {
        self.retry_after.load(Ordering::Acquire)
    }
}
The above code has a Rust-specific quirk. We use an atomic i64 (AtomicI64) to allow us to keep self immutable and thus use the same RateLimit from multiple threads. Readers who are unfamiliar with this may think of the retry_after variable as a regular integer behind a lock for synchronization across threads.
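To illustrate why fetch_max is the right primitive here, consider the following hypothetical demo (not part of the client): threads may observe 429 responses out of order, and fetch_max guarantees that a stale, smaller retry_after timestamp can never overwrite a newer one.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// Returns the stored timestamp after a stale update and after a
// newer update; only the newer one takes effect.
pub fn demo_fetch_max() -> (i64, i64) {
    let retry_after = AtomicI64::new(100);
    retry_after.fetch_max(50, Ordering::AcqRel); // stale update: ignored
    let after_stale = retry_after.load(Ordering::Acquire);
    retry_after.fetch_max(200, Ordering::AcqRel); // later update: wins
    let after_newer = retry_after.load(Ordering::Acquire);
    (after_stale, after_newer)
}
```

A plain store would not give us this property without holding a lock across the compare and the write.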
The reader should keep in mind that when downloading data he should split the work over multiple threads. This is because a single thread is unlikely to saturate the rate limit and will make the process take much longer.
On the other hand, too many threads/requests happening at once may still breach the rate limit despite the precautions we take.
The author uses 8 threads per exchange. The reader may copy this or experiment.
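As a sketch of how the work might be split, here is a hypothetical helper (the name and signature are mine, not from the attached source) that shards a [start, end) time range into one chunk per thread; each thread can then page through its chunk with get_klines:

```rust
// Split [start, end) into up to n_threads contiguous chunks of
// roughly equal size so each download thread gets its own range.
pub fn split_range(start: i64, end: i64, n_threads: i64) -> Vec<(i64, i64)> {
    let total = end - start;
    let chunk = (total + n_threads - 1) / n_threads; // ceiling division
    (0..n_threads)
        .map(|i| {
            let s = start + i * chunk;
            let e = (s + chunk).min(end);
            (s, e)
        })
        .filter(|(s, e)| s < e) // drop empty trailing chunks
        .collect()
}
```

For example, split_range(0, 10, 3) yields [(0, 4), (4, 8), (8, 10)]: the chunks cover the range exactly, without overlap.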
To update the retry_after variable to a suitable value, we can use the following function after each request:
// client.rs
fn update_rate_limit(limiter: &RateLimit, res: &Response) -> Result<()> {
// Handle 429 Too Many Requests and 418 IP Banned.
if matches!(res.status(), 429 | 418) {
// The Retry-After header is optional; when missing we use a default.
let wait = match res.header("Retry-After") {
Some(retry_after) => retry_after.parse()?,
None => 10, // 10 seconds is empirically sufficient.
};
let now = Utc::now().timestamp_millis();
let timestamp = now + wait * 1000; // Retry-After value is in seconds.
limiter.set_retry_after(timestamp);
}
Ok(())
}
And with this, dear reader, we are done.
For completeness, the reasons for the occasional failure of our simulated rate limit are:
We make no attempt to reset our rate limit at the same time as Binance resets its own.
Our leaky bucket requests rate-limit allowance in a streaming fashion, one unit at a time (a weight of 10 behaves like 10 requests of weight 1), which is not accurate.
Retry Logic
In the previous section we looked at avoiding failure, now we will look at handling it.
As a general truth, errors will happen regardless of precautions and especially when dealing with networks. We anticipate this and implement a retry logic.
As a matter of robustness, we must handle all errors. This does not mean handling each error individually; we can group them into categories and process them in bulk, or otherwise use various error-handling strategies to simplify the task.
We may address error handling strategies in a future article. Anyway, back to the retry logic.
A straightforward way of implementing a retry logic is to write a function that takes another function — one we wish to retry on — and calls it repeatedly. If the result is a success then we simply return it, otherwise, we check if we should retry and either do so or return an error.
As we are dealing with a network service, adding a short and increasing delay between attempts is advisable. This delay is often called backoff delay.
// retry.rs
use anyhow::Result;
use std::thread::sleep;
use std::time::Duration;

#[derive(Clone, Copy)]
pub struct RetrySchedule<'a> {
    pub n_times: usize,
    pub delays: &'a [f64],
}

pub fn retry<T>(schedule: RetrySchedule, func: impl Fn() -> Result<T>) -> Result<T> {
    let mut n = 0;
    loop {
        match func() {
            Ok(res) => return Ok(res),
            Err(err) => {
                if !should_retry(&err) {
                    return Err(err);
                }
                n += 1;
                if n > schedule.n_times {
                    return Err(err);
                } else {
                    let index = (n - 1).min(schedule.delays.len() - 1);
                    let seconds = schedule.delays[index];
                    if seconds > 0.0 {
                        sleep(Duration::from_secs_f64(seconds));
                    }
                }
            }
        }
    }
}
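Note how the index computation clamps to the last element, so the schedule saturates at its final delay once attempts outrun the list. A hypothetical standalone version of that lookup:

```rust
// Returns the backoff delay (in seconds) for the n-th failed attempt,
// with n starting at 1. Once n exceeds the schedule length, the last
// delay repeats indefinitely.
pub fn delay_for_attempt(delays: &[f64], n: usize) -> f64 {
    delays[(n - 1).min(delays.len() - 1)]
}
```

With delays of [0.5, 1.0, 5.0], attempts 1, 2 and 3 wait 0.5s, 1s and 5s, and every attempt after that waits 5s.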
The should_retry() function will take the error and determine if it should be retried or not. A simple strategy is to only retry if the error is either:
An HTTP error with a status outside the 4XX range (a 4XX status indicates a bad request, which a retry will not fix; 429 and 418 in particular are handled by the rate limiter, not the retry loop).
A connection error.
fn should_retry(error: &anyhow::Error) -> bool {
    for cause in error.chain() {
        if let Some(ureq::Error::Status(status, _)) = cause.downcast_ref() {
            // Retry only statuses outside the 4XX range (e.g. 5XX
            // server errors). A 4XX status means a bad request, which
            // a retry will not fix; 429 and 418 in particular are the
            // rate limiter's job, not the retry loop's.
            if status / 100 != 4 {
                return true;
            }
        } else if let Some(ureq::Error::Transport(_)) = cause.downcast_ref() {
            // Connection/transport errors are transient; retry them.
            return true;
        }
    }
    false
}
Finally, let’s amend our example function:
// client.rs
pub fn get_exchange_info(&self) -> Result<ExchangeInfo> {
let url = format!("{}/fapi/v1/exchangeInfo", BASE_URL, path);
retry(self.retry_schedule, || {
let req = self.agent.get(&url);
self.limiter.request(1)?;
let res = req.call()?;
update_rate_limit(&self.limiter, &res)?;
let res = res.into_json()?;
Ok(res)
})
}
And we’re finished.
Now to the commentary:
As a basic model, we can group requests into ones we wish to retry and ones we do not. We can group errors in the same way: as either transient (retriable) or persistent (non-retriable).
Retriable requests include fetching historical data for storage and exclude placing an order (in cases when it is time-sensitive and we do not wish to accidentally place two orders instead of one).
Retriable errors include connection errors and exclude 400 Bad Request errors.
We should retry only if both the request and the error are retriable.
This concludes our model.
As a general principle, the user of our code — that is, the programmer calling our functions — has more information about what they want to do — and thus what the retry behaviour should be — than we do.
On the other hand, we can’t really leave writing the retry logic to the user every time they want to make a request as this would have terrible ergonomics.
The reader may conceptualize this as a two-objective optimization: we want to maximize both power and ergonomics, which in this case trade off against each other.
In our example function, we have implemented a fixed retry schedule and rate-limiting behaviour. This is fine, as we want to use it only to download data. However, it will be a bad idea to use this same behaviour during trading.
So, what do we do?
A robust solution is to parametrize requests based on the intended behaviour, for example:
enum BlockingBehaviour { Wait, Fail }
fn get_exchange_info(&self, bb: BlockingBehaviour);
We leave it to the reader to implement this.
Dear reader, the source code attached at the end of the article expands on this by specifying a BlockingBehaviour { Wait, Fail } with each request. This is omitted from the article because I added it to the source code afterwards; I mention it so that the reader may implement it himself if he wishes to.
Authentication
The final component to address before we can say our client is feature complete is authentication.
Binance uses HMAC signatures with SHA256 as the hash function. This is a very popular method used with minor alterations by Bybit, OKX, Kucoin and others (in fact, it may be used by all the major exchanges as I can’t think of one that doesn’t do it off the top of my head).
In simple terms, to make an authenticated request, we will take our API keys and use them to sign each authenticated request we make. When we do so, upon receiving the request, Binance will be able to verify that it has been created using our API keys.
The reader can generate his keys in the API keys section of his Binance account.
Signing a request under the HMAC scheme is a three-step process:
Concatenate the request arguments, timestamp, etc. into a string.
Use HMAC with the concatenated string and our API secret key to generate a signature string. This signature string signs our requests.
Include the signature string with the request.
The Rust code for this is somewhat quirky so I thought it would be helpful to also write this in pseudocode:
signature = hmac(api_secret, string(request.query));
request.query['signature'] = signature;
request.headers['X-MBX-APIKEY'] = api_key;
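Step 1, concatenating the request arguments and timestamp into the string that gets signed, can be sketched as a pure function. The helper below is hypothetical and omits URL-encoding of values, which a real implementation must perform:

```rust
// Serialize request parameters plus the timestamp into the query
// string that will be signed. The signature is computed over the
// query exactly as sent, so parameter order must match the request.
pub fn build_query(params: &[(&str, &str)], timestamp: i64) -> String {
    let mut parts: Vec<String> = params
        .iter()
        .map(|(k, v)| format!("{}={}", k, v))
        .collect();
    parts.push(format!("timestamp={}", timestamp));
    parts.join("&")
}
```

For example, build_query(&[("symbol", "BTCUSDT")], 1000) yields "symbol=BTCUSDT&timestamp=1000", the exact string that would then be fed to HMAC.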
The exact process is described in the Binance API docs here and the Rust code to perform it is provided below (using the hmac, sha2, hex and url crates).
fn sign(req: Request, auth: &BinanceAuth, timestamp: i64) -> Result<Request> {
    let mut req = req.query("timestamp", &timestamp.to_string());
    let signature = signature(&req, &auth.api_secret)?;
    req = req.query("signature", &signature);
    req = req.set("X-MBX-APIKEY", &auth.api_key);
    Ok(req)
}

fn signature(req: &Request, api_secret: &str) -> Result<String> {
    use anyhow::anyhow;
    use hmac::*;
    use sha2::Sha256;
    use url::Url;
    type HmacSha256 = Hmac<Sha256>;

    let url = Url::parse(req.url())?;
    let query = url.query().unwrap_or_default();
    let mut mac = HmacSha256::new_from_slice(api_secret.as_bytes()).map_err(|e| anyhow!(e))?;
    mac.update(query.as_bytes());
    Ok(hex::encode(mac.finalize().into_bytes()))
}
With this, dear reader, you have all the tools necessary to write your own API client for Binance or any other exchange.
Handling JSON in languages without fancy serialization libraries
While Rust has serde, and languages such as Python and JavaScript have dynamic types that require no deserialization to a static type at all, the matter is more complicated in languages that have neither dynamic types nor a macro system able to generate deserialization code, such as C++.
For those languages, we show a simple way to recursively deserialize JSON into structs:
// json_example.cpp
struct ExchangeInfo {
    int exchange_time;
    std::vector<Symbol> symbols;
};

// Forward declaration so the code below can reference it.
Symbol deserialize_symbol(JsonValue &json);

ExchangeInfo deserialize_exchange_info(JsonValue &json) {
    return (ExchangeInfo){
        .exchange_time = json["exchange_time"].to_int(),
        .symbols = deserialize_vector(json["symbols"], deserialize_symbol),
    };
}

Symbol deserialize_symbol(JsonValue &json) {
    return (Symbol){...};
}
Conclusion
We conclude the article here. We have shown how to implement an exchange client and hope that we leave the reader with an adequate understanding of the process.
We leave it to the reader to continue the implementation and fill in detail as necessary or to implement his own client for his exchange-language combination of choice.
The complete Rust source code is attached below for paying subscribers. It is expanded compared to the article by a few requests, error deserialization, and a few other things which we omitted here.
To celebrate this release and the start of the newsletter, early readers will be able to subscribe for a permanent 20% discount for the next 7 days.
If you find this content helpful and wish to read more of it sooner, please consider sharing the newsletter. This gives me a direct indication of reader interest and lets me know to publish more articles like this one.
In the next article, we will look at a fast storage system for candlesticks. I attach a benchmark of this storage system below as a teaser.
Benchmarking all BTC pairs
---------------------------
Time : 30.02 seconds
Klines : 603173651
Throughput : 1686.01 MB/s
Klines/s : 20090077.99 klines/s
It is already partially written and I intend to release it at max(next_week, when_ready) — or in other words — next week subject to the time needed to make the quality worthy.
It will be an odd-numbered article and so limited to paying subscribers, with the article after it being again free and so on.
That is it for today!
I invite the reader to leave any comments or questions he may have below or tag me on Twitter.
As before, I hope reading this was worth your time.
Cheers!
Reflections
As the publication is new, I attach some paragraphs to reflect on today and to convey my future intentions to the reader.
The article took considerably longer to prepare than I aimed for — two weeks instead of one — and this is with some less important parts cut out and much of the rest compressed to keep the length tractable. I will continue to calibrate in the future so this should get better.
Somewhat relatedly, it seems to me that some of the material from today would be better covered in separate articles at comfortable length and density instead of being packed into a single long article such as this. Future articles may be much more bite-sized and come out more often for this reason.
Subscribe to TaiwanQuant's Newsletter to keep reading this post and get 7 days of free access to the full post archives.