Infrastructure · 2026-06-08 · 10 min read

Algorithmic trade execution latency on Binance — measuring and reducing the gap

A backtest assumes you fill at the exact price your signal saw. Live trading does not work that way. The gap between the signal firing and the order actually filling on the exchange is where retail algorithms quietly lose 0.5-2% per trade — enough to convert a profitable backtest into a losing live strategy. The gap is measurable, and most of it is fixable without exotic infrastructure.

What latency actually is

Total execution latency on a typical retail crypto algorithm is the sum of four components:

Detection latency — time from when the market condition becomes true to when your code recognises it.
Decision latency — time spent computing whether to act on the detection.
Network latency — time for your order to travel from your server to Binance's matching engine.
Exchange latency — time Binance takes to acknowledge, queue, and execute the order.

End-to-end on a typical retail cloud VM, these add up to 600–1500 milliseconds. End-to-end at an institutional desk colocated next to Binance: 5–15 milliseconds. Two orders of magnitude difference. Retail can't close most of that gap, but it can close enough of it to make systematic strategies viable.

Detection latency — where most retail loses

This is the gap from "the market condition became true" to "your code knows about it." Most retail strategies poll the REST API every few seconds for data, which means your minimum detection latency is your polling interval — typically 1000 to 5000 ms.

The fix: WebSocket streams. Binance publishes real-time WebSocket feeds for kline, aggTrade, depth, markPrice, and forceOrder (liquidations). Subscribing means Binance pushes each update to your code as it is produced, so the polling interval drops out of your detection latency entirely.

Concretely: a kline WebSocket subscription delivers the closed 1-minute candle within 100 ms of close. A REST poll of the same data, even on a fast loop, will be 1500–3000 ms behind. Switching from REST polling to WebSocket reduces detection latency by an order of magnitude.
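As a minimal sketch of the detection step, the snippet below builds a kline stream URL and checks whether an incoming message carries a closed candle. It assumes the USD-M futures WebSocket base URL `wss://fstream.binance.com/ws` and the kline payload layout in which `k.x` is true only on the final update for a candle. The helper names `kline_stream_url` and `closed_candle` are illustrative, and the socket handling itself (connecting, reconnecting, pings) is left to whichever WebSocket client you use.

```python
import json
from typing import Optional

# Assumed base URL for Binance USD-M futures market streams.
WS_BASE = "wss://fstream.binance.com/ws"


def kline_stream_url(symbol: str, interval: str = "1m") -> str:
    """Build the raw-stream URL; stream names use lowercase symbols."""
    return f"{WS_BASE}/{symbol.lower()}@kline_{interval}"


def closed_candle(raw: str) -> Optional[dict]:
    """Return the candle if this message closes it, otherwise None."""
    msg = json.loads(raw)
    if msg.get("e") != "kline":
        return None
    k = msg["k"]
    if not k["x"]:  # x stays False while the candle is still forming
        return None
    return {
        "symbol": k["s"],
        "close_time_ms": k["T"],     # candle close time, exchange clock
        "event_time_ms": msg["E"],   # when Binance emitted the message
        "close": float(k["c"]),
    }
```

Acting only when `closed_candle` returns a value keeps the strategy on the same closed-bar data the backtest used, while still reacting as soon as the final update arrives.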

For aggregate-trade-based detection, the WebSocket delivers each trade as it executes — sub-100ms from match. This is the cleanest signal source for any short-horizon strategy.

Decision latency — easy to win

This is the gap between data arrival and order dispatch. Most retail decision logic is simple: check a few thresholds, compute position size, send the order. End-to-end well under 10 ms if the code is reasonably written.

Where decision latency creeps up:

Database I/O to look up historical state. If you query MongoDB or PostgreSQL on every signal, you add 5-50 ms.
External HTTP calls inside the decision path. Anything that pulls fresh data from another API mid-decision adds at least the round-trip time.
Garbage collection pauses in garbage-collected runtimes. Node.js or Python can pause for 50-200 ms during GC under load.

Fix: maintain in-memory state, pre-fetch dependent data before the trigger event, minimize allocations inside the decision path. Most retail algos waste 20-50 ms here that could be 1-2 ms.
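One way to picture the fix is a small state object that lives in process memory and is updated on every tick, so the decision itself touches no database and no network. The sketch below is an assumption-laden illustration, not our production logic: the class `HotPathState`, the rolling-mean rule in `should_buy`, and the `threshold` parameter are all hypothetical.

```python
import time
from collections import deque


class HotPathState:
    """Keeps everything the decision needs in process memory."""

    def __init__(self, window: int, threshold: float) -> None:
        self.prices = deque(maxlen=window)  # fixed-size rolling window
        self.total = 0.0                    # running sum, avoids re-summing
        self.threshold = threshold          # loaded before any trigger fires

    def update(self, price: float) -> None:
        if len(self.prices) == self.prices.maxlen:
            self.total -= self.prices[0]    # value about to be evicted
        self.prices.append(price)
        self.total += price

    def should_buy(self, price: float) -> bool:
        """Hypothetical rule: price sits below the rolling mean by `threshold`."""
        if len(self.prices) < self.prices.maxlen:
            return False                    # not enough history yet
        mean = self.total / len(self.prices)
        return price < mean * (1.0 - self.threshold)


def timed_decision(state: HotPathState, price: float) -> tuple[bool, float]:
    """Run the decision and report how long it took, in milliseconds."""
    start = time.perf_counter_ns()
    decision = state.should_buy(price)
    elapsed_ms = (time.perf_counter_ns() - start) / 1e6
    return decision, elapsed_ms
```

Logging the `elapsed_ms` value from `timed_decision` on every signal is a cheap way to notice when a database call or HTTP request has crept back into the hot path.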

Network latency — geography matters

Binance's primary futures matching engine is in AWS Tokyo (ap-northeast-1). Your network latency to it has a floor set by the speed of light over the distance, plus whatever routing inefficiency your path adds.
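To get a feel for that floor, the sketch below estimates the best-case round trip between two cities from their great-circle distance. It rests on two assumptions that are not from measurements of any real route: light in optical fibre travels at roughly 200,000 km/s (about two thirds of its vacuum speed), and the cable follows the shortest path over the globe. The coordinates for Frankfurt and Tokyo are approximate city centres.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: signal speed in optical fibre, roughly 2/3 of c.
FIBRE_SPEED_KM_PER_S = 200_000.0


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rtt_floor_ms(distance_km: float) -> float:
    """Best-case round-trip time over an ideal straight fibre."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000


# Approximate city-centre coordinates (latitude, longitude).
frankfurt = (50.11, 8.68)
tokyo = (35.68, 139.69)
distance = great_circle_km(*frankfurt, *tokyo)
floor = rtt_floor_ms(distance)
```

Under these assumptions the Frankfurt to Tokyo floor comes out somewhere above 90 ms. Real routes are considerably longer than the great circle, which is why observed round trips are a multiple of that figure.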

Typical retail round-trip times from VPS providers:

AWS Tokyo: 1-3 ms (best possible without colocation)
AWS Singapore: 60-90 ms
AWS Frankfurt: 220-260 ms
AWS US-East: 160-200 ms
Hetzner Helsinki: 250-300 ms

If your bot runs on a VPS in Frankfurt and Binance is in Tokyo, you have a permanent 250 ms round-trip latency floor. No code optimization can close that gap.

The biggest single latency improvement most retail algos can make is moving the bot to AWS Tokyo (ap-northeast-1). A $20/month t4g.small in Tokyo gives you the same network latency to Binance as a $20,000/month colocated server, give or take a few milliseconds. The remaining gap is in the exchange's internal queue, which money can't buy.

Exchange latency — what you can't control

From order arrival at Binance's edge to order acknowledgment is typically 5-20 ms. From order acknowledgment to actual execution (fill or rest) is another 1-50 ms depending on order type. This is largely outside your control but it has implications.

The implication: Binance's queue is not FIFO across all users. Higher-tier accounts (VIP-1, VIP-2, etc.) get faster execution lanes. If you are an unverified retail user, your orders queue behind VIP user orders during heavy traffic. There's no way to skip this except by trading enough volume to reach VIP-1 yourself (around $50M/month).

This matters for time-sensitive strategies. Latency-sensitive event-driven strategies, where trade quality depends on being faster than the next participant in line, can be affected by user-tier execution lanes. For longer-horizon strategies (our directional and mean-reversion models), a difference of a few hundred milliseconds doesn't matter much.

Measuring your own latency

Instrument every step. For each order:

t1 — timestamp of the market event (kline close, aggTrade fill, etc.). From the WebSocket message timestamp.
t2 — timestamp when your code received the message. From your local clock.
t3 — timestamp when your code dispatched the order. From your local clock just before the HTTP send.
t4 — timestamp Binance acknowledged the order. From the order response.
t5 — timestamp Binance filled the order. From the fill notification.

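Delta t2−t1 is your network/WebSocket latency from Binance to you. Delta t3−t2 is your decision latency. Delta t4−t3 is your order send + Binance ack latency. Delta t5−t4 is the time spent in Binance's matching engine. Note that t1, t4, and t5 come from Binance's clock while t2 and t3 come from yours, so keep your server clock synchronized (for example via NTP); otherwise the t2−t1 and t4−t3 deltas absorb the clock offset. Log all five timestamps for every trade. After a week, you have a clear picture of where the latency lives.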
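The five timestamps and four deltas above can be sketched as a small record type. This is one possible shape, assuming every timestamp has already been converted to epoch milliseconds; the names `OrderTimestamps`, `breakdown`, and the dictionary keys are illustrative rather than anything Binance defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderTimestamps:
    """All values in epoch milliseconds."""
    t1_event: int       # exchange clock, from the WebSocket message
    t2_received: int    # local clock, when the message arrived
    t3_dispatched: int  # local clock, just before the HTTP send
    t4_acked: int       # exchange clock, from the order response
    t5_filled: int      # exchange clock, from the fill notification


def breakdown(ts: OrderTimestamps) -> dict[str, int]:
    """Split one order's latency into the four stages plus the total."""
    return {
        "feed_ms": ts.t2_received - ts.t1_event,         # mixes both clocks
        "decision_ms": ts.t3_dispatched - ts.t2_received,
        "send_ack_ms": ts.t4_acked - ts.t3_dispatched,   # mixes both clocks
        "matching_ms": ts.t5_filled - ts.t4_acked,
        "total_ms": ts.t5_filled - ts.t1_event,
    }


# Made-up values purely to show the call shape, not measured data.
example = breakdown(OrderTimestamps(1000, 1120, 1125, 1275, 1290))
```

Because the four stage deltas telescope, they always sum to `total_ms`, which makes a handy sanity check on the logging itself.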

Most retail bots that have never measured this find their largest latency contributor is something they didn't expect — usually decision latency from a database query they forgot was in the hot path.
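Finding that unexpected contributor is mostly a matter of aggregating the log. The sketch below assumes each trade has been recorded as a dictionary of per-stage latencies in milliseconds (the key names are hypothetical) and reports the stage with the highest median. The median is used here rather than the mean so that a handful of outlier trades does not dominate the picture.

```python
from statistics import median


def largest_contributor(rows: list[dict[str, float]]) -> tuple[str, float]:
    """Return the stage with the highest median latency across trades."""
    if not rows:
        raise ValueError("need at least one logged trade")
    # Every key except the total is treated as a stage.
    stages = [key for key in rows[0] if key != "total_ms"]
    medians = {stage: median(row[stage] for row in rows) for stage in stages}
    worst = max(medians, key=medians.get)
    return worst, medians[worst]
```

Running this over a week of logged trades, and again after each infrastructure change, shows whether the change moved the stage you expected it to move.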

What we actually achieve internally

Our infrastructure runs in Hetzner Frankfurt (we are EU-based; the latency cost is the price of EU jurisdiction). Typical end-to-end latency from signal trigger to Binance order ACK on the firm trade book: 280-340 ms.

Decomposition: 0 ms detection (WebSocket); ~5 ms decision; ~250 ms network round-trip; ~25 ms Binance ACK. The 250 ms network is the price of running in EU instead of Tokyo. Moving infrastructure to Tokyo would shave that to about 20 ms total round-trip.

For latency-sensitive event-driven strategies, where speed is the edge, this is genuinely too slow. We've been migrating that specific strategy to Tokyo. For our directional and mean-reversion models, where the edge is the analytical pre-trade work rather than the execution speed, EU infrastructure is fine.

The honest takeaway

Latency optimization has diminishing returns. The first easy wins (WebSocket subscription, Tokyo VPS) get you 80% of what's possible without serious investment. The remaining 20% costs orders of magnitude more (colocated servers, kernel-bypass networking, FPGA-based order handling) and is only worth it for strategies where speed is the edge.

Most retail algorithms can get to "good enough" execution with $50/month in infrastructure and a few days of work. The strategies that need more either have specific time-sensitivity (and should invest accordingly) or have an edge that doesn't depend on speed in the first place — in which case the latency obsession is wasted effort.

Skip the infrastructure setup

Our Pro feed runs on optimized infrastructure with sub-second end-to-end latency on most strategies. You connect your Binance API; we send the signal; the trade is placed on your account. Trial is free.
