November 10, 2025 Dr. Rohan Mehta

Breaking Down the Latency Budget in a Real-Time BCI Pipeline

By Dr. Rohan Mehta, Founder & CEO · Synaptiq

One hundred milliseconds is cited frequently as the target latency ceiling for motor BCI feedback. The number is grounded in motor neuroscience research on proprioceptive feedback integration — delays beyond roughly 100ms begin to be perceptible as a decoupling between intended and observed motion, which degrades the user's sense of agency and, consequently, the neurorehabilitation feedback loop that makes BCI-assisted therapy effective.

What rarely gets published in detail is where those 100ms actually go. The BCI literature reports end-to-end system latency numbers, but the breakdown — how much is attributable to each subsystem — matters enormously for engineering decisions. Choosing a trial window length, selecting a USB interface vs. dedicated serial connection, deciding whether to run preprocessing on the EEG amplifier's embedded processor or on the host compute platform: all of these decisions affect specific components of the latency budget, and making them well requires knowing the budget structure before you design the system.

What follows is an account of where the milliseconds go in a realistic clinical BCI pipeline, based on the hardware and software architecture we use in the Synaptiq SDK. The numbers are realistic ranges, not guarantees — your specific hardware combination will have different characteristics, and the only way to know your actual numbers is to measure them.

The Structural Latency: EEG Acquisition Windows

The most significant and least-discussed latency component is structural: you cannot classify a motor imagery trial until you have collected enough EEG data to make a reliable classification. For a temporal window-based classifier (the dominant approach for mu/beta ERD), this window is typically 500ms–1000ms. This window length does not appear in any single component's latency measurement because it is a sampling constraint, not a processing delay — but it is the largest single contributor to end-to-end latency.

A 500ms classification window means the minimum possible latency from motor imagery onset to classification output is 500ms, regardless of how fast all the other pipeline components run. In practice, motor imagery onset itself precedes cue appearance by a few hundred milliseconds for well-practiced imagers, and the system can begin collecting the window before the "official" imagery onset if a cue-based protocol is used. But for uncued, continuous motor imagery decoding — the relevant scenario for rehabilitation exoskeleton control — the 500ms window is a hard latency floor.

This is why reducing the classification window is an active area of research: 250ms windows have been demonstrated for strong imagers at some accuracy cost, and 100ms windows are theoretically achievable for patients with very clean signals and practiced imagery. In current clinical deployments, 500ms is the practical minimum for acceptable classification accuracy across a typical patient population. Accepting a 500ms structural delay and optimizing all other components to add minimally to it is the engineering goal — not trying to eliminate the window entirely.
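To make the structural floor concrete, here is a minimal sketch of a sliding window fed by amplifier buffers. The class name `SlidingWindow` and its methods are hypothetical, not part of the Synaptiq SDK, and samples are treated as opaque values (one value per time step) for brevity.

```python
from collections import deque


class SlidingWindow:
    """Keep the most recent `window_ms` of samples from a chunked stream."""

    def __init__(self, fs_hz, window_ms):
        self.n_samples = int(round(fs_hz * window_ms / 1000.0))
        self._buf = deque(maxlen=self.n_samples)  # oldest samples fall off the front

    def push(self, chunk):
        """Append one amplifier buffer (an iterable of samples)."""
        self._buf.extend(chunk)

    def ready(self):
        """True once a full window has been collected."""
        return len(self._buf) == self.n_samples

    def latest(self):
        """Return the current window, oldest sample first."""
        return list(self._buf)


# 500 ms window at 500 Hz, fed by 8-sample buffers (16 ms each).
win = SlidingWindow(fs_hz=500, window_ms=500)
chunks_needed = 0
while not win.ready():
    win.push([0.0] * 8)
    chunks_needed += 1

print(win.n_samples, chunks_needed, chunks_needed * 16)
```

Once the window is full, a decoder could be re-run on every new buffer, but each decision would still reflect the most recent 500ms of data, which is why faster processing alone cannot remove this delay.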

EEG Amplifier Buffer Latency: 8–20ms

EEG amplifiers digitize the analog scalp signal at a fixed sampling rate and transfer samples to the host computer in buffers — groups of samples rather than one sample at a time. The buffer size is a trade-off: larger buffers reduce USB transfer frequency and are more efficient for the bus, but add latency (you wait until the buffer is full before any samples arrive). Smaller buffers reduce latency but increase USB overhead.

At 500 Hz sampling rate, a common buffer size of 8–10 samples (16–20ms) balances latency and transfer efficiency. Some amplifiers allow buffer sizes down to 1–2 samples (2–4ms at 500 Hz) at the cost of significantly increased USB interrupt load and potential dropped packets on loaded systems. For clinical hardware where reliability matters more than absolute minimum latency, 8-sample buffers at 500 Hz are the practical baseline — contributing approximately 16ms to the budget.

At 250 Hz sampling rate (some lower-cost clinical headsets), a 4-sample buffer gives 16ms. At 1000 Hz sampling rate (research-grade amplifiers), an 8-sample buffer gives 8ms. Higher sampling rates with proportional buffer sizes produce similar absolute latencies. The trade-off is that a higher sampling rate puts more samples into a classification window of the same duration, which increases computational cost.
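The buffer figures above follow directly from samples divided by sampling rate. A small sketch of that arithmetic (the function names are illustrative only):

```python
def buffer_latency_ms(n_samples, fs_hz):
    """Time needed to fill one transfer buffer, in milliseconds."""
    return 1000.0 * n_samples / fs_hz


def window_samples(fs_hz, window_ms):
    """Number of samples per channel in one classification window."""
    return int(round(fs_hz * window_ms / 1000.0))


for fs_hz, n in [(500, 8), (500, 10), (500, 2), (250, 4), (1000, 8)]:
    print(f"{fs_hz:>4} Hz, {n:>2}-sample buffer: {buffer_latency_ms(n, fs_hz):.0f} ms")

for fs_hz in (250, 500, 1000):
    print(f"{fs_hz:>4} Hz, 500 ms window: {window_samples(fs_hz, 500)} samples per channel")
```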

USB Transfer: 4–8ms typical, 20ms worst-case

USB 2.0 Full Speed (12 Mbit/s) is the dominant interface for clinical EEG amplifiers: it is available on essentially all clinical hardware and supports the bandwidth required for 32-channel 500 Hz streaming (approximately 1 Mbit/s including protocol overhead). USB itself does not provide galvanic isolation, so patient isolation has to come from the amplifier's isolated front end or from an isolator in the USB path, not from the cable.

USB transfers have a latency floor set by how the USB host controller schedules transactions. At full speed, the host schedules transfers in 1ms frames (the 125µs microframe applies only to high-speed links). In practice, latency from transfer request to data receipt is 1–3ms under low system load. Under higher load (when the USB host controller is servicing multiple devices, including the exoskeleton control interface), worst-case latency can reach 15–20ms.

Dedicated USB host controller allocation (a separate USB controller for the EEG amplifier, not sharing with storage or network devices) reduces worst-case latency. On embedded compute platforms (Raspberry Pi class hardware sometimes used in research setups), USB timing variability is substantially higher than on PC-class hardware with xHCI controllers. If your deployment platform uses embedded compute, characterize USB latency under representative load before committing to it.
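One way to characterize transfer timing is to timestamp each buffer arrival on the host and look at how far the inter-arrival intervals deviate from the nominal buffer period. The sketch below assumes you have already collected such timestamps (for example with `time.perf_counter()` right after each blocking read returns); `summarize_jitter` is a hypothetical helper, and the data here is synthetic. Note that this captures timing variability, not absolute transfer latency.

```python
import statistics


def summarize_jitter(arrival_times_s, expected_interval_ms):
    """Summarize how far inter-arrival intervals deviate from the nominal one."""
    intervals_ms = [
        1000.0 * (b - a) for a, b in zip(arrival_times_s, arrival_times_s[1:])
    ]
    excess_ms = sorted(iv - expected_interval_ms for iv in intervals_ms)
    p99_index = min(len(excess_ms) - 1, int(0.99 * len(excess_ms)))
    return {
        "median_excess_ms": statistics.median(excess_ms),
        "p99_excess_ms": excess_ms[p99_index],
        "max_excess_ms": excess_ms[-1],
    }


# Synthetic example: 8-sample buffers at 500 Hz (16 ms nominal),
# with the fourth buffer arriving 4 ms late.
arrivals = [0.000, 0.016, 0.032, 0.052, 0.064, 0.080]
stats = summarize_jitter(arrivals, expected_interval_ms=16.0)
print(stats)
```

Running the same summary with and without the exoskeleton interface and other peripherals attached gives a rough picture of how much headroom the worst case needs.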

Software Preprocessing: 3–8ms

Preprocessing for a 32-channel, 500 Hz EEG system within a 500ms window involves:

FIR bandpass filter application (8–30 Hz, order ~100 for 500 Hz sample rate): ~1–2ms with NumPy/SciPy FFT-based convolution
CAR or Laplacian re-referencing: ~0.5ms
Artifact detection and channel interpolation (if implemented): ~1–2ms
Covariance matrix estimation (32×32, regularized): ~0.5–1ms

Total preprocessing time for a single trial on modern hardware (Intel Core i5 or equivalent, Python with NumPy): approximately 3–8ms depending on filter order and artifact rejection complexity. The covariance estimation is fast because the matrix dimensions are small (32×32) — even Ledoit-Wolf regularized estimation runs in under 1ms for this size.
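The preprocessing chain can be sketched and timed as follows. This is a simplified illustration, not the Synaptiq SDK implementation: it uses NumPy only, designs the FIR filter as a Hamming-windowed sinc instead of calling SciPy, omits artifact handling, and uses a fixed shrinkage toward a scaled identity as a stand-in for Ledoit-Wolf estimation. All function names are hypothetical, and the measured time will depend on your hardware.

```python
import time

import numpy as np

FS = 500.0           # sampling rate in Hz
N_CH, N_T = 32, 250  # 32 channels, 500 ms window


def design_bandpass(low_hz, high_hz, fs, numtaps=101):
    """Windowed-sinc FIR bandpass (Hamming window, linear phase)."""
    n = np.arange(numtaps) - (numtaps - 1) / 2.0
    lp_high = 2 * high_hz / fs * np.sinc(2 * high_hz / fs * n)
    lp_low = 2 * low_hz / fs * np.sinc(2 * low_hz / fs * n)
    return (lp_high - lp_low) * np.hamming(numtaps)


def fir_filter(x, taps):
    """FFT-based convolution along time, trimmed to the input length."""
    n_fft = x.shape[-1] + len(taps) - 1
    spectrum = np.fft.rfft(x, n_fft, axis=-1) * np.fft.rfft(taps, n_fft)
    y = np.fft.irfft(spectrum, n_fft, axis=-1)
    start = (len(taps) - 1) // 2  # centre the output on the input
    return y[..., start:start + x.shape[-1]]


def common_average_reference(x):
    """Subtract the across-channel mean at each time point."""
    return x - x.mean(axis=0, keepdims=True)


def shrunk_covariance(x, shrinkage=0.1):
    """Sample covariance shrunk toward a scaled identity (fixed shrinkage)."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / (x.shape[1] - 1)
    mu = np.trace(cov) / cov.shape[0]
    return (1 - shrinkage) * cov + shrinkage * mu * np.eye(cov.shape[0])


def preprocess(trial, taps):
    """Bandpass, re-reference, then estimate the spatial covariance."""
    return shrunk_covariance(common_average_reference(fir_filter(trial, taps)))


rng = np.random.default_rng(0)
trial = rng.standard_normal((N_CH, N_T))  # synthetic stand-in for one EEG trial
taps = design_bandpass(8.0, 30.0, FS)     # designed once, reused for every trial

t0 = time.perf_counter()
cov = preprocess(trial, taps)
elapsed_ms = 1000.0 * (time.perf_counter() - t0)
print(f"covariance {cov.shape}, preprocessing took {elapsed_ms:.2f} ms")
```

Common average referencing makes the sample covariance rank-deficient, so some form of regularization is needed before any step that inverts the matrix or takes its logarithm.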

Python with NumPy is adequate for these dimensions. C++ implementations save perhaps 1–2ms per trial but the difference matters primarily at higher channel counts (64+) or very high sample rates (2000+ Hz) — at the clinical range of 16–32 channels and 250–500 Hz, the Python overhead is not the bottleneck.

Classification: 1–3ms

Riemannian geometry classification (MDM or tangent space projection + LDA) for a 32-channel EEG trial at inference time:

Tangent space projection (mapping covariance matrix to tangent space at Riemannian mean): ~1–2ms
LDA decision function evaluation: ~0.1ms

The tangent space projection requires computing the matrix logarithm log(M^(-1/2) C M^(-1/2)), where C is the trial covariance matrix and M is the Riemannian mean, which involves a matrix eigendecomposition. For a 32×32 matrix, eigendecomposition takes approximately 0.5–1ms on modern hardware using LAPACK routines (as called by NumPy). The full tangent space projection including matrix square root computation is approximately 1–2ms per trial.
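A minimal NumPy sketch of the projection, using eigendecomposition for both the inverse square root and the logarithm. The function names are illustrative, and the upper-triangle vectorization with √2 weights on off-diagonal entries is one common convention, assumed here; the random matrices are stand-ins for real covariance estimates.

```python
import numpy as np


def _sym_func(mat, func):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * func(vals)) @ vecs.T


def tangent_vector(cov, ref):
    """Map SPD matrix `cov` to the tangent space at SPD reference `ref`."""
    ref_isqrt = _sym_func(ref, lambda v: 1.0 / np.sqrt(v))     # M^(-1/2)
    s = _sym_func(ref_isqrt @ cov @ ref_isqrt, np.log)         # matrix logarithm
    rows, cols = np.triu_indices(s.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))        # preserves the norm
    return weights * s[rows, cols]


rng = np.random.default_rng(0)
a = rng.standard_normal((32, 64))
b = rng.standard_normal((32, 64))
ref = a @ a.T / 64 + 0.1 * np.eye(32)  # stand-in for the Riemannian mean
cov = b @ b.T / 64 + 0.1 * np.eye(32)  # stand-in for one trial covariance

features = tangent_vector(cov, ref)
print(features.shape)  # (528,)
```

If the reference point is fixed after calibration, M^(-1/2) could be computed once and cached, leaving one eigendecomposition per trial at inference time.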

Filter bank approaches (applying CSP and covariance estimation in multiple frequency bands before classification) multiply this by the number of bands: with 6 bands at 1–2ms each, the projection step costs 6–12ms instead of 1–2ms. Whether this is acceptable depends on the overall latency budget and the accuracy benefit observed for the specific patient population.

Command Encoding and Transmission: 1–3ms

Encoding the classification output as a command message and transmitting it to the exoskeleton control interface:

Serial (RS-232 at 115200 baud): ~1ms per 12-byte command frame
UDP over localhost: ~0.1ms
GPIO assertion (for trigger-based interfaces): ~0.01ms

Serial transmission latency is dominated by the serial frame duration and the UART driver scheduling latency. At 115200 baud, a 12-byte command frame (start byte + command code + parameters + checksum) takes approximately 1ms to transmit. This can be reduced by using higher baud rates (460800 baud gives ~0.25ms per frame) if the exoskeleton's serial interface supports it.
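The frame duration is simple arithmetic once the framing is fixed. The sketch below assumes 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits on the wire per byte), which the text does not specify; it covers wire time only, not UART driver scheduling.

```python
def serial_frame_ms(n_bytes, baud, bits_per_byte=10):
    """Wire time for one frame; 10 bits per byte assumes 8N1 framing."""
    return 1000.0 * n_bytes * bits_per_byte / baud


for baud in (115200, 460800):
    print(f"{baud} baud: {serial_frame_ms(12, baud):.2f} ms per 12-byte frame")
```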

Exoskeleton Firmware Response: 10–30ms

The final latency component is in the exoskeleton's own firmware: the time from command receipt to actuator initiation. This varies by exoskeleton architecture, processor speed, and control loop frequency. Research-grade exoskeletons with fast control loops (500 Hz–1 kHz) have firmware latencies in the 2–5ms range. Clinical exoskeletons with slower control loops (50–100 Hz) have firmware latencies of 10–20ms from command receipt to the next control loop execution that processes the command.
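The 10–20ms figure for slower loops matches the control loop period itself. As a rough sketch, assuming a command is picked up on the next loop tick and arrives at a random point within the period (both assumptions, not properties of any specific device):

```python
def control_loop_wait_ms(loop_hz):
    """Wait until the next control loop tick, ignoring actuator start-up time."""
    period_ms = 1000.0 / loop_hz
    return {"worst_case_ms": period_ms, "mean_ms": period_ms / 2.0}


for loop_hz in (50, 100, 500, 1000):
    wait = control_loop_wait_ms(loop_hz)
    print(f"{loop_hz:>4} Hz loop: up to {wait['worst_case_ms']:.0f} ms, "
          f"{wait['mean_ms']:.1f} ms on average")
```

Any additional firmware processing or actuator start-up delay comes on top of this wait, which is one reason measured values should take precedence over this estimate.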

This is the component that device manufacturers know best about their own hardware and that is most difficult for BCI software teams to control. It must be specified in the interface contract and measured during integration testing.

Putting It Together

A realistic latency budget for a 32-channel, 500 Hz clinical BCI pipeline with a 500ms classification window:

Structural window latency: 500ms
EEG buffer latency: ~16ms
USB transfer: ~8ms
Preprocessing: ~5ms
Classification: ~2ms
Command transmission: ~1ms
Exoskeleton firmware response: ~15ms
Total: ~547ms

The 100ms target applies to the processing chain — everything after the classification window has been collected. The processing chain total (buffer + USB + preprocessing + classification + command + firmware) is approximately 47ms in this scenario — well within 100ms. The 500ms structural window is not a "failure" to meet the 100ms target; it is a separate latency component that drives the overall system delay but is not reducible by processing optimization.
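The budget arithmetic can be kept in a small table so that measured values are easy to substitute for these illustrative ones. The dictionary keys are arbitrary labels:

```python
BUDGET_MS = {
    "structural_window": 500,
    "eeg_buffer": 16,
    "usb_transfer": 8,
    "preprocessing": 5,
    "classification": 2,
    "command_transmission": 1,
    "exoskeleton_firmware": 15,
}

total_ms = sum(BUDGET_MS.values())
processing_ms = total_ms - BUDGET_MS["structural_window"]

print(f"end-to-end: {total_ms} ms")
print(f"processing chain: {processing_ms} ms (target: under 100 ms)")
```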

For real-time rehabilitation feedback applications, the relevant user experience is: patient begins imagining movement → exoskeleton begins assisting approximately 547ms later. Whether this feels like meaningful neurofeedback depends on the therapy protocol and the patient's expectations. For applications where faster feedback is required, shortening the classification window (with associated accuracy trade-offs) and using trigger-based actuator interfaces (minimizing exoskeleton firmware latency) are the highest-impact interventions.