The current model is ~58 MB — by necessity, not by choice
The on-device driving model (driving_vision + driving_policy) totals around 58 MB on disk. That is not the natural size for a task this complex — it is the size comma was forced to hit to run at 20 Hz on the Snapdragon 845 MAX DSP. The model was engineered down to fit the chip, not sized up to match the task.
The real bottleneck is compute throughput, not just RAM
Driving requires inference at 20 Hz — one full forward pass every 50 milliseconds. The Snapdragon DSP has a fixed compute budget, measured in TOPS (tera-operations per second). A model 10–100x larger simply would not finish inside that window; the control loop would drop to 2–3 Hz, which is useless for real-time control. More RAM alone would not solve that.
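The budget math above can be sketched in a few lines. All of the numbers here are illustrative assumptions — comma has not published per-pass FLOP counts or sustained TOPS figures — but the structure of the argument is just arithmetic:

```python
# Rough latency-budget check: does a model of a given size finish a
# forward pass within the 50 ms window that 20 Hz control requires?
# The FLOP counts and TOPS figure below are assumptions, not comma specs.

def forward_pass_ms(gflops_per_pass: float, effective_tops: float) -> float:
    """Time for one forward pass, assuming inference is compute-bound."""
    ops = gflops_per_pass * 1e9
    ops_per_sec = effective_tops * 1e12
    return ops / ops_per_sec * 1e3

BUDGET_MS = 1000 / 20  # 20 Hz -> 50 ms per pass

# Hypothetical: a compact model vs. one ~50x larger, on a DSP that
# sustains ~5 effective TOPS (an assumed, not measured, figure).
small = forward_pass_ms(gflops_per_pass=10, effective_tops=5)   # 2 ms
big = forward_pass_ms(gflops_per_pass=500, effective_tops=5)    # 100 ms

for name, ms in [("small", small), ("big", big)]:
    verdict = "fits" if ms <= BUDGET_MS else "misses"
    print(f"{name}: {ms:.0f} ms per pass -> {verdict} the 50 ms budget")
```

Under these assumed numbers the larger model takes 100 ms per pass, twice the budget — and the gap only widens as the model grows, which is why more memory alone cannot fix it.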
Limited field of view
Even though comma 3, 3X, and 4 all have a 360-degree camera system, the on-device model can only process a narrow slice of that view at inference time. Processing more cameras at higher resolution multiplies the compute requirements dramatically — well beyond what the current chip can handle.
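For a convolutional vision stack, per-frame compute grows roughly linearly with input pixels, so adding cameras and resolution multiplies cost. A back-of-envelope sketch, with a baseline crop size chosen purely for illustration:

```python
# Relative compute cost as cameras and resolution scale up.
# The baseline (one 512x256 crop) is an assumed reference, not the
# actual input size of comma's driving model.

def relative_cost(num_cameras: int, width: int, height: int,
                  base_cameras: int = 1, base_width: int = 512,
                  base_height: int = 256) -> float:
    """Cost relative to a single low-res camera baseline (assumed)."""
    pixels = num_cameras * width * height
    base = base_cameras * base_width * base_height
    return pixels / base

print(relative_cost(1, 512, 256))   # 1.0  (baseline crop)
print(relative_cost(6, 1024, 512))  # 24.0 (hypothetical full surround view)
```

Going from one narrow crop to a hypothetical six-camera view at double the resolution is a 24x jump in per-frame work — the "multiplies the compute requirements dramatically" claim, made concrete.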
No training architecture (until now)
Previously, comma had no internal infrastructure to build new, larger models from scratch. They could refine and retrain existing architectures, but there was no pipeline to scale up dramatically.
The comma devices run on a Snapdragon 845 MAX — a powerful mobile chip with a custom cooling system that comma engineered to prevent thermal throttling on a hot windshield. It is very good at what it does. But a model that processes wider camera inputs and runs a fundamentally larger architecture would need orders of magnitude more compute throughput than the DSP can sustain at real-time driving speeds. Interestingly, comma has already added placeholder files named big_driving_policy.onnx and big_driving_vision.onnx to the openpilot repository — empty stubs signaling that a bigger model is being planned in the codebase.
Piece 1: tinygrad's training infrastructure
tinygrad — the open-source deep learning framework George Hotz runs alongside comma.ai — recently completed the infrastructure needed to train large models end-to-end. That means comma can now build bigger, better-architected models with significantly more data, rather than being constrained by earlier tooling.
Piece 2: GPU support over USB4
The second piece is also recent: tinygrad enabled external GPU support for comma devices over USB4. The comma 3X and comma 4 both have a USB4 port, and that port can now be used to connect an eGPU — giving openpilot access to dramatically more compute than the on-device chip can provide.
Together, these two developments form the complete stack: a way to build a bigger model, and a way to run it. Neither alone was enough. With both in place, comma can start targeting something their hardware was never able to do before.
Wider field of view
The camera hardware already sees 360 degrees. A larger model running on a GPU could actually process and use more of that panoramic view — seeing more of the road, adjacent lanes, and hazards that the current compact model simply cannot fit in its context.
More reliable driving
More parameters trained on more data generally means more robust behavior across edge cases. A bigger model can better handle unusual road conditions, merges, intersections, and scenarios that are currently underrepresented in the smaller model.
Higher-rate steering commands
The lead comma engineer mentioned this specifically: a bigger model could output steering commands at a much higher rate. That translates directly into smoother, more precise turns — one of the most noticeable improvements drivers would feel behind the wheel.
USB4 is for loading, not running
A common misconception is that running inference over USB4 would be too slow. That is not how the setup works. USB4 carries the model weights to the GPU once, at startup; the per-frame camera inputs that follow are tiny by comparison. Once the weights are loaded, inference runs against the GPU's own VRAM, whose internal bandwidth is massively higher than any external connection.
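A quick sanity check of what USB4 actually has to carry. The model size and camera dimensions are hypothetical; the 40 Gbit/s figure is USB4's nominal signaling rate, ignoring protocol overhead:

```python
# What crosses the USB4 link: weights once at startup, then small
# per-frame inputs. Model size and camera dimensions are assumptions.

USB4_GBYTES_PER_SEC = 40 / 8  # 40 Gbit/s nominal -> 5 GB/s, no overhead

def load_time_s(model_gb: float) -> float:
    """One-time cost to copy the weights to GPU VRAM over USB4."""
    return model_gb / USB4_GBYTES_PER_SEC

def per_frame_mb(num_cameras: int, width: int, height: int,
                 bytes_per_pixel: int = 3) -> float:
    """Input payload per forward pass (assumed uncompressed RGB)."""
    return num_cameras * width * height * bytes_per_pixel / 1e6

print(f"{load_time_s(5.8):.1f} s to load a hypothetical 5.8 GB model")
print(f"{per_frame_mb(6, 1024, 512):.1f} MB of camera input per frame")
```

Under these assumptions, a 100x-larger model loads in about a second, and even six uncompressed camera streams at 20 Hz would use well under 5% of the link — the external connection never sits in the inference hot path.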
Why mobile chips cannot compete
Mobile SoCs like the Snapdragon 845 have limited memory bandwidth, tight thermal envelopes, and are fundamentally designed for power efficiency, not throughput. A discrete GPU with its own dedicated VRAM has 10x–20x more memory bandwidth and can sustain workloads that would immediately throttle or overflow a phone chip.
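The memory-bandwidth gap can be turned into a frame-rate ceiling. If inference is bandwidth-bound, every forward pass has to stream the weights from memory at least once, which caps how many passes per second are possible. The bandwidth figures below are ballpark class numbers (phone-class LPDDR4X vs. discrete GDDR6), not measurements of any specific device:

```python
# Upper bound on inference rate when limited by streaming the weights
# from memory once per forward pass. All figures are assumptions.

def max_hz(model_gb: float, bandwidth_gb_s: float) -> float:
    """Best-case passes/sec for a bandwidth-bound model."""
    return bandwidth_gb_s / model_gb

MODEL_GB = 5.8  # hypothetical 100x model in half precision

mobile_lpddr = max_hz(MODEL_GB, 30)    # ~30 GB/s phone-class LPDDR4X
discrete_vram = max_hz(MODEL_GB, 640)  # ~640 GB/s discrete-GPU GDDR6

print(f"mobile SoC ceiling:   {mobile_lpddr:.0f} Hz")
print(f"discrete GPU ceiling: {discrete_vram:.0f} Hz")
```

Even before thermals enter the picture, the mobile chip's ceiling for a model this size sits around 5 Hz — below the 20 Hz a driving loop needs — while the discrete GPU clears it with two orders of magnitude to spare on the compute side as well.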
The 9070 XT and Tesla HW5
The AMD RX 9070 XT has been benchmarked comparably to Tesla's HW5 chip on certain inference workloads — a remarkable comparison given that HW5 is a purpose-built autonomous driving accelerator. According to comma's lead engineer on an X (Twitter) voice chat, this is a GPU they are excited about for running a larger openpilot model.
This is not vaporware. The USB4 GPU support in tinygrad is real and recently shipped. The training infrastructure is real and being used — comma's openpilot 0.11 world model is the first product of that investment. And the hardware port has been on every comma device since the 3X.
What does not exist yet is the bigger inference model itself, and the consumer eGPU product to run it on. Those are still in development. But all the pieces are present for the first time, and the excitement from the team — including the direct comments from the lead engineer — is not theoretical.
For comma users, this matters most for one reason: your device's cameras already see the full road. Soon, the model might actually be able to use all of it.