OpenAI has partnered with Cerebras to add 750 MW of ultra-low-latency AI compute to its platform. The partnership aims to strengthen OpenAI’s inference infrastructure by integrating Cerebras’ purpose-built AI systems, which are designed for faster response times.
Cerebras has developed AI hardware that combines compute, memory and bandwidth on a single large chip, reducing bottlenecks that typically slow inference on conventional systems. The integration is positioned to support long outputs and real-time responses across a range of AI workloads.
OpenAI says the additional low-latency capacity is intended to improve how its models respond to complex tasks such as answering detailed questions, generating code, creating images and running AI agents. Faster response times are linked to higher user engagement and the ability to support more demanding, real-time workloads.
The low-latency compute capacity will be integrated into OpenAI’s inference stack in phases, with expansion across workloads planned over time. The rollout is scheduled to take place in multiple tranches extending through 2028.
“OpenAI’s compute strategy is to build a resilient portfolio that matches the right systems to the right workloads. Cerebras adds a dedicated low-latency inference solution to our platform. That means faster responses, more natural interactions, and a stronger foundation to scale real-time AI to many more people,” said Sachin Katti of OpenAI.
“We are delighted to partner with OpenAI, bringing the world’s leading AI models to the world’s fastest AI processor. Just as broadband transformed the internet, real-time inference will transform AI, enabling entirely new ways to build and interact with AI models,” said Andrew Feldman, co-founder and CEO of Cerebras.