OpenAI Partners with Cerebras to Optimize Codex: WebSocket Support Achieves Ultra-Low Latency
OpenAI engineer Sherwin Wu posted that Cerebras has pushed GPT-5.3-Codex inference speed to the extreme, prompting the team to redesign how Codex uses the Responses API and ultimately add WebSocket support for ultra-low latency.
Market Mechanism: Cerebras hardware acceleration is driving OpenAI to optimize its API infrastructure, cutting developer-tool latency and steering capital toward high-performance AI inference chips and real-time applications, while traditional cloud inference services come under pressure.
Source: Public Information
ABAB AI Insight
Sherwin Wu's team at OpenAI previously relied on the REST-based Responses API; this WebSocket integration, driven by the breakthrough in Cerebras hardware speed, continues OpenAI's evolution from request/response REST calls toward real-time, low-latency interaction.
On the capital side, the collaboration between Cerebras and OpenAI accelerates inference: by cutting Codex latency through WebSocket, OpenAI expands real-time coding and agent application scenarios while also generating more deployment demand for the Cerebras hardware ecosystem.
Like collaborations with Groq and other inference-optimized hardware vendors, this sits in the expansion phase of AI model inference shifting from batch processing to real-time interaction.
Structural Judgment: This is essentially a technological substitution. The breakthrough in Cerebras hardware speed pushes OpenAI from traditional request/response APIs to WebSocket, using hardware acceleration to deliver sub-second responses in place of higher-latency REST calls and strengthening Codex's competitiveness in real-time development scenarios.
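The latency argument above comes down to connection reuse: a plain REST call typically pays a fresh TCP (and TLS) handshake per request, while a WebSocket keeps one persistent connection open for the whole session. The details of OpenAI's actual WebSocket protocol for Codex are not public, so the sketch below does not model it; it is a minimal stdlib-only demonstration of the underlying effect, using a local TCP echo server to compare connect-per-request against a single reused connection.

```python
# Hedged illustration of why persistent connections (WebSocket-style) beat
# per-request connections (plain REST-style) on latency. The server and
# message format here are invented for the demo and do not reflect
# OpenAI's or Cerebras's actual infrastructure.
import socket
import threading
import time

def echo_server(srv: socket.socket) -> None:
    # Accept connections sequentially and echo whatever each client sends.
    while True:
        try:
            conn, _ = srv.accept()
        except OSError:
            return  # server socket closed
        with conn:
            while data := conn.recv(1024):
                conn.sendall(data)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
srv.listen()
port = srv.getsockname()[1]
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

N = 50  # number of round trips to time

# REST-style: open a brand-new TCP connection for every request.
t0 = time.perf_counter()
for _ in range(N):
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(b"ping")
        c.recv(1024)
per_request = time.perf_counter() - t0

# WebSocket-style: one persistent connection reused for every message.
t0 = time.perf_counter()
with socket.create_connection(("127.0.0.1", port)) as c:
    for _ in range(N):
        c.sendall(b"ping")
        c.recv(1024)
persistent = time.perf_counter() - t0

srv.close()
print(f"per-request: {per_request:.4f}s, persistent: {persistent:.4f}s")
```

Even on loopback, where handshakes are cheapest, the reused connection wins; over a real network with TLS, the per-request penalty grows by at least one extra round trip per call, which is the overhead WebSocket support removes for interactive coding sessions.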