Mira Murati's Thinking Machines releases voice model with 400ms response time, within the range of human conversation
Thinking Machines (TML), founded by former OpenAI CTO Mira Murati, has released its first voice model, highlighting a 400ms turn-taking latency.
This figure falls within the range of natural human conversation: linguistic studies across 10 languages show an average turn-taking gap of about 200ms, ranging from 7ms for Japanese to more than 400ms for Danish, and delays over 600ms are generally perceived as "awkward". Mainstream competitors currently sit above the 400ms mark: GPT-realtime-1.5 at 590ms, Gemini-3.1-flash-live at 570ms, and Qwen 3.5 OMNI far higher at 2140ms.
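The comparison above can be made concrete with a minimal Python sketch. It only restates the latency figures quoted in this article (they are not independently measured), and the two cut-offs are assumptions taken from the article's framing: 400ms as the upper end of the human range and 600ms as the point beyond which a delay feels awkward. The names `classify` and `classify_all` are hypothetical helpers, not part of any vendor's API.

```python
# Latencies in milliseconds, exactly as quoted in the article above.
LATENCY_MS = {
    "Thinking Machines": 400,
    "GPT-realtime-1.5": 590,
    "Gemini-3.1-flash-live": 570,
    "Qwen 3.5 OMNI": 2140,
}

# Assumption: the article treats 400ms as still inside the human range.
HUMAN_UPPER_MS = 400
# Assumption: delays over 600ms are described as "awkward".
AWKWARD_MS = 600


def classify(latency_ms: int) -> str:
    """Place one latency figure relative to the two thresholds."""
    if latency_ms <= HUMAN_UPPER_MS:
        return "within human range"
    if latency_ms <= AWKWARD_MS:
        return "slower than human, below awkward threshold"
    return "awkward"


def classify_all(latencies: dict) -> dict:
    """Classify every model in a name -> latency mapping."""
    return {name: classify(ms) for name, ms in latencies.items()}


if __name__ == "__main__":
    for name, label in classify_all(LATENCY_MS).items():
        print(f"{name}: {LATENCY_MS[name]}ms -> {label}")
```

Under these assumptions, GPT-realtime-1.5 and Gemini-3.1-flash-live land in the middle band: slower than the human range but still under the 600ms threshold, while only Qwen 3.5 OMNI falls clearly into "awkward" territory.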
Source: Public information
ABAB AI Insight
Mira Murati's team has chosen to compete on "perceived humanity" rather than pure intelligence, deliberately conceding a few points on structured reasoning (e.g., IFEval 89.7 vs GPT-realtime-2.0's 95.2) in exchange for significant advantages in streaming response and conversational naturalness. The model excels in streaming metrics such as FD-bench V3 response quality (82.8) and Bigbench Audio (75.7).
The company's $2 billion seed round (at a $12 billion valuation) is essentially a bet on the "real human feel of voice interaction." It targets high-frequency scenarios that must "sound human," such as phone customer service, sales SDR, medical consultations, and IVR replacement, rather than merely chasing academic benchmarks.
Structural judgment: this is essentially a technological replacement. TML's 400ms latency is aimed at removing the "robotic feel" of traditional voice interactions. Humans are extremely sensitive to delays in conversation, and even differences measured in milliseconds can determine whether users are willing to engage long-term. This shifts capital and application scenarios from "smarter" to "more human-like" voice agents and begins the transformation of voice AI from a tool into a true conversational partner.
ABAB News · Cognitive Law
600ms is machine-like, 400ms is human-like.
The real voice competition has never been about who is smarter, but about who sounds more like a human.
Whoever first brings latency within the range humans find acceptable will capture the entry point to the next generation of phone, customer service, and companionship scenarios.