ElevenLabs cofounder Mati Staniszewski explains how voice AI models work, why conversational agents still haven't passed the Turing test, and how the company scaled to $450 million ARR.
● Voice models predict phonemes using transformer and diffusion architectures, then decode context and speaker characteristics to generate emotionally expressive speech without hard-coded parameters.
● Conversational voice agents remain unsolved: they require orchestrating speech-to-text, LLM reasoning, and text-to-speech whilst handling turn-taking, tool calls, and authentication gracefully.
● ElevenLabs added $100 million in net new ARR this quarter by offering self-serve access and pay-as-you-go pricing, and by deploying small technical teams inside enterprise customers' organisations.
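The orchestration problem described for conversational agents (speech-to-text, then LLM reasoning, then text-to-speech, with tool calls and authentication along the way) can be pictured with a minimal sketch. Everything in it is a hypothetical stand-in: the function names, the `Session` class, and the keyword-based "reasoning" are assumptions made for illustration, not ElevenLabs' API or architecture. Turn-taking, the hardest part in practice, is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable

# All names below are hypothetical stubs, not any vendor's real API.

@dataclass
class Session:
    authenticated: bool = False
    history: list[str] = field(default_factory=list)

def speech_to_text(audio: bytes) -> str:
    # Stub: treat the audio bytes as if they were already UTF-8 text.
    return audio.decode("utf-8")

def lookup_balance(session: Session) -> str:
    # Stub tool call; it should only run for authenticated callers.
    return "Your balance is 42 credits."

TOOLS: dict[str, Callable[[Session], str]] = {"balance": lookup_balance}

def llm_reply(transcript: str, session: Session) -> str:
    # Stub "reasoning": route on a keyword instead of calling a real model.
    for keyword, tool in TOOLS.items():
        if keyword in transcript.lower():
            if not session.authenticated:
                return "Please verify your identity first."
            return tool(session)
    return "Sorry, I can only help with balance questions."

def text_to_speech(text: str) -> bytes:
    # Stub: a real system would return synthesized audio here.
    return text.encode("utf-8")

def handle_turn(audio: bytes, session: Session) -> bytes:
    # One conversational turn: transcribe, decide, then speak.
    transcript = speech_to_text(audio)
    session.history.append(transcript)
    reply = llm_reply(transcript, session)
    session.history.append(reply)
    return text_to_speech(reply)
```

Even in this toy form, the authentication check has to sit between the reasoning step and the tool call, which hints at why handling all three stages gracefully in real time is still considered unsolved.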
"High agency people are the winners of the advances in AI and within organisations, low agency people will lose out." - John Collison