The world of voice AI, with Mati Staniszewski of ElevenLabs
ElevenLabs cofounder Mati Staniszewski explains how voice AI models work, why conversational agents still haven't passed the Turing test, and how the company scaled to $450 million ARR.
● Voice models predict phonemes using transformer and diffusion architectures, then decode context and speaker characteristics to generate emotionally expressive speech without hard-coded parameters.
● Conversational voice agents remain unsolved: they require orchestrating speech-to-text, LLM reasoning, and text-to-speech whilst handling turn-taking, tool calls, and authentication gracefully.
● ElevenLabs grew $100 million in net new ARR this quarter by offering self-serve access and pay-as-you-go pricing, and by deploying small technical teams across enterprise customer organisations.
"High agency people are the winners of the advances in AI and within organisations, low agency people will lose out." - John Collison