The world of voice AI, with Mati Staniszewski of ElevenLabs

Cheeky Pint

Listen on: Spotify | Apple Podcasts | Podlink (1 h)

Topics: AI | Product | Startups | Technology

ElevenLabs cofounder Mati Staniszewski explains how voice AI models work, why conversational agents still haven't passed the Turing test, and how the company scaled to $450 million ARR.

Voice models predict phonemes using transformer and diffusion architectures, then decode context and speaker characteristics to generate emotionally expressive speech without hard-coded parameters.

Conversational voice agents remain unsolved: they require orchestrating speech-to-text, LLM reasoning, and text-to-speech whilst handling turn-taking, tool calls, and authentication gracefully.
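
To make the orchestration problem concrete, here is a minimal sketch of a single agent turn in Python. Everything in it is a hypothetical stand-in and not an ElevenLabs API: the speech-to-text, LLM, and text-to-speech steps are stubs that pass text through, the silence threshold for turn-taking is an assumed value, and the one tool (check_balance) is invented for illustration. It only shows how the pieces named above have to be chained, and where authentication gates a tool call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Session:
    """Per-call state: whether the caller is verified, plus the dialogue so far."""
    authenticated: bool = False
    history: List[Tuple[str, str]] = field(default_factory=list)


def end_of_turn(trailing_silence_ms: int, threshold_ms: int = 700) -> bool:
    """Naive turn-taking rule (assumed threshold): the user is done after enough silence."""
    return trailing_silence_ms >= threshold_ms


def speech_to_text(audio: bytes) -> str:
    """Stub STT: treats the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def text_to_speech(text: str) -> bytes:
    """Stub TTS: treats the text as the audio."""
    return text.encode("utf-8")


def check_balance(session: Session) -> str:
    """Hypothetical tool that a real agent would back with an API call."""
    return "Your balance is 42 dollars."


TOOLS: Dict[str, Callable[[Session], str]] = {"check_balance": check_balance}


def llm_reason(transcript: str) -> Dict[str, str]:
    """Stub 'LLM': requests a tool for balance questions, otherwise echoes."""
    if "balance" in transcript.lower():
        return {"tool": "check_balance"}
    return {"say": f"You said: {transcript}"}


def handle_turn(audio: bytes, session: Session) -> bytes:
    """One turn: speech-to-text, reasoning, optional gated tool call, text-to-speech."""
    transcript = speech_to_text(audio)
    session.history.append(("user", transcript))

    decision = llm_reason(transcript)
    if "tool" in decision:
        # Tool calls that touch account data are refused until the caller is verified.
        if session.authenticated:
            reply = TOOLS[decision["tool"]](session)
        else:
            reply = "Please verify your identity first."
    else:
        reply = decision["say"]

    session.history.append(("agent", reply))
    return text_to_speech(reply)
```

In a real system each stub would be a streaming model call, and most of the difficulty described above sits in what this sketch leaves out: latency between stages, interruptions mid-reply, and deciding when the user has actually finished speaking.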

ElevenLabs added $100 million in net new ARR this quarter by offering self-serve access and pay-as-you-go pricing, and by deploying small technical teams across enterprise customer organisations.

"High agency people are the winners of the advances in AI and within organisations, low agency people will lose out." - John Collison