ElevenLabs: The Rise of a Voice AI Empire and the Two Polish Founders Reshaping Global Audio

ElevenLabs was founded in 2022 by the Polish entrepreneurs Mati Staniszewski and Piotr Dąbkowski. In its earliest public framing, it was not pitched as a generic “AI platform,” but as a voice-research company focused on long-form narration quality, cross-language dubbing, and content accessibility. From the start, the company described its long-term ambition as making spoken content accessible in any language and any voice.

The immediate spark came from a very concrete cultural frustration: poor Polish dubbing practices. In Sequoia’s 2025 interview, Mati recalled that Piotr was about to watch a movie with his girlfriend, who did not speak English, and the two of them were reminded of the low-quality single-narrator dubbing they had grown up with in Poland. This was not an abstract AI opportunity; it was a childhood pain point that they believed technology could finally fix. During their years at Google and Palantir, they had already been building weekend hack projects together, so ElevenLabs emerged from a long collaborative pattern rather than a one-off startup idea.

The company scaled at extraordinary speed. In January 2023, at public beta launch, ElevenLabs announced a $2 million pre-seed round led by Credo Ventures and Concept Ventures. By January 2024, it had raised an $80 million Series B and launched Dubbing Studio, Voice Library, and an early Reader app; the company said its technology was being used by employees at 41% of the Fortune 500. By January 2025, ElevenLabs raised a $180 million Series C at a $3.3 billion valuation and said its tools were being adopted by employees at over 60% of the Fortune 500. In February 2026, it raised a $500 million Series D at an $11 billion valuation; by May 2026, the company disclosed that it had ended 2025 at $350 million ARR and had already surpassed $500 million ARR in the first four months of 2026.

Its product evolution is equally important. It began with highly realistic text-to-speech, then expanded into voice cloning, dubbing, long-form editing workflows, and developer APIs, and later into speech-to-text, music, image/video tools, real-time conversational agents, and enterprise voice workflows. By 2026, the company’s public product architecture had been organized into three pillars: ElevenCreative for creation, ElevenAgents for enterprise and customer operations, and ElevenAPI for developers. Its research timeline shows a path from Eleven Multilingual v2, Turbo, and Flash to Scribe, Eleven v3, Eleven Music, Scribe v2 Realtime, Scribe v2, and Expressive Mode for Agents. In other words, ElevenLabs is no longer just a TTS startup; it is trying to own the broader AI-audio infrastructure layer.

If its history is reduced to one strategic sentence, it is this: ElevenLabs did not start as a “fun voice app” and only later figure out monetization. It worked from the beginning on model quality, creator workflows, developer interfaces, enterprise deployment, and safety governance at the same time. Sequoia’s interview with Mati makes the main strategic point explicit: while major foundation-model labs were broadening into multimodality, ElevenLabs stayed intensely focused on audio, and that deliberate narrowness helped it avoid becoming roadkill.

Founders
Public information about the founders’ family backgrounds is extremely limited. What can be stated with confidence is that both men grew up in Poland, were high-school friends, and later moved to the UK for higher education. Reliable English-language public sources do not disclose their parents’ professions, family wealth, class position, or detailed childhood environment, so those details cannot be confirmed for now. The highest-confidence facts come from two sources: Mati says in his official ElevenLabs bio that he grew up in Poland and moved to the UK to study mathematics at Imperial College London, and Endeavor describes both founders as high-school friends who grew up together in Poland. Sifted also wrote in 2024 that Mati was “born and raised in Warsaw,” but Piotr’s exact birthplace and broader family details remain insufficiently documented in major English-language sources.

Their educational paths are clearer than their family backgrounds. Mati’s route is straightforward: his official ElevenLabs author page says he studied mathematics at Imperial College London. Piotr’s path is reconstructed from two reliable strands. ElevenLabs’ official author page confirms that he studied for an MPhil at the University of Cambridge and published AI-based image-detection research at NeurIPS during that period. Endeavor also states that he went to Oxford. So it is reasonable to say that Piotr has formal academic ties to both Oxford and Cambridge, though the exact order of degrees, specific program names, and full academic chronology are still not completely spelled out in public-facing company materials.

Before entering voice AI, their professional roles were highly complementary. Mati’s official biography emphasizes Palantir, where he helped enterprises and governments deploy new technology. In an earlier founder interview, he also described a pre-Palantir path through Opera Software and BlackRock, suggesting that his early career was not purely academic but strongly oriented around applied analytics, products, deployment, and customer problem-solving. Piotr’s public pre-ElevenLabs identity is more technical: official materials consistently describe him as an ex-Google machine-learning engineer, which makes Google his most representative role before ElevenLabs.

This complementarity became the operating logic of ElevenLabs itself. Piotr is the engine for research and model breakthroughs, focused on context understanding, emotional control, low latency, and multilingual robustness. Mati is the engine for deployment and commercialization, focused on pushing those models into real workflows for developers, creators, enterprises, and government users. The official author pages state this split directly: Piotr leads research and engineering, while Mati leads teams building AI that can communicate at a human level. In the Sequoia interview, Mati explicitly credits Piotr’s research leadership and his ability to assemble a world-class audio team as one of the reasons ElevenLabs has been able to compete with much larger foundation-model companies.

Why did they take this path? At least three forces line up. First, a cultural and language experience: poor dubbing was a recurring childhood frustration. Second, a technological opening: in the Sequoia interview, Mati argues that transformer and diffusion advances had not yet been efficiently applied to audio, and that audio had received much less research attention than text and image generation. Third, their work experience shaped them in complementary ways: Google gave Piotr model and ML depth, while Palantir gave Mati strong instincts for translating customer problems into deployable products. Their long-running weekend hack projects made the move into entrepreneurship feel like a convergence rather than a leap.

In terms of public identity, Mati has become the more externally visible operator-founder. Sifted described him in 2024 as a 29-year-old cofounder who had, in less than two years, built ElevenLabs into a global AI-audio sensation. In 2025, he was also appointed to Klarna’s board. Piotr has remained more closely associated with the technical-founder archetype; his inclusion in TIME100 AI in 2024 centered on the technical power of ElevenLabs’ lower-latency, higher-quality voice generation and dubbing systems.

Capital, Business Model, and Turning Points
ElevenLabs has built an unusually strong capital network, and not in a single straight line. Its funding progression moves from early European venture firms to top U.S. generative-AI backers, then to strategic corporate investors, and finally to major global financial institutions and celebrity investors. The pre-seed came from Credo and Concept. Series B involved a16z, Nat Friedman, Daniel Gross, Sequoia, Smash, SV Angel, BroadLight, and Credo. Series C brought in ICONIQ, NEA, WiL, Valor, Endeavor Catalyst, and Lunate, while also tying in strategic backers such as Deutsche Telekom, LG Technology Ventures, HubSpot Ventures, NTT DOCOMO Ventures, and RingCentral Ventures. Series D was led by Sequoia, with a16z and ICONIQ increasing their stakes and Lightspeed, Evantic, and BOND joining. By May 2026, the company had also added BlackRock, Wellington, D.E. Shaw, Schroders, NVIDIA via NVentures, Santander, Jamie Foxx, and Eva Longoria. That investor stack shows that ElevenLabs is no longer just a VC-backed startup. It is now seen as a strategic infrastructure company across finance, enterprise software, telecom, and creative industries.

Its business model is multi-layered, not single-stream. First, there is self-serve subscription and usage-based pricing. The official pricing pages show TTS and ASR sold by usage, with text-to-speech charged per 1,000 characters and Scribe charged by the hour. Second, the Agents platform combines tiered subscriptions with per-minute calling economics, concurrency limits, knowledge bases, workflow tools, and telephony integrations. Third, there is enterprise-contract revenue from large deployments in customer support, sales, marketing, training, and operational workflows. Fourth, there is platform and ecosystem revenue-sharing through Voice Library and Voice Actor Payouts, where creators can place their Professional Voice Clones into the marketplace and receive payouts through Stripe when other users generate speech with those voices.

The more sophisticated part of the model is how it combines influence assets and directly monetizable assets. ElevenCreative, ElevenAgents, and ElevenAPI are straightforward revenue products. But Voice Library and Iconic Marketplace are both marketplace assets and brand/reputation assets. The former lets ordinary voice owners earn passive income from licensed usage. The latter connects creators with rights holders to license well-known and legacy voices. Official documentation states that Voice Library is a marketplace for Professional Voice Clones and that users can earn rewards when others use their voice models. Iconic Marketplace explicitly describes itself as a licensing bridge between creators and rights holders for iconic IP. The Matthew McConaughey announcement and the Michael Caine/AP coverage show what that means in practice: ElevenLabs is trying to control scarce, rights-cleared voice inventory, not merely offer generic cloning tools.

One of the company’s biggest commercial turning points was its evolution from a creator tool into enterprise communication infrastructure. In the 2024 Series B announcement, the focus was still very much on dubbing workflows, Voice Library, Reader, and creator/publisher use cases. By 2025 and 2026, the narrative had shifted clearly toward ElevenAgents, developer stacks, customer support, conversational commerce, and government services. In his TIME interview, Mati said the customer mix had moved from roughly 90/10 individual-to-enterprise in early 2024 to something closer to 60/40 or 50/50 by late 2025, and he said conversational AI was the faster-moving category. That means ElevenLabs is no longer simply trying to be “the best voiceover tool.” It is trying to capture the budget attached to enterprise communication itself.

Another critical decision was treating safety as part of business durability rather than as a PR afterthought. The official Safety page frames the company’s approach through Safety by Design, Traceability & Accountability, Transparency, Agility, and Collaboration. It says generated content can be traced back to the account that created it, and that serious violators can be banned and referred to law enforcement. The company also offers an AI Speech Classifier, but its own classifier page explicitly says it does not reliably classify audio generated with Eleven v3. That is an important signal: the company has invested heavily in safety tooling, but it also publicly acknowledges that stronger models can outpace perfect detection. ElevenLabs is also part of the U.S. AI Safety Institute Consortium and entered a three-year partnership with the U.K. AI Safety Institute in 2026. In effect, it is turning safety partnerships themselves into part of its institutional moat.

The company’s strongest result is not simply that one product is better than competitors’. It is that ElevenLabs has turned audio AI into a platform spanning creation, development, enterprise operations, and public-sector interfaces. In 2024 it said employees at 41% of Fortune 500 companies were already using its technology; by 2025 that became over 60%. Its customer footprint spans publishing and media, gaming, telecom, fintech, legal, and government. When ARR crossed $500 million in early 2026, that was not just evidence of product popularity. It signaled that voice was becoming part of core operational workflows inside major institutions.

Controversies, Current Position, and Limitations
ElevenLabs has carried controversy almost from the beginning, and the core question behind most of it is simple: when voices become easily replicable, how much responsibility does the platform bear? In January 2023, The Verge reported that 4chan users were already using ElevenLabs’ free voice-cloning capabilities to produce celebrity and public-figure imitations, including hate speech and abusive content. In early 2024, the New Hampshire Biden robocall scandal created a much larger public flashpoint. AP covered the case as a major election-related investigation, while Wired reported that researchers believed the fake Biden audio was likely made using ElevenLabs tools. In other words, one of the earliest real-world demonstrations of ElevenLabs’ technical quality also became one of the earliest proofs of its public-risk profile.

The company has since tightened restrictions. Its help center now states that Professional Voice Cloning can only be used to create a clone of your own voice, and that even with someone else’s consent, you cannot create a self-serve Professional Voice Clone of another person because the system requires voice verification. At the same time, its broader marketing pages still say users should only clone voices for which they have explicit permission. This implies a two-track system: strict user-side restrictions for ordinary customers, and carefully licensed, rights-cleared arrangements for custom partnerships and celebrity/legacy voices. AP’s 2025 reporting also noted that ElevenLabs had strengthened safeguards after earlier misuse controversies and was blocking unauthorized cloning of celebrity-style voices.

Legally, one of the most important public cases was Vacker v. ElevenLabs in 2024. The complaint shows that two voice actors, two authors, and a publisher sued ElevenLabs, alleging misappropriation of voice/publicity rights and DMCA-related violations; the complaint explicitly linked the platform’s default voices “Bella” and “Adam” to the plaintiffs’ voices. These are allegations in a complaint, not judicial findings of fact. According to AI Lawsuit Tracker’s later docket summary, the case was marked settled as of May 2026, but public materials do not disclose the settlement amount or terms. The importance of this case lies less in a public courtroom victory or defeat and more in the way it pushed ElevenLabs into the harder legal terrain around training data, voice identity, copyright-management information, and the legitimacy of platform default voices.

Even if that 2024 case settled, the controversy did not end. In May 2026, Sifted reported that a group of journalists and voice professionals sued ElevenLabs in Illinois, alleging that the company built its voice models using recordings of their voices without consent. The case remains at the allegation stage, with no final resolution yet. But the broader pattern is important: the controversy around ElevenLabs has shifted from “will users misuse the tool?” to “how was the model trained in the first place?” If the 2023–2024 period centered on output-side abuse, the 2024–2026 phase has increasingly centered on input-side consent and compliance.

As of 2026, ElevenLabs is no longer merely a promising European AI startup. It is firmly in the global top tier of AI-audio companies. The company announced an $11 billion valuation in February 2026 and disclosed more than $500 million ARR in May 2026. Its public platform now spans 70+ languages, 10,000+ voices, enterprise agents, creator workflows, APIs, government offerings, and the Impact Program. Geographically, it operates with what is effectively a transatlantic core. Officially, ElevenLabs said in 2024 that London had become its European HQ and center for worldwide operations, while remaining remote-first and spread across more than 15 countries. Reuters often describes it as London-based, while AP described it in 2025 as New York-based. The best interpretation is not that one source is simply “wrong,” but that ElevenLabs has evolved into a cross-border company with major London and New York centers rather than a single-city identity.

Why will ElevenLabs and its founders be remembered? Probably for four reasons. First, they pushed voice AI from mechanical TTS toward context-sensitive, emotionally expressive, multilingual, and increasingly real-time interaction. Second, they turned voice from a creator-side feature into enterprise and public-sector infrastructure. Third, they showed that a Europe-rooted team could build a globally important platform in generative AI rather than just a niche tool. And fourth, they forced the market to grapple with voice rights as a structural issue, not as a side effect of better software. At the same time, the company’s long-term risk is now very clear: not whether it can build better products, but whether it can handle training-data compliance, identity verification, celebrity licensing, political deepfakes, and public trust at the scale of infrastructure. Public English-language materials remain inadequate on the founders’ parents, family class, and precise birth details, as well as on the full cap table and settlement terms. On those points, public information is limited, accounts differ, or the facts cannot be fully confirmed at this time.