Flash News

ElevenLabs Launches Dubbing V2, an End-to-End AI Dubbing Model

ElevenLabs has officially released Dubbing V2, an end-to-end AI dubbing model.

Its biggest breakthrough is abandoning the traditional three-step "transcription-translation-synthesis" process. Instead, the model works directly from the original performance, preserving its tone, emotion, and interpretation and transferring them to other languages.

In terms of market dynamics, content creators and film production companies are accelerating their adoption of ElevenLabs Dubbing V2 for multilingual dubbing, and event-driven funding is shifting from traditional dubbing studios to fully automated AI tools. ElevenLabs and other AI media technology companies stand to benefit, while traditional dubbing service providers that rely on manual post-production and segmented processing face pressure.

Source: Public Information

ABAB AI Insight

ElevenLabs previously expanded rapidly in podcasts and short-form video on the strength of its voice cloning technology, which already supports multilingual voice cloning and serves numerous YouTube creators. The earlier Dubbing V1 still relied on segmented processing; V2 moves to end-to-end modeling, marking a technological leap from voice replication to performance-level emotional transfer.

In terms of capital pathways, ElevenLabs is shifting its core R&D resources from single-language cloning to cross-language performance modeling. With automated voice models and emotional-fidelity technology, work that previously required budgets for multiple voice actors and post-production teams is reduced to a single AI generation pass. Support for audio, video, and text inputs further broadens the range of application scenarios.

This mirrors AI video tools such as Runway and HeyGen, which are iterating from single functions toward full-process automation, as well as the penetration of AI dubbing products into Hollywood and cross-border content in 2024-2025. AI media creation is currently in an expansion phase, moving from text generation to multimodal performance fidelity.

Essentially, this is a technological replacement: end-to-end performance modeling substitutes single-model cross-language generation for traditional multi-step manual dubbing. The mechanism is to extract tone, emotion, and rhythm features directly from the original performance, which avoids the errors introduced by intermediate transcription. This in turn allows capital to shift from labor-intensive localization production to AI-driven global content distribution.

ABAB News · Cognitive Law

Truly powerful translation is not just about changing the words; it lets the same person keep performing in a different language. Once AI can fully carry over emotion and breathing rhythm, language boundaries essentially disappear. The ultimate form of content globalization is not multilingual dubbing, but the same soul speaking to the world.

Source: ABAB News · 05/29/2026, 04:34 AM
