Alibaba Introduces Open-Source Model For Digital Human Video Generation
(MENAFN- Mid-East Info) Speech-to-Video model, Wan2.2-S2V, brings portraits to life
Alibaba has unveiled Wan2.2-S2V (Speech-to-Video), its latest open-source model for digital human video creation. The tool converts portrait photos into film-quality avatars capable of speaking, singing, and performing. Part of Alibaba's Wan2.2 video generation series, the model generates high-quality animated videos from a single image and an audio clip.

Wan2.2-S2V offers versatile character animation, supporting multiple framing options including portrait, bust, and full-body perspectives. It can generate character actions and environmental elements dynamically from prompt instructions, allowing professional content creators to produce precise visual representations tailored to specific storytelling and design requirements.

Powered by audio-driven animation technology, the model delivers lifelike character performances, ranging from natural dialogue to musical numbers, and handles multiple characters within a scene. Creators can transform voice recordings into lifelike animated movements across a diverse range of avatars, from cartoon characters and animals to stylized figures.

To meet the needs of professional content creators, the technology offers output resolutions of 480P and 720P, ensuring visual quality suitable for both social media content and professional presentations.

Innovative Technologies:

Wan2.2-S2V goes beyond traditional talking-head animation by combining text-guided global motion control with audio-driven fine-grained local movements, enabling natural and expressive character performances in complex and challenging scenarios. Another key breakthrough lies in the model's frame processing technique.
By compressing historical frames of arbitrary length into a single, compact latent representation, the technology significantly reduces computational overhead. This enables remarkably stable long-video generation, addressing a critical challenge in extended animated content production.

These capabilities are further supported by a comprehensive training methodology. Alibaba's research team constructed a large-scale audio-visual dataset tailored to film and television production scenarios. Using a multi-resolution training approach, Wan2.2-S2V supports flexible video generation across diverse formats, from vertical short-form content to traditional horizontal film and television productions.

The Wan2.2-S2V model is available for download on Hugging Face and GitHub, as well as on ModelScope, Alibaba Cloud's open-source community. A major contributor to the global open-source community, Alibaba open-sourced the Wan2.1 models in February 2025 and the Wan2.2 models in July. To date, the Wan series has generated over 6.9 million downloads on Hugging Face and ModelScope.

About Alibaba Group:

Alibaba Group is a global technology company focused on e-commerce and cloud computing. We enable merchants, brands and retailers to market, sell and engage with consumers by providing digital and logistics infrastructure, efficiency tools and vast marketing reach. We empower enterprises with our leading cloud infrastructure, services and work collaboration capabilities to facilitate their digital transformation and grow their businesses.
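The history-compression idea described under Innovative Technologies can be illustrated in miniature. Alibaba has not published the exact mechanism in this announcement, so the sketch below is purely illustrative: it shows how a frame history of any length can be summarized into a fixed-size latent vector, so that the cost of conditioning on history does not grow with video length. The function name `compress_history` and the random linear projection (standing in for a trained encoder) are assumptions, not Wan2.2-S2V's actual implementation.

```python
import numpy as np

def compress_history(frames: np.ndarray, latent_dim: int = 64) -> np.ndarray:
    """Collapse a frame history of shape (T, H, W, C) into a single
    fixed-size latent vector whose size is independent of T."""
    # Pool over time first, so the summary's size does not depend on
    # how many historical frames were accumulated.
    temporal_mean = frames.mean(axis=0)      # (H, W, C)
    flat = temporal_mean.reshape(-1)         # (H * W * C,)
    # Project to a compact latent. A trained encoder would replace this
    # fixed random linear map; it is a stand-in for illustration only.
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((latent_dim, flat.size)) / np.sqrt(flat.size)
    return proj @ flat                       # (latent_dim,)

# Histories of very different lengths yield latents of identical size.
short_history = np.random.rand(8, 16, 16, 3)
long_history = np.random.rand(200, 16, 16, 3)
print(compress_history(short_history).shape)  # (64,)
print(compress_history(long_history).shape)   # (64,)
```

Because the latent has constant size, a generator conditioned on it does the same amount of work per new frame whether ten or ten thousand frames came before, which is the property the press release credits for stable long-video generation.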

Legal Disclaimer:
MENAFN provides the information "as is" without warranty of any kind. We do not accept any responsibility or liability for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this article. If you have any complaints or copyright issues related to this article, kindly contact the provider above.