Alibaba Cloud Media Alert
(MENAFN - Mid-East Info) Alibaba has launched Qwen3-Next, a new model architecture optimized for long-context understanding, large parameter scale, and high computational efficiency. Through a suite of architectural innovations, including a hybrid attention mechanism and a highly sparse Mixture-of-Experts (MoE) design, Qwen3-Next delivers strong performance while minimizing computational cost.
The inaugural model built on this architecture, Qwen3-Next-80B-A3B-Base, is an 80-billion-parameter model that activates only 3 billion parameters during inference. Both the Instruct (non-thinking) and Thinking variants are now open sourced and available on Hugging Face, Kaggle and Alibaba Cloud's ModelScope community. Notably, Qwen3-Next-80B-A3B-Base surpasses the dense Qwen3-32B model while using less than 10% of its training cost (measured in GPU hours). During inference, it delivers more than 10x higher throughput than Qwen3-32B on context lengths exceeding 32K tokens, achieving high efficiency in both training and inference.

The Qwen3-Next-80B-A3B-Instruct model matches the performance of Alibaba's flagship Qwen3-235B-A22B-Instruct-2507 while excelling in ultra-long-context scenarios: it natively supports a context window of 256K tokens, extendable up to 1 million tokens. Qwen3-Next-80B-A3B-Thinking excels at complex reasoning tasks, outperforming a leading closed-source thinking model across multiple benchmarks and approaching the performance of the flagship thinking model Qwen3-235B-A22B-Thinking-2507.

This combination of strong performance and efficiency comes from several architectural innovations: hybrid attention, which replaces standard attention with a combination of Gated DeltaNet and Gated Attention, enhancing in-context learning while improving computational efficiency; an ultra-sparse MoE design, which activates only 3.7% of parameters (3B out of 80B) per inference step, greatly reducing compute cost without sacrificing model quality; and Multi-Token Prediction (MTP), which boosts both model performance and inference efficiency. The team also developed training-stability optimizations that make large-scale training runs smoother. A schematic sketch of this style of sparse expert routing is shown below.

As longer context windows and larger total parameter counts emerge as major trends in large model development, Qwen3-Next marks a significant advance in model architecture, combining linear attention with attention gating and increased sparsity in its MoE design. Trained on a 15-trillion-token subset of Qwen3's 36-trillion-token pre-training corpus, Qwen3-Next is optimized for efficient deployment and operation on consumer-grade hardware.
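To make the sparsity figure concrete, the following is a minimal sketch of top-k MoE routing, in which each token activates only a small fraction of the experts. It illustrates the general technique rather than Qwen3-Next's actual implementation; the layer sizes, expert count and top-k value are illustrative assumptions.

```python
# Minimal sketch of sparse MoE routing (illustrative only; not Qwen3-Next's
# actual code). Sizes, expert count and top-k are assumed for demonstration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=2048, num_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)      # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                   # x: (tokens, d_model)
        scores = self.router(x)                             # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)      # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():        # run each selected expert once
                mask = idx[:, slot] == e                    # tokens routed to expert e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

# Each token touches top_k of num_experts experts (4 of 64 here), so only a
# small fraction of the layer's parameters is active per step, which is the
# same principle behind Qwen3-Next activating roughly 3B of its 80B parameters.
layer = SparseMoELayer()
print(layer(torch.randn(8, 1024)).shape)  # torch.Size([8, 1024])
```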
Qwen3-ASR-Flash: Competitive AI Speech Transcription Tool

Earlier this week, Alibaba launched Qwen3-ASR-Flash, a cutting-edge automatic speech recognition (ASR) model powered by the multimodal intelligence of Qwen3-Omni and trained on tens of millions of hours of high-quality, multilingual speech data. The model is now accessible to developers through APIs on Alibaba Cloud's generative AI platform Model Studio, and it can also be experienced on Hugging Face and Alibaba's ModelScope community.

Qwen3-ASR-Flash delivers remarkable accuracy and robustness across 11 major languages, including English, Chinese, French, German, Italian, Spanish, Portuguese, Japanese, Korean, Arabic, and Russian. It also supports multiple Chinese dialects, including Sichuanese, Minnan (Hokkien), Wu, and Cantonese, as well as a wide range of English regional accents, giving it broad regional adaptability. The model surpasses leading ASR models on major industry benchmarks, making it a competitive AI speech transcription tool. Notably, it can accurately transcribe song lyrics even in the presence of strong background music, a challenging task for most speech models.

In noisy or complex acoustic environments, Qwen3-ASR-Flash excels at isolating human speech while intelligently filtering out non-speech elements such as silence and background noise. To enable context-aware transcription, users can provide custom prompts in various formats, such as keyword lists, paragraphs, full documents, or even unstructured or nonsensical text, allowing the model to tailor its output to specific domains or use cases; a hedged example follows below. Thanks to its multilingual precision and resilience in challenging acoustic conditions, Qwen3-ASR-Flash is well suited to applications ranging from transcribing online lectures and live broadcasts to analyzing complex audio archives for research, media, or enterprise use.
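As an illustration of how such a context prompt might be supplied, the snippet below sketches a transcription request through Alibaba Cloud's DashScope SDK for Model Studio. The model identifier, message layout and context-prompt mechanism shown here are assumptions for illustration; the Model Studio documentation is the authoritative reference.

```python
# Hypothetical sketch of a context-aware transcription request via the
# DashScope SDK (Model Studio). The model name, the message layout and the way
# the context prompt is passed are assumptions, not the documented API.
import os
import dashscope

dashscope.api_key = os.environ["DASHSCOPE_API_KEY"]

# Domain vocabulary the transcript should be biased toward (keyword-list prompt).
context_prompt = "Qwen3-Next, Mixture-of-Experts, Gated DeltaNet, ModelScope"

response = dashscope.MultiModalConversation.call(
    model="qwen3-asr-flash",                       # assumed model identifier
    messages=[
        {"role": "system", "content": [{"text": context_prompt}]},
        {"role": "user", "content": [{"audio": "https://example.com/lecture.wav"}]},
    ],
)
print(response)  # the transcription is returned in the response payload
```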
Preview of Qwen3-Max: Alibaba's Largest Non-Thinking Model

Last week, Alibaba also previewed Qwen3-Max, the largest "non-thinking" model in the Qwen3 series, with over 1 trillion parameters. Ranked No. 6 on Text Arena, a widely recognized leaderboard for LLM versatility, linguistic precision, and cultural context in text, Qwen3-Max-Preview follows complex instructions in both Chinese and English with greater reliability. Compared with the earlier Qwen2.5 series, Qwen3-Max-Preview significantly reduces hallucinations and generates higher-quality responses for open-ended Q&A, writing, and conversation. It also delivers high accuracy in mathematics, coding, logic, and scientific reasoning. The model supports over 100 languages, with enhanced capabilities in translation and commonsense reasoning, and is optimized for advanced workflows including Retrieval-Augmented Generation (RAG) and tool calling, making it well suited to a range of AI workloads; a hedged tool-calling sketch follows below.
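The sketch below shows what tool calling against such a model could look like through Model Studio's OpenAI-compatible endpoint. The base URL, model name and tool definition are illustrative assumptions rather than confirmed values.

```python
# Hypothetical tool-calling sketch against an OpenAI-compatible Model Studio
# endpoint. The base_url, model name and tool schema are illustrative
# assumptions; check the Model Studio documentation for current values.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                     # hypothetical tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="qwen3-max-preview",                     # assumed model identifier
    messages=[{"role": "user", "content": "What's the weather in Dubai?"}],
    tools=tools,
)
# If the model decides to call the tool, the structured call appears here.
print(response.choices[0].message.tool_calls)
```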
About Alibaba Cloud

Established in 2009, Alibaba Cloud is the digital technology and intelligence backbone of Alibaba Group. It offers a complete suite of cloud services to customers worldwide, including elastic computing, database, storage, network virtualization, large-scale computing, security, big data analytics, machine learning and artificial intelligence (AI) services. Alibaba Cloud has been named the leading IaaS provider in Asia Pacific by revenue in U.S. dollars since 2018, according to Gartner. It has also maintained its position as one of the world's leading public cloud IaaS service providers since 2018, according to IDC.
