May 2, 2025 (MENAFN - Asia Times) Alibaba Group's newly released large language model Qwen3 has shown stronger mathematical-proving and code-writing abilities than its predecessors and some American peers, putting it at the top of benchmark charts.

Qwen3 offers two mixture-of-experts (MoE) models (Qwen3-235B-A22B and Qwen3-30B-A3B) and six dense models.

An MoE model, an architecture reportedly also used by OpenAI's ChatGPT and Anthropic's Claude, routes each input to a small number of specialized “expert” sub-networks instead of running the whole network. A dense model, by contrast, activates all of its parameters for every input. Both kinds can perform a wide range of tasks, such as image classification and natural language processing, by learning complex patterns in data.
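The routing idea can be illustrated with a minimal sketch. It assumes a simple top-k softmax router, a common MoE design; it is not Alibaba's actual implementation, and the toy experts, scores and function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_scores, k=2):
    """Run only the k selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
scores = [0.1, 2.0, 0.3, 1.5]  # made-up router scores for one input

selected = route_top_k(scores, k=2)
output = moe_layer(1.0, experts, scores, k=2)
print(selected, output)
```

Only two of the four toy experts run for this input. A dense model would correspond to running all of them every time.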

Alibaba, a Hangzhou-based company, used 36 trillion tokens to train Qwen3, double the number used to train its Qwen2.5 model. DeepSeek, another Hangzhou-based firm, used 14.8 trillion tokens to pre-train V3, the base model behind its R1 model. In general, the more tokens used in training, the more knowledgeable an AI model tends to be.

At the same time, Qwen3 has a lower deployment threshold than DeepSeek V3, meaning users can deploy it at lower operating costs and with reduced energy consumption.

Qwen3-235B-A22B features 235 billion parameters but activates only 22 billion of them for each token. DeepSeek R1 features 671 billion parameters and activates 37 billion. Fewer activated parameters generally mean lower operating costs.
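The arithmetic behind that comparison can be checked with a short calculation. It uses only the parameter counts quoted above; the function name is hypothetical, and real operating costs also depend on factors such as hardware and memory use, which this sketch ignores.

```python
def active_fraction(total_billion, active_billion):
    """Share of a model's parameters that are activated per token."""
    return active_billion / total_billion

# (total parameters, activated parameters), in billions, as quoted in the article
models = {
    "Qwen3-235B-A22B": (235, 22),
    "DeepSeek R1": (671, 37),
}

for name, (total, active) in models.items():
    share = active_fraction(total, active)
    print(f"{name}: {active}B of {total}B active ({share:.1%})")

# How many more parameters DeepSeek R1 activates per token than Qwen3-235B-A22B
active_ratio = models["DeepSeek R1"][1] / models["Qwen3-235B-A22B"][1]
print(f"Activated-parameter ratio: {active_ratio:.2f}x")
```

On these figures, Qwen3-235B-A22B activates about 9.4% of its parameters per token and DeepSeek R1 about 5.5%, but in absolute terms DeepSeek R1 activates roughly 1.7 times as many.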

The US stock market slumped after DeepSeek launched its R1 model on January 20. AI stock investors were shocked by DeepSeek R1's high performance and low training costs.

Media reports said DeepSeek will unveil its R2 model in May. Some AI fans expect DeepSeek R2 to have greater reasoning ability than R1 and to catch up with OpenAI's o4-mini.