Tuesday, 02 January 2024 12:17 GMT

Alibaba's AI Model Qwen3: A Smart Kid Prone To Hallucinations


(MENAFN- Asia Times) Alibaba Group's newly-released large language model Qwen3 has shown higher mathematical-proving and code-writing abilities than its previous models and some American peers, putting it at the top of benchmark charts.

Qwen3 offers two mixture-of-experts (MoE) models (Qwen3-235B-A22B and Qwen3-32B-A3B) and six dense models.

A MoE, also used by OpenAI's ChatGPT and Anthropic's Claude, can assign a specialized“expert” model to answer questions on a specific topic. A dense model can perform a wide range of tasks, such as image classification and natural language processing, by learning complex patterns in data.

Alibaba, a Hangzhou-based company, used 36 trillion tokens to train Qwen3, doubling the number used for training the Qwen2.5 model. DeepSeek, another Hangzhou-based firm, used 14.8 trillion tokens to train its R1 model. The higher the number of tokens used, the more knowledgeable an AI model is.

At the same time, Qwen3 has a lower deployment threshold than DeepSeek V3, meaning users can deploy it at lower operating costs and with reduced energy consumption.

Qwen3-235B-A22B features 235 billion parameters but requires activating only 22 billion. DeepSeek R1 features 671 billion parameters and requires activating 37 billion. Fewer parameters mean lower operation costs.

The US stock market slumped after DeepSeek launched its R1 model on January 20. AI stock investors were shocked by DeepSeek R1's high performance and low training costs.

Media reports said DeepSeek will unveil its R2 model in May. Some AI fans expected DeepSeek R2 to have greater reasoning ability than R1 and the ability to catch up with OpenAI o4-mini.

MENAFN02052025000159011032ID1109502135



Asia Times

Legal Disclaimer:
MENAFN provides the information “as is” without warranty of any kind. We do not accept any responsibility or liability for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this article. If you have any complaints or copyright issues related to this article, kindly contact the provider above.

Search