DeepSeek Reveals Technique to Enhance Large Language Models' Reasoning

2025-04-07 05:00:35

(MENAFN) Chinese AI start-up DeepSeek has unveiled a novel technique for improving the reasoning capabilities of large language models (LLMs) that reportedly outperforms current methods.

DeepSeek, in collaboration with researchers from Tsinghua University, developed an approach that combines generative reward modeling (GRM) with self-principled critique tuning, a Chinese news agency reported on Sunday.

According to a paper published on Friday, the combination is designed to help LLMs produce faster and more accurate responses to general queries.

Reward modeling is a technique used to align an LLM’s behavior with human preferences. According to the researchers, the resulting DeepSeek-GRM models outperformed existing methods and achieved “competitive performance” against robust public reward models.

DeepSeek plans to release its GRM models as open source but has not disclosed a release date.

The paper, published on the online scientific repository arXiv, has drawn further attention to the company's next moves, following the global spotlight on its V3 foundation model and R1 reasoning model.

