Chinese AI start-up DeepSeek has introduced a new way to improve the reasoning capabilities of large language models (LLMs), aiming to deliver better and faster answers to general queries than its competitors.
DeepSeek sparked a frenzy in January when it came onto the scene with R1, an artificial intelligence (AI) model and chatbot that the company claimed was cheaper and performed just as well as OpenAI’s rival ChatGPT model.
Collaborating with researchers from China’s Tsinghua University, DeepSeek said in its latest paper released on Friday that it had developed a technique for self-improving AI models.
The underlying technology is called self-principled critique tuning (SPCT), which trains AI to develop its own rules for judging content and then uses those rules to provide detailed critiques.
It achieves better results not by training larger models but by running several evaluations in parallel at inference time.
SPCT builds on generative reward modeling (GRM), a machine-learning approach in which a model checks and rates what other AI models produce, making sure their outputs match what humans ask for.
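The idea of scaling evaluation rather than model size can be sketched in code. This is a hypothetical illustration, not DeepSeek's implementation: the functions `generate_principles` and `critique_once` are stand-ins for model calls, and the scores are simulated.

```python
# Hypothetical sketch of SPCT-style evaluation with inference-time scaling:
# instead of relying on one larger judge model, sample several independent
# critiques of the same response and aggregate their scores by voting.
import random
from collections import Counter

def generate_principles(query: str) -> list[str]:
    # Stand-in: in SPCT, the model writes its own judging principles.
    return ["answers the question directly", "is factually grounded", "is concise"]

def critique_once(response: str, principles: list[str], seed: int) -> int:
    # Stand-in judge: a real GRM would read the text and write a critique;
    # here we just simulate a 1-10 score.
    rng = random.Random(seed)
    return rng.randint(6, 9)

def score_response(query: str, response: str, n_samples: int = 8) -> int:
    principles = generate_principles(query)
    # "Several evaluations simultaneously": sample many critiques
    # (sequentially here for simplicity) and take the majority score.
    scores = [critique_once(response, principles, seed=i) for i in range(n_samples)]
    return Counter(scores).most_common(1)[0][0]

print(score_response("What is SPCT?", "A self-critique training method."))
```

The key design point is that quality comes from aggregating many cheap evaluations rather than from a single, more expensive model.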
How does it work?
Usually, improving AI requires making models bigger during training, which takes a lot of human effort and computing power. Instead, DeepSeek has created a system with a built-in “judge” that evaluates the AI’s answers in real-time.
When you ask a question, this judge compares the AI’s planned response against both the AI’s core rules and what a good answer should look like.
If there’s a close match, the AI gets positive feedback, which helps it improve.
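The judge-and-feedback loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the similarity measure, the "be concise" rule, and the threshold are all invented for the example.

```python
# Minimal sketch of a built-in "judge": compare a candidate answer against
# the model's rules (principles) and a reference of what a good answer
# should look like, and give positive feedback on a close match.

def judge(candidate: str, principles: list[str], reference: str) -> float:
    # Crude stand-in reward: keyword overlap with the reference answer.
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    overlap = len(cand & ref) / max(len(ref), 1)
    # Penalize violating a principle, e.g. the invented rule "be concise".
    if "be concise" in principles and len(candidate.split()) > 20:
        overlap *= 0.5
    return overlap

def feedback_step(candidate: str, principles: list[str], reference: str,
                  threshold: float = 0.5) -> tuple[str, float]:
    # Close match -> positive feedback, which would steer further training.
    reward = judge(candidate, principles, reference)
    return ("positive", reward) if reward >= threshold else ("negative", reward)

label, r = feedback_step(
    "Paris is the capital of France",
    ["be concise"],
    "The capital of France is Paris",
)
print(label, round(r, 2))  # prints "positive 1.0"
```

A real system would replace the keyword overlap with a model-generated critique, but the control flow is the same: judge, compare to a threshold, feed the signal back.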
DeepSeek calls this self-improving system “DeepSeek-GRM”. The researchers said this would help models perform better than competitors like Google’s Gemini, Meta’s Llama, and OpenAI’s GPT-4o.
DeepSeek plans to make these advanced AI models available as open-source software, but no timeline has been given.
The paper’s release comes as rumours swirl that DeepSeek is set to unveil its latest R2 chatbot. But the company has not commented publicly on any such new release.