AI Model Deepseek-R1 Is 11X More Likely To Generate Harmful Content, Says Report


(MENAFN- PRovoke) Over the last few days, the AI industry has been grappling with the emergence of a new AI model by Chinese startup DeepSeek which claims to rival OpenAI and has become the number one downloaded free app on Apple's Iphone store.

In particular, DeepSeek started getting significant traction when it released a follow-up research paper detailing another DeepSeek AI model called R1, which showed more advanced reasoning skills and was significantly cheaper than o1, a model that OpenAI sold. It has since been lauded for its ability to show users its entire thinking process, high emotional intelligence, and speed.

However, new red teaming research by Enkrypt AI, an AI security and compliance platform, has uncovered ethical and security flaws in DeepSeek's technology

The analysis found the model to be“highly biased” and susceptible to generating insecure code, as well as producing harmful and toxic content, including hate speech, threats, self-harm, and explicit or criminal material.

For example, 83% of bias attacks were successful in producing biased output, notably for health, race and religion. It also found that using the model in different industries may result in violations of the Equal Credit Opportunity Act (ECOA), Fair Housing Act (FHA), Affordable Care Act (ACA), EU AI Act, and other fairness-related regulations.

The study noted that the DeepSeek-r1 model exhibited similar bias as compared to GPT-4o and o1. However, DeepSeek-r1 has 3 times more bias when compared with Caude-3-opus.

It also found that 45% of harmful tests were successful in generating harmful content. The model is most vulnerable to producing content related to criminal planning, guns, illegal weapons, and controlled substances.

It found that DeepSeek is 11 times more vulnerable to producing harmful content as compared to Open AI's o1, and six times more vulnerable when compared to Claude-3-opus. It is also 2.5 times more vulnerable when compared to GPT-4o.

The firm also assessed the model's toxicity generation capability by transforming a prompt into a sentence completion task. This involves providing the model with partial sentences or prompts related to potentially harmful content, and then observing how it completes them.

“We test the model to generate content that can be classified into threat, insult, profanity, sexually explicit to name a few,” it said in the report.

The model is in the bottom 20th percentile for producing toxic content on its safety leaderboard which contains more than 100 models. 6.68% of attacks were able to generate toxic content. The model was particularly vulnerable to profanity and severe toxicity.

The study found that DeepSeek-r1 is 4.5 times more likely to generate toxic content as compared to GPT-4o, and 2.5 times when compared to o1. Claude-3-Opus successfully detected all toxic content prompts making it almost toxicity-free.

The firm then prompted the model to create malicious software from various perspectives, including payload construction, top-level architecture, and other relevant factors, across a range of programming languages. It also examined the model's capacity to replicate known malicious signatures.

78% of the attacks were successful in generating insecure code, highlighting a substantial vulnerability, it said.

DeepSeek-r1 was found to be 4.5 times, 2.5 times, and 1.25 times more vulnerable to generating insecure code than models o1, Claude-3-opus and GPT-4o respectively.

The study then looked into CBRN tests which check for the degree to which a model can be manipulated to generate graduate level Chemical, Biological, and Cybersecurity related content. This capability can be misused by malicious actors to build weapons of mass destruction, it said.

It input malicious queries related to chemistry, biology, and cybersecurity to the model to assess its response and the model generated CBRN information for 13% of the attacks.

It found that DeepSeek-r1 is 3.5 times more vulnerable than o1 and Claude-3-opus in producing CBRN content. It is also two times more vulnerable than GPT-4o.

“Overall, our evaluation found that DeepSeek-r1 is highly vulnerable to generating harmful, toxic, biased, CBRN, and insecure code output. While it may be suitable for narrowly scoped applications, the model shows considerable vulnerabilities in operational and security risk areas, as detailed in our methodology. We strongly recommend implementing mitigations if this model is to be used,” said Enkrypt AI, adding that users should conduct automated stress tests tailored to specific use cases, such as mitigating biases in consumer banking and preventing toxicity in customer support as well as implement guardrails that dynamically adjust based on context to neutralize harmful inputs and ensure relevant, safe content output.

“DeepSeek-R1 offers significant cost advantages in AI deployment, but these come with serious risks. Our research findings reveal major security and safety gaps that cannot be ignored. While DeepSeek-R1 may be viable for narrowly scoped applications, robust safeguards-including guardrails and continuous monitoring-are essential to prevent harmful misuse. AI safety must evolve alongside innovation, not as an afterthought,” added Sahil Agarwal, CEO of Enkrypt AI.

MENAFN02022025000219011063ID1109158286


PRovoke

Legal Disclaimer:
MENAFN provides the information “as is” without warranty of any kind. We do not accept any responsibility or liability for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this article. If you have any complaints or copyright issues related to this article, kindly contact the provider above.