Decrypt LOL


New Jailbreak Technique Threatens Large Language Model Security

1 min read

🧠💔 Researchers from Palo Alto Networks’ Unit 42 have identified a novel attack method called the Bad Likert Judge, which enables attackers to bypass the safety guardrails of large language models (LLMs) like those from OpenAI and Google. The technique prompts the target LLM to act as a judge, rating the harmfulness of responses on a Likert scale, and then asks it to generate example responses for each rating; the example aligned with the most harmful score can contain content the model would otherwise refuse to produce. In Unit 42’s tests, the method increased the attack success rate by more than 60% compared to standard attack prompts. The researchers emphasize the importance of applying robust content-filtering systems to model outputs, which can significantly reduce the likelihood of harmful responses.
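The mitigation Unit 42 points to is output-side content filtering: score a draft LLM response before it reaches the user, and refuse if it crosses a threshold. The Python sketch below is a minimal, self-contained illustration of that idea only; the `harm_score` blocklist scorer, its patterns, and the threshold are illustrative stand-ins and not part of the research. A production filter would use a trained safety classifier or a moderation API rather than keywords.

```python
import re

# Illustrative patterns only; real deployments would rely on a trained
# safety classifier or moderation API, not a static keyword blocklist.
BLOCKLIST = [
    r"\bbuild(?:ing)? a weapon\b",
    r"\bmalware payload\b",
    r"\bsteal credentials\b",
]

def harm_score(text: str) -> float:
    """Toy harmfulness score in [0, 1]: fraction of blocklist patterns matched."""
    hits = sum(1 for pattern in BLOCKLIST if re.search(pattern, text, re.IGNORECASE))
    return hits / len(BLOCKLIST)

def filter_response(draft: str, threshold: float = 0.3) -> str:
    """Return the model's draft only if it scores below the harm threshold."""
    if harm_score(draft) >= threshold:
        return "Sorry, I can't help with that."
    return draft

if __name__ == "__main__":
    # A benign draft passes through unchanged.
    print(filter_response("Here is an overview of Likert-scale survey design."))
    # A draft that trips the blocklist is replaced with a refusal.
    print(filter_response("Step one: craft the malware payload..."))
```

Because the Bad Likert Judge manipulates the generation step itself, filtering the output after generation, rather than relying on the model’s own guardrails, is what makes this class of defense effective.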
