New Jailbreak Technique Threatens Large Language Model Security
🧠💔 New jailbreak technique poses risks to the security of large language models. Researchers from Palo Alto Networks’ Unit 42 have identified a novel attack method, dubbed Bad Likert Judge, that enables attackers to bypass the safety guardrails of large language models (LLMs) such as those from OpenAI and Google. The technique prompts the LLM to act as a judge, rating the harmfulness of responses on a Likert scale, and then asks it to generate example responses matching each rating; the example aligned with the highest score can contain the harmful content the guardrails would normally block. Tests showed that this method increased the attack success rate by more than 60% compared with plain attack prompts. The researchers stress the importance of implementing robust content-filtering systems on model output, as these can significantly reduce the likelihood of harmful responses reaching users.
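
To illustrate the output-filtering mitigation the researchers recommend, here is a minimal sketch of a wrapper that screens a model’s response before returning it to the user. The function names (`generate_response`, `harm_score`, `guarded_generate`), the threshold, and the keyword-based classifier are hypothetical placeholders, not any vendor’s API; in production the harm check would be a dedicated moderation model or service.

```python
# Minimal sketch of an output content filter around an LLM call.
# `generate_response` and `harm_score` are hypothetical placeholders; in a
# real deployment the classifier would be a moderation model or service,
# not a keyword list.

HARM_THRESHOLD = 0.5          # assumed cutoff above which output is blocked
REFUSAL = "Sorry, I can't help with that."
BLOCKLIST = ("explosive", "malware", "credential theft")  # toy stand-in only


def generate_response(prompt: str) -> str:
    """Placeholder for the underlying LLM call."""
    return f"(model output for: {prompt})"


def harm_score(text: str) -> float:
    """Toy harm classifier: fraction of blocklist terms present in the text."""
    hits = sum(term in text.lower() for term in BLOCKLIST)
    return hits / len(BLOCKLIST)


def guarded_generate(prompt: str) -> str:
    """Generate a response, but screen it before it reaches the user.

    Even if a jailbreak such as Bad Likert Judge slips past the model's
    built-in guardrails, this second-pass check can still block the output.
    """
    response = generate_response(prompt)
    if harm_score(response) >= HARM_THRESHOLD:
        return REFUSAL
    return response


if __name__ == "__main__":
    print(guarded_generate("Explain how Likert scales are used in surveys."))
```

The filter runs on the model’s output rather than on the prompt, since jailbreaks like this one disguise harmful intent behind a benign-looking evaluation task, but the harmful text still has to appear in the response before it can reach the user.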
