Decrypt LOL


New AI Jailbreak Method Increases Attack Success Rates

1 min read

🧠🔍 New jailbreak technique exposes vulnerabilities in large language models. Researchers from Palo Alto Networks’ Unit 42 have identified a novel attack method, dubbed “Bad Likert Judge,” that bypasses the safety measures of large language models (LLMs) to elicit harmful content. In this multi-turn strategy, the attacker asks the target LLM to act as a judge, rating the harmfulness of responses on a Likert scale, and then to generate examples aligned with those ratings; the example matching the highest harmfulness score can end up containing the harmful content itself. Tests across six leading LLMs showed that the technique can increase the attack success rate by over 60% compared to plain attack prompts. The findings underscore the importance of robust content filtering to mitigate the risks of deploying LLMs in real-world applications, especially in light of recent reports on AI’s susceptibility to manipulation.
