Key Concepts
EasyJailbreak is a modular framework that simplifies jailbreak attacks against large language models (LLMs). It exposes their vulnerabilities and underscores the need for stronger security measures.
Statistics
"Our validation across 10 distinct LLMs reveals a significant vulnerability, with an average breach probability of 60% under various jailbreaking attacks."
"Notably, even advanced models like GPT-3.5-Turbo and GPT-4 exhibit average Attack Success Rates (ASR) of 57% and 33%, respectively."