Key Concepts
The paper proposes a defense method that uses backtranslation to protect LLMs from jailbreaking attacks.
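This summary does not spell out how the defense works, so the following is only a minimal sketch under stated assumptions: that "backtranslation" means inferring a plausible prompt from the model's initial response, re-querying the model with that inferred prompt, and refusing the original request if the inferred prompt is refused. The names `defend_with_backtranslation`, `is_refusal`, `REFUSAL_MARKERS`, and `REFUSAL_TEMPLATE` are hypothetical, and the keyword-based refusal check is a stand-in rather than the authors' actual criterion.

```python
from typing import Callable

# Hypothetical refusal markers; a real system would need a more robust check.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't")

# Hypothetical message returned when the defense decides to refuse.
REFUSAL_TEMPLATE = "I'm sorry, but I cannot assist with that request."


def is_refusal(response: str) -> bool:
    """Return True if the response looks like a refusal (keyword heuristic)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def defend_with_backtranslation(
    prompt: str,
    target_llm: Callable[[str], str],
    backtranslate: Callable[[str], str],
) -> str:
    """Sketch of a backtranslation-style defense (assumed workflow).

    target_llm:    maps a prompt to the model's response.
    backtranslate: maps a response to a prompt that could have produced it.
    """
    # Step 1: get the model's initial response to the (possibly adversarial) prompt.
    response = target_llm(prompt)
    if is_refusal(response):
        # The model already refused; nothing more to do.
        return response

    # Step 2: infer a prompt from the response alone. The inferred prompt is
    # not written by the attacker, so it should not carry adversarial wording.
    inferred_prompt = backtranslate(response)

    # Step 3: re-query the model with the inferred prompt.
    second_response = target_llm(inferred_prompt)

    # Step 4: if the model refuses the inferred prompt, refuse the original too.
    if is_refusal(second_response):
        return REFUSAL_TEMPLATE
    return response
```

In this sketch both the target model and the backtranslation model are passed in as plain callables, so the same function can be tried with stub functions or with wrappers around real model calls.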
Statistics
"Our defense significantly outperforms the baselines."
"Our defense achieves superior defense success rate against adversarial prompts."
"Our defense is cheap and efficient."
Quotes
"We propose a new method for defending LLMs against jailbreaking attacks by 'backtranslation'."
"Our defense is highly effective for defending against existing jailbreak attacks."