Core Concepts
Tastle is a distraction-based jailbreak framework for automated red teaming of large language models, reported to offer superior effectiveness, scalability, and transferability.
Stats
LLMs have achieved significant advances in recent years.
Large language models (LLMs) have raised concerns about potential misuse.
Tastle achieves Top-1 attack success rates (ASRs) of 66.7% and 38.0%.
Quotes
"Extensive experiments demonstrate the superiority of our framework."
"Our research aims at strengthening LLM safety instead of facilitating malicious application."