Tastle: Distract Large Language Models for Automatic Jailbreak Attack
The authors propose Tastle, a distraction-based framework for automating red teaming of large language models by generating jailbreak prompts. The approach is motivated by the distractibility and overconfidence phenomena observed in LLMs.