The authors introduce Neural Exec as a new family of prompt injection attacks, demonstrating the effectiveness of autonomously generated execution triggers. By using optimization-driven methods, they show significant improvements over handcrafted triggers in evading detection and executing malicious payloads.
Large Language Models (LLMs) can be tricked into bypassing their safety mechanisms and generating harmful content through a novel attack method called Feign Agent Attack (F2A), which exploits LLMs' blind trust in security detection agents.