Comprehensive Survey on Adversarial Attacks and Defenses for Generative Language Models
This paper surveys the rapidly growing field of red teaming for generative language models, covering the full pipeline: risk taxonomy, attack strategies, evaluation metrics and benchmarks, and defensive approaches.