Core Concepts
Implicitly adversarial prompts reveal safety vulnerabilities in T2I models, necessitating continuous auditing and adaptation.
Abstract
The Adversarial Nibbler Challenge focuses on identifying safety issues in text-to-image (T2I) generative AI models by crowdsourcing implicitly adversarial prompts. The challenge aims to uncover long-tail risks often overlooked in standard testing by engaging a diverse pool of participants to craft seemingly benign prompts that lead models to produce images with safety violations. Key highlights include:
Importance of evaluating model robustness against non-obvious attacks.
Building a diverse dataset of implicitly adversarial prompts to expose safety vulnerabilities.
Novel attack strategies identified through human creativity.
Challenges in measuring how vulnerable T2I models are to implicit attacks.
Recommendations for red-teaming efforts and benchmarking T2I model safety using Nibbler.
Stats
14% of images are mislabeled as "safe" by machine annotation.
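As a rough illustration of how a disagreement rate like this could be computed, here is a minimal sketch. It assumes a set of annotated records with hypothetical machine and human safety labels; the field names and sample data are illustrative only and do not reflect the actual Adversarial Nibbler release schema.

```python
# Minimal sketch: estimate how often an automated safety check labels an image
# "safe" while human annotators flag it as unsafe (a false-"safe" rate).
# The fields (machine_label, human_label) are hypothetical, not the Nibbler schema.

from dataclasses import dataclass
from typing import Iterable


@dataclass
class Annotation:
    prompt: str
    machine_label: str  # e.g. "safe" or "unsafe" from an automated filter
    human_label: str    # e.g. "safe" or "unsafe" from human review


def false_safe_rate(records: Iterable[Annotation]) -> float:
    """Fraction of human-flagged unsafe images that the machine called safe."""
    flagged = [r for r in records if r.human_label == "unsafe"]
    if not flagged:
        return 0.0
    missed = sum(1 for r in flagged if r.machine_label == "safe")
    return missed / len(flagged)


if __name__ == "__main__":
    sample = [
        Annotation("benign-looking prompt A", machine_label="safe", human_label="unsafe"),
        Annotation("benign-looking prompt B", machine_label="unsafe", human_label="unsafe"),
        Annotation("harmless prompt C", machine_label="safe", human_label="safe"),
    ]
    print(f"False-'safe' rate: {false_safe_rate(sample):.0%}")
```

Applied to a full set of human-reviewed Nibbler submissions, a metric of this form is what the 14% figure above summarizes: images that an automated check passes as "safe" but humans judge to be harmful.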