Core Concepts
The proposed AdvDiffVLM method efficiently generates natural, unrestricted adversarial examples for visual-language models by leveraging diffusion models and adaptive ensemble gradient estimation, achieving superior transferability and robustness compared to existing transfer-based attack methods.
Abstract
The paper addresses the problem of efficiently generating adversarial examples that can effectively attack visual-language models (VLMs) in targeted transfer scenarios. It first evaluates the robustness of VLMs against current state-of-the-art (SOTA) transfer-based attacks, identifying key limitations: high computational cost, noisy adversarial examples, and limited ability to evade defense methods.
To address these issues, the authors propose AdvDiffVLM, which uses diffusion models to generate natural, unrestricted adversarial examples. Specifically, AdvDiffVLM employs Adaptive Ensemble Gradient Estimation to modify the score during the diffusion model's reverse generation process, so that the resulting adversarial examples carry natural adversarial semantics and thus transfer better across models. Additionally, a GradCAM-guided mask disperses the adversarial semantics throughout the image rather than concentrating them in one region, further improving the visual quality of the adversarial examples.
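The score-modification idea described above can be illustrated with a toy sketch. This is not the paper's implementation: the placeholder denoiser, the quadratic surrogate losses, the fixed ensemble weights, and the all-ones mask are all assumptions standing in for the actual diffusion model, VLM embedding losses, adaptive weighting, and GradCAM-derived mask.

```python
import numpy as np

# Toy sketch (NOT AdvDiffVLM itself): steer a diffusion-style reverse step
# with an ensemble-averaged adversarial gradient. Tiny vectors and quadratic
# surrogate losses stand in for images and VLM embedding losses.

def surrogate_grad(x, target, weight):
    # Gradient of a stand-in loss w * ||x - target||^2 for one ensemble member.
    return weight * 2.0 * (x - target)

def reverse_step(x_t, targets, weights, guidance_scale=0.1):
    """One denoising step whose output is shifted by an ensemble gradient
    estimate; `weights` play the role of the adaptive per-model weights."""
    denoised = 0.95 * x_t                      # placeholder denoiser
    # Average the weighted gradients from all surrogate "models".
    grad = sum(surrogate_grad(denoised, tgt, w)
               for tgt, w in zip(targets, weights)) / len(targets)
    mask = np.ones_like(x_t)                   # stand-in for a GradCAM mask
    return denoised - guidance_scale * mask * grad

# Run a few guided reverse steps toward two hypothetical target embeddings.
x = np.array([1.0, -1.0, 0.5])
targets = [np.array([0.2, 0.1, 0.0]), np.array([0.3, 0.0, 0.1])]
weights = [0.6, 0.4]
for _ in range(10):
    x = reverse_step(x, targets, weights)
```

Each step denoises, then nudges the sample along the ensemble's adversarial gradient; over iterations the sample drifts toward the (weighted) target semantics while the denoiser keeps it on the data manifold, which is the intuition behind embedding adversarial semantics "naturally" rather than as additive noise.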
Experimental results demonstrate that AdvDiffVLM achieves a speedup ranging from 10X to 30X compared to existing transfer-based attack methods, while maintaining superior quality of adversarial examples. The generated adversarial examples also exhibit strong transferability across VLMs and increased robustness against adversarial defense methods. Notably, AdvDiffVLM can successfully attack commercial VLMs, such as GPT-4V, in a black-box manner.
Stats
Our method achieves a speedup ranging from 10X to 30X compared to existing transfer-based attack methods.
The adversarial examples generated by AdvDiffVLM exhibit strong transferability across VLMs and increased robustness against adversarial defense methods.
Quotes
"Targeted transfer-based attacks involving adversarial examples pose a significant threat to large visual-language models (VLMs)."
"To address these issues, inspired by score matching, we introduce AdvDiffVLM, which utilizes diffusion models to generate natural, unrestricted adversarial examples."
"Experimental results demonstrate that our method achieves a speedup ranging from 10X to 30X compared to existing transfer-based attack methods, while maintaining superior quality of adversarial examples."