This work examines the role of text prompts in the adversarial robustness of Vision-Language Models. The proposed method, Adversarial Prompt Tuning (APT), delivers consistent gains in both accuracy and robustness across a range of datasets and data-sparsity regimes, and an analysis of different prompting strategies shows how the choice of prompt shapes resilience to adversarial attacks.
Large pre-trained Vision-Language Models (VLMs) such as CLIP remain vulnerable to adversarial examples despite their strong generalization ability. The study investigates how text prompts influence both adversarial attack and defense, and introduces APT as an effective, parameter-efficient way to harden the model: prompt context is learned under adversarial training rather than hand-engineered. APT significantly outperforms hand-engineered prompts and other adaptation methods.
The results underscore the critical role of text prompts in VLM robustness: with APT, a single learned word added to the prompt yields substantial accuracy and robustness improvements across multiple datasets and training scenarios.
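The idea described above can be sketched in a few lines. The toy example below is only illustrative, not the paper's implementation: it stands in for CLIP with a tiny frozen linear "image encoder" and fixed per-class text features (all shapes and names are hypothetical), and learns a single prompt vector by minimizing the classification loss on PGD-perturbed inputs, which is the essence of adversarial prompt tuning.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for CLIP's frozen encoders (toy shapes).
D, num_classes = 16, 3
image_encoder = torch.nn.Linear(8, D)        # frozen "image encoder"
class_embed = torch.randn(num_classes, D)    # frozen per-class text features
for p in image_encoder.parameters():
    p.requires_grad_(False)

# APT's core idea: one learnable prompt vector shared across class prompts.
prompt = torch.zeros(D, requires_grad=True)
opt = torch.optim.SGD([prompt], lr=0.1)

def logits(x):
    # Cosine-similarity logits between image and prompted text features.
    img = F.normalize(image_encoder(x), dim=-1)
    txt = F.normalize(class_embed + prompt, dim=-1)
    return img @ txt.t()

def pgd_attack(x, y, eps=0.1, steps=3):
    # Simple L-inf PGD on the input, with the prompt held fixed.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(logits(x + delta), y)
        (grad,) = torch.autograd.grad(loss, delta)
        delta = (delta + (eps / steps) * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).detach()

x = torch.randn(32, 8)
y = torch.randint(0, num_classes, (32,))
for _ in range(20):
    x_adv = pgd_attack(x, y)           # inner maximization: craft attacks
    loss = F.cross_entropy(logits(x_adv), y)
    opt.zero_grad()
    loss.backward()                    # outer minimization: tune the prompt only
    opt.step()
```

Only the prompt vector receives gradient updates; the encoders stay frozen, mirroring why the method is cheap enough that one learned word can carry the robustness gain.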
Key insights from the paper by Lin Li, Haoya... at arxiv.org, 03-05-2024
https://arxiv.org/pdf/2403.01849.pdf