This work examines how text prompts affect the adversarial robustness of Vision-Language Models. The proposed method, APT (Adversarial Prompt Tuning), delivers notable gains in both accuracy and robustness across a range of datasets and data-sparsity regimes. By analyzing how different prompting strategies influence attack and defense, the study presents a novel approach to strengthening model resilience against adversarial attacks.
Large pre-trained Vision-Language Models (VLMs) such as CLIP generalize well yet remain vulnerable to adversarial examples. The study investigates how the text prompt influences both adversarial attack and defense, and introduces APT as an effective way to improve model robustness. Results show that APT substantially outperforms hand-engineered prompts and other adaptation methods.
The research underscores the critical role of text prompts in improving VLMs' resilience to adversarial attacks: with APT, even a single learned word added to the prompt yields substantial accuracy and robustness improvements across multiple datasets and training scenarios.
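The core idea, learning only a small prompt component on adversarial examples while the model stays frozen, can be illustrated with a minimal NumPy sketch. Everything below is a toy stand-in, not the paper's implementation: the `tanh` "text encoder", the FGSM-style attack, and all dimensions are illustrative assumptions.

```python
import numpy as np

# Toy stand-in for CLIP: frozen "image features" X, frozen per-class word
# embeddings E, and one learnable prompt vector p shared by all classes.
# Class text feature: t_c = tanh(E[c] + p); score = dot(x, t_c).
rng = np.random.default_rng(0)
D, C, N = 8, 3, 60
E = rng.normal(size=(C, D))      # frozen class-name embeddings (hypothetical)
X = rng.normal(size=(N, D))      # frozen "image features"
p_true = rng.normal(size=D)      # hidden prompt that defines the toy labels
y = np.argmax(X @ np.tanh(E + p_true).T, axis=1)

def loss(p, X, y):
    """Cross-entropy with prompted text features t_c = tanh(E[c] + p)."""
    T = np.tanh(E + p)
    logits = X @ T.T
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()

def input_grad(p, X, y):
    """Analytic dL/dX, used to craft FGSM-style adversarial features."""
    T = np.tanh(E + p)
    logits = X @ T.T
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(y)), y] -= 1.0   # softmax minus one-hot labels
    return (probs @ T) / len(y)

def prompt_grad(p, X, y, h=1e-5):
    """Finite-difference dL/dp -- fine for a toy 8-dim prompt."""
    g = np.zeros_like(p)
    for i in range(len(p)):
        e = np.zeros_like(p)
        e[i] = h
        g[i] = (loss(p + e, X, y) - loss(p - e, X, y)) / (2 * h)
    return g

eps, lr = 0.25, 0.1
p = np.zeros(D)
base = loss(p, X + eps * np.sign(input_grad(p, X, y)), y)
for _ in range(300):
    # Regenerate adversarial features each step; update only the prompt.
    X_adv = X + eps * np.sign(input_grad(p, X, y))
    p -= lr * prompt_grad(p, X_adv, y)
tuned = loss(p, X + eps * np.sign(input_grad(p, X, y)), y)
print(f"robust loss: {base:.3f} -> {tuned:.3f}")
```

The point of the sketch is the training loop's asymmetry: the attack perturbs the image features, but the defense updates only the handful of prompt parameters, mirroring how APT leaves the pre-trained model untouched.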
Key insights drawn from: Lin Li, Haoya... at arxiv.org, 03-05-2024
https://arxiv.org/pdf/2403.01849.pdf