This work examines the role of text prompts in the adversarial robustness of large pre-trained Vision-Language Models (VLMs). Although models such as CLIP generalize well, they remain vulnerable to adversarial examples. The study analyzes how the choice of text prompt influences both adversarial attack and defense, and proposes APT (Adversarial Prompt Tuning): rather than hand-engineering prompts, APT learns robust text prompts through adversarial training while keeping the model's pre-trained encoders frozen.
Across multiple datasets and data-sparsity regimes, APT delivers substantial gains in both accuracy and adversarial robustness compared with hand-engineered prompts and other adaptation methods.
Notably, adding even a single learned word to the prompt yields significant improvements in accuracy and robustness across datasets and training scenarios, underscoring how strongly prompt design shapes a VLM's resilience to adversarial attacks.
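The core loop described above — adversarial examples generated against the current prompt (inner maximization), with only the learned prompt parameters updated to resist them (outer minimization) — can be sketched in a toy form. Everything below is illustrative: the linear "encoders", the per-class prompt matrix `P`, the synthetic data, and all hyperparameters are stand-ins for exposition, not the paper's actual CLIP-based implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, C, n = 8, 3, 60  # toy feature dim, number of classes, number of samples

# Frozen "text embeddings" per class and synthetic "image features"
# that cluster around their class embedding (stand-in for CLIP encoders).
T = rng.normal(size=(C, d))
y = rng.integers(0, C, size=n)
X = T[y] + 0.1 * rng.normal(size=(n, d))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(X, y, P):
    """Cross-entropy of image-text similarity scores; analytic gradients."""
    S = softmax(X @ (T + P).T)      # score_c = x . (text_embed_c + prompt_c)
    Y = np.eye(C)[y]
    loss = -np.log(S[np.arange(len(y)), y] + 1e-12).mean()
    G = (S - Y) / len(y)            # dL/dscores
    grad_P = G.T @ X                # gradient w.r.t. the learnable prompt
    grad_X = G @ (T + P)            # gradient w.r.t. the input (for the attack)
    return loss, grad_P, grad_X

def pgd(X, y, P, eps=0.3, alpha=0.1, steps=5):
    """Inner maximization: L-inf PGD attack against the current prompt."""
    X_adv = X.copy()
    for _ in range(steps):
        _, _, gX = loss_and_grads(X_adv, y, P)
        X_adv = np.clip(X_adv + alpha * np.sign(gX), X - eps, X + eps)
    return X_adv

# Adversarial prompt tuning: encoders (here, T) stay frozen; only P is trained.
P = np.zeros((C, d))
loss_before, _, _ = loss_and_grads(pgd(X, y, P), y, P)
for _ in range(100):
    X_adv = pgd(X, y, P)                         # attack the current prompt
    _, grad_P, _ = loss_and_grads(X_adv, y, P)
    P -= 0.5 * grad_P                            # update the prompt only
loss_after, _, _ = loss_and_grads(pgd(X, y, P), y, P)
print(loss_before, loss_after)
```

In this sketch the adversarial loss after tuning is lower than before, mirroring the paper's finding that a small number of learned prompt parameters is enough to improve robustness while the large pre-trained model remains untouched.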
Key insights extracted from the source by Lin Li, Haoya... at arxiv.org, 2024-03-05
https://arxiv.org/pdf/2403.01849.pdf