Large pre-trained Vision-Language Models (VLMs) such as CLIP generalize well but remain vulnerable to adversarial examples. This study examines how text prompts influence both adversarial attack and defense on such models.
It proposes APT (Adversarial Prompt Tuning), which learns a robust text prompt rather than relying on hand-engineered ones. Remarkably, adding a single learned word to the prompt yields substantial improvements in both accuracy and adversarial robustness.
These gains hold consistently across multiple datasets and data-sparsity schemes, with APT outperforming hand-engineered prompts as well as other adaptation methods.
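To make the idea concrete, here is a minimal, self-contained sketch of adversarial prompt tuning in the spirit of APT. It is not the paper's implementation: the tiny `image_encoder`, `text_encoder`, and `class_embed` modules are toy stand-ins for CLIP's frozen towers, and the PGD settings and the single learnable `context` vector (the "one prompt word") are illustrative assumptions.

```python
# Sketch: learn one prompt-context vector with adversarial training,
# keeping the (stand-in) vision-language encoders frozen.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D, NUM_CLASSES = 64, 10
EPS, ALPHA, PGD_STEPS = 8 / 255, 2 / 255, 3  # assumed L-inf attack budget

# Frozen stand-ins for CLIP's encoders (swap in real CLIP in practice).
image_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, D))
text_encoder = torch.nn.Linear(2 * D, D)           # consumes [context ; class-name] embedding
class_embed = torch.nn.Embedding(NUM_CLASSES, D)   # stand-in class-name embeddings
for m in (image_encoder, text_encoder, class_embed):
    m.requires_grad_(False)

# The only trainable parameter: one shared context vector ("one prompt word").
context = torch.zeros(1, D, requires_grad=True)
opt = torch.optim.SGD([context], lr=0.1)

def text_features():
    # Prepend the learned context to every class embedding, then encode.
    ctx = context.expand(NUM_CLASSES, -1)
    feats = text_encoder(torch.cat([ctx, class_embed.weight], dim=-1))
    return F.normalize(feats, dim=-1)

def logits(images):
    # CLIP-style classification: scaled cosine similarity to each class prompt.
    img = F.normalize(image_encoder(images), dim=-1)
    return 100.0 * img @ text_features().t()

def pgd_attack(images, labels):
    # L-inf PGD on the images against the current prompted classifier.
    delta = torch.empty_like(images).uniform_(-EPS, EPS).requires_grad_(True)
    for _ in range(PGD_STEPS):
        loss = F.cross_entropy(logits((images + delta).clamp(0, 1)), labels)
        (grad,) = torch.autograd.grad(loss, delta)
        delta = (delta + ALPHA * grad.sign()).clamp(-EPS, EPS).detach().requires_grad_(True)
    return (images + delta).clamp(0, 1).detach()

# One adversarial-training step on a random toy batch: only `context` updates.
images = torch.rand(8, 3, 32, 32)
labels = torch.randint(0, NUM_CLASSES, (8,))
adv = pgd_attack(images, labels)
loss = F.cross_entropy(logits(adv), labels)
opt.zero_grad()
loss.backward()
opt.step()
print(f"adversarial loss: {loss.item():.3f}")
```

The key design point the sketch illustrates is that adversarial examples are generated against the *prompted* classifier, and the defense is learned entirely in the prompt: the encoders stay frozen, so the trainable footprint is a single embedding vector.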
Key insights distilled from the source content by Lin Li, Haoya... at arxiv.org, 03-05-2024: https://arxiv.org/pdf/2403.01849.pdf