Core Concepts
The authors explore how sensitive the adversarial robustness of Vision-Language Models is to the choice of text prompt. Their proposed Adversarial Prompt Tuning (APT) shows that adding a single learned word to the prompt yields significant gains in both accuracy and robustness.
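To make the core idea concrete, here is a minimal sketch (not the paper's code; all names, dimensions, and the two-class vocabulary are illustrative assumptions): a hand-engineered prompt like "a photo of a {class}" is replaced by "[V] {class}", where [V] is a learnable embedding vector optimized during training rather than a word chosen by hand.

```python
import numpy as np

# Illustrative sketch of prompt tuning with one learned "word" (all
# names and sizes are assumptions, not the paper's API).
rng = np.random.default_rng(0)
dim = 512                                   # CLIP-like text embedding width
word_emb = {"dog": rng.normal(size=dim),    # frozen class-name embeddings
            "cat": rng.normal(size=dim)}

v = rng.normal(size=dim)                    # the learnable token [V]

def prompt_embedding(cls_name):
    """Build the prompt "[V] {cls}" as a (2, dim) token sequence."""
    return np.stack([v, word_emb[cls_name]])

emb = prompt_embedding("dog")
```

In APT, only `v` would receive gradients; the rest of the text encoder stays frozen, which is what keeps the method cheap relative to full fine-tuning.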
Summary
The study examines the role of text prompts in improving adversarial robustness for Vision-Language Models. The proposed method, APT, delivers marked gains in accuracy and robustness across diverse datasets and data-sparsity schemes. By comparing different prompting strategies, the study presents a novel approach to strengthening model resilience against adversarial attacks.
Large pre-trained Vision-Language Models (VLMs) like CLIP are vulnerable to adversarial examples despite their generalization ability. The study focuses on text prompts' influence on adversarial attack and defense, introducing APT as an effective method to enhance model robustness. Results show significant performance boosts with APT compared to hand-engineered prompts and other adaptation methods.
The research emphasizes the critical role of text prompts in improving VLMs' resilience to adversarial attacks. Through APT, a single learned word added to prompts leads to substantial accuracy and robustness improvements across multiple datasets and training scenarios.
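The robustness numbers are reported under an L-infinity perturbation budget of ϵ = 4/255, the standard setting for pixel-space attacks. The following sketch shows one projected-gradient (PGD-style) step under that budget; the step size, shapes, and the random stand-in for the loss gradient are assumptions for illustration, not the paper's attack code.

```python
import numpy as np

def pgd_step(x_adv, x_clean, grad, eps=4/255, alpha=1/255):
    """One signed-gradient ascent step, projected into the eps L-inf ball."""
    x_adv = x_adv + alpha * np.sign(grad)                 # ascend the loss
    x_adv = np.clip(x_adv, x_clean - eps, x_clean + eps)  # project to the ball
    return np.clip(x_adv, 0.0, 1.0)                       # stay a valid image

rng = np.random.default_rng(0)
x = rng.random((3, 224, 224)).astype(np.float32)          # clean image in [0, 1]
g = rng.standard_normal((3, 224, 224)).astype(np.float32) # stand-in gradient
x_adv = pgd_step(x.copy(), x, g)
```

Robust accuracy is then measured on such perturbed inputs, which is why the constraint `|x_adv - x| <= 4/255` matters when comparing defenses.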
Statistics
Surprisingly, by simply adding one learned word to the prompts, APT can significantly boost the accuracy and robustness (ϵ = 4/255) over the hand-engineered prompts by +13% and +8.5% on average respectively.
Extensive experiments are conducted across 15 datasets and 4 data sparsity schemes (from 1-shot to full training data settings) to show APT’s superiority over hand-engineered prompts and other state-of-the-art adaptation methods.
Our method improves on every dataset, though the margin varies considerably.
Our method yields substantial improvement over hand-engineered prompts (HEP) even in the 1-shot setting, boosting both accuracy and robustness.
The UC variant of our method boosts accuracy and robustness over HEP even for 1 shot, by +6.1% and +3.0% respectively.
Quotes
"By simply adding one learned word to the prompts, APT can significantly boost the accuracy and robustness."
"Our method yields substantial improvement over HEP even for 1 shot."
"The UC variant of our method effectively boosts the accuracy and robustness over HEP."