Jia, X., Gao, S., Guo, Q., Ma, K., Huang, Y., Qin, S., Liu, Y., Tsang, I., & Cao, X. (2024). Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack. IEEE Transactions on Pattern Analysis and Machine Intelligence.
This paper addresses the vulnerability of Vision-Language Pre-training (VLP) models to adversarial attacks by proposing a novel method, SA-AET, to generate highly transferable adversarial examples that can effectively fool unseen VLP models.
The researchers developed SA-AET, which increases the diversity of adversarial examples by sampling from adversarial evolution triangles composed of the clean example and the historical and current adversarial examples. They also introduced a semantic image-text feature contrast space to reduce feature redundancy and improve semantic alignment, which further boosts transferability. The method was evaluated on the benchmark datasets Flickr30K, MSCOCO, and RefCOCO+ using several VLP models, including CLIP (CLIP_CNN and CLIP_ViT), ALBEF, and TCL.
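The triangle sampling idea can be illustrated with a minimal sketch. It draws one point from the triangle spanned by a clean image, a previous adversarial image, and the current adversarial image, using uniformly distributed barycentric weights. The function name `sample_in_triangle`, the uniform Dirichlet weighting, and the toy arrays are assumptions made for illustration; the summary does not state how the paper draws its samples or which sub-triangles it favors.

```python
import numpy as np


def sample_in_triangle(x_clean, x_prev, x_curr, rng):
    """Draw one point from the triangle spanned by three images.

    Hypothetical helper, not the authors' code. The weights are
    barycentric coordinates drawn from Dirichlet(1, 1, 1), which is
    uniform over the triangle (an illustrative assumption).
    """
    w = rng.dirichlet(np.ones(3))  # non-negative weights that sum to 1
    sample = w[0] * x_clean + w[1] * x_prev + w[2] * x_curr
    return sample, w


# Toy data standing in for images with pixel values in [0, 1].
rng = np.random.default_rng(0)
x_clean = rng.random((3, 8, 8))
x_prev = np.clip(x_clean + rng.uniform(-0.03, 0.03, x_clean.shape), 0.0, 1.0)
x_curr = np.clip(x_clean + rng.uniform(-0.03, 0.03, x_clean.shape), 0.0, 1.0)

sample, weights = sample_in_triangle(x_clean, x_prev, x_curr, rng)
print(sample.shape, weights.sum())
```

Because the sample is a convex combination, it stays inside any convex perturbation budget (such as an L-infinity ball around the clean image) that already contains both adversarial examples, so no extra projection step is needed in this sketch.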
SA-AET significantly improves the transferability of multimodal adversarial examples compared to existing methods. Sampling from specific adversarial evolution sub-triangles, particularly those near clean and previous adversarial examples, further enhances transferability. Generating adversarial examples in the semantic image-text feature contrast space also contributes to increased effectiveness.
SA-AET effectively generates highly transferable adversarial examples against VLP models, demonstrating the vulnerability of these models and highlighting the need for more robust VLP model development. The proposed techniques of adversarial evolution triangle sampling and semantic contrast space optimization significantly contribute to the method's efficacy.
This research contributes to adversarial machine learning a method for generating highly transferable adversarial examples against VLP models. In doing so, it exposes the vulnerability of current VLP models and underlines the importance of developing more robust and secure VLP models for real-world applications.
The research primarily focuses on transferability in image-text retrieval tasks. Future work could explore the generalization of SA-AET to other VLP downstream tasks and investigate its effectiveness against more complex VLP architectures. Additionally, exploring defense mechanisms against such transferable attacks is crucial for developing more resilient VLP models.
Key Insights Distilled From
by Xiaojun Jia,... at arxiv.org 11-06-2024
https://arxiv.org/pdf/2411.02669.pdf