This work presents a novel approach to Trojan attacks on Visual Question Answering (VQA) models, focusing on adapting such attacks efficiently to fine-tuned models. The proposed method generates Trojans that trigger specific neurons in a perturbation layer and, through adversarial learning, establishes a malicious correlation between those activations and the model's outputs. Extensive experiments on the VQA-v2 dataset demonstrate improved attack performance using diverse vision and text Trojans tailored to each sample, with the attack proving robust, stealthy, and efficient across the evaluated metrics. The work also examines how conventional defense mechanisms, such as Differential Privacy and Norm Difference Estimation, affect the attack's performance.
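The core idea, optimizing a trigger so that it both fires chosen neurons in a perturbation layer and pulls the answer head toward a malicious output, can be sketched in miniature. The toy model below (a single ReLU "perturbation" layer feeding a linear answer head), the layer sizes, and the two-term gradient-ascent objective are all illustrative assumptions, not the paper's actual architecture or training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a fine-tuned VQA model: one hidden
# ("perturbation") layer feeding an answer head. Sizes and weights
# are illustrative, not taken from the paper.
D_IN, D_HID, N_ANS = 32, 16, 10
W1 = 0.1 * rng.normal(size=(D_HID, D_IN))   # perturbation layer
W2 = 0.1 * rng.normal(size=(N_ANS, D_HID))  # answer head

def hidden(x):
    return np.maximum(0.0, W1 @ x)          # ReLU activations

def logits(x):
    return W2 @ hidden(x)

# Adversarially craft a trigger that (a) fires a chosen neuron in the
# perturbation layer and (b) drives the answer head toward a chosen
# malicious answer -- the "malicious correlation".
TARGET_NEURON, TARGET_ANSWER = 3, 7
x = 0.01 * rng.normal(size=D_IN)            # start from a tiny perturbation
x0 = x.copy()
lr, beta = 0.1, 5.0                         # beta weights the answer term
for _ in range(200):
    mask = (W1 @ x > 0).astype(float)       # ReLU derivative
    g_neuron = W1[TARGET_NEURON]            # d h[TARGET_NEURON] / d x
    g_answer = (W2[TARGET_ANSWER] * mask) @ W1  # d logit[TARGET_ANSWER] / d x
    x += lr * (g_neuron + beta * g_answer)  # gradient ascent on both goals
```

After optimization, the crafted trigger `x` strongly activates the target neuron and raises the malicious answer's logit, mirroring how a sample-specific Trojan could be tuned against a frozen fine-tuned backbone without retraining it.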