Key Idea
A novel dual-network training framework, The Victim and The Beneficiary (V&B), trains clean models directly on poisoned data without requiring any trusted benign samples.
Abstract
The paper proposes a novel secure training framework, The Victim and The Beneficiary (V&B), to defend deep neural networks (DNNs) against backdoor attacks.
The key highlights are:
Backdoor attacks pose a serious security threat to the training process of DNNs: an attacker stamps a designed trigger onto a small fraction of benign samples so that the model learns a spurious correlation between the trigger and a target label.
The authors find that the entropy of the poisoned model's prediction can be used to distinguish poisoned samples from benign ones. This inspires them to propose the V&B framework.
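The entropy cue above can be sketched as follows. The softmax rows here are illustrative values, not numbers from the paper; the point is only that a confidently (mis)classified sample yields low entropy while an uncertain one yields high entropy:

```python
import numpy as np

def prediction_entropy(probs):
    """Shannon entropy of each row of class probabilities (nats)."""
    eps = 1e-12  # guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

# hypothetical softmax outputs: a confident row and an uncertain row
probs = np.array([[0.98, 0.01, 0.01],
                  [0.40, 0.35, 0.25]])
ent = prediction_entropy(probs)  # ent[0] (confident) < ent[1] (uncertain)
```

Under the paper's observation, samples whose entropy falls well below the batch average would be flagged as suspicious (likely poisoned).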
In the V&B framework, the Victim network is first trained on suspicious samples (i.e., samples with low prediction entropy) to become a powerful poisoned sample detector. Then the Beneficiary network is trained on credible samples (i.e., samples with high prediction entropy) filtered by the Victim network.
To further improve the Beneficiary network and erase potential backdoors, a semi-supervised suppression strategy is adopted, where the Victim network's knowledge is used to relabel and suppress the suspicious samples.
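One way to realize the relabeling step is a confidence-gated pseudo-labeling rule, a common semi-supervised pattern. This is a minimal sketch assuming the Victim exposes class probabilities; the paper's actual suppression objective is more involved:

```python
import numpy as np

def pseudo_relabel(victim_probs, conf_thresh=0.9):
    """Hypothetical relabeling rule: accept the Victim's predicted class as a
    pseudo-label only when its confidence clears conf_thresh; otherwise the
    sample stays unlabeled (marked -1) for semi-supervised training."""
    preds = victim_probs.argmax(axis=1)
    conf = victim_probs.max(axis=1)
    return np.where(conf >= conf_thresh, preds, -1)

probs = np.array([[0.95, 0.03, 0.02],   # confident -> relabeled as class 0
                  [0.50, 0.30, 0.20]])  # uncertain -> left unlabeled (-1)
labels = pseudo_relabel(probs)
```

Gating on confidence keeps obviously wrong Victim predictions from propagating into the Beneficiary's training labels.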
The authors also propose a strong data augmentation method called AttentionMix, which mixes the influential image regions according to the attention map to effectively inhibit backdoor injection.
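The mixing idea can be sketched as an attention-guided paste. This is our reading of the mechanism, not the paper's exact algorithm: pixels of a second image whose normalized attention exceeds a threshold are copied into the first image, and the label-mixing weight is the copied area fraction:

```python
import numpy as np

def attention_mix(img_a, img_b, attn_b, thresh=0.5):
    """Hedged sketch of attention-guided mixing: copy the most attended
    region of img_b (per its H x W attention map attn_b) into img_a, and
    return the area fraction as the label-mixing weight."""
    norm = (attn_b - attn_b.min()) / (np.ptp(attn_b) + 1e-12)
    mask = (norm > thresh).astype(img_a.dtype)          # H x W region mask
    mixed = img_a * (1 - mask[..., None]) + img_b * mask[..., None]
    lam = float(mask.mean())                            # label-mixing weight
    return mixed, lam

# toy 4x4 RGB images and an attention map peaked at one pixel
img_a = np.zeros((4, 4, 3))
img_b = np.ones((4, 4, 3))
attn_b = np.zeros((4, 4))
attn_b[0, 0] = 1.0
mixed, lam = attention_mix(img_a, img_b, attn_b)
```

Because only high-attention regions are transplanted, a localized trigger is frequently overwritten or decoupled from its target label, which is the intuition behind inhibiting backdoor injection.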
Extensive experiments on CIFAR-10 and ImageNet datasets demonstrate the effectiveness and robustness of the V&B framework against various state-of-the-art backdoor attacks.
Statistics
The average prediction entropy of benign samples is significantly higher than that of poisoned samples crafted by 6 backdoor attacks on CIFAR-10 with ResNet-18 under a poisoning rate of 10%.