Core Concept
PAD-FT is a lightweight defense mechanism that disinfects poisoned deep neural network models without requiring additional clean data.
Summary
The paper proposes a novel lightweight post-training backdoor defense mechanism called PAD-FT. The key components of PAD-FT are:
- Data Purification: Employs symmetric cross-entropy (SCE) loss to identify and select the most-likely clean data from the poisoned training dataset, creating a self-purified clean dataset without external data.
- Activation Clipping: Optimizes activation clipping bounds using the self-purified clean dataset to mitigate the impact of backdoor triggers on activation values.
- Classifier Fine-Tuning: Fine-tunes only the classifier layer of the victim model using the self-purified clean dataset and consistency regularization, significantly reducing computational cost compared to fine-tuning the entire model.
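The data purification step can be illustrated with a minimal sketch. It assumes that samples with the lowest SCE loss under the victim model are the ones treated as most likely clean, and it uses the common SCE form (cross-entropy plus reverse cross-entropy, with log 0 replaced by a negative constant). The function names, the weights `alpha` and `beta`, the constant `log_zero`, and the `keep_ratio` parameter are all illustrative assumptions, not values taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sce_loss(logits, label, alpha=1.0, beta=1.0, log_zero=-4.0):
    """Symmetric cross-entropy for one sample with a hard (one-hot) label.

    alpha, beta and log_zero are assumed defaults for illustration only.
    """
    probs = softmax(logits)
    # Standard cross-entropy: -log p(label).
    ce = -math.log(max(probs[label], 1e-12))
    # Reverse cross-entropy: -sum_k p_k * log(q_k), where q is one-hot.
    # log(1) = 0 for the labelled class and log(0) is replaced by log_zero,
    # so the sum reduces to -log_zero * (1 - p(label)).
    rce = -log_zero * (1.0 - probs[label])
    return alpha * ce + beta * rce


def select_likely_clean(logits_batch, labels, keep_ratio=0.2):
    """Return sorted indices of the keep_ratio fraction with the lowest SCE loss.

    Assumption: low loss is used as the proxy for "most likely clean".
    """
    losses = [sce_loss(lg, y) for lg, y in zip(logits_batch, labels)]
    k = max(1, int(len(losses) * keep_ratio))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:k])
```

In this sketch, `logits_batch` would hold the victim model's outputs on the poisoned training set, and the returned indices would define the self-purified subset used by the later clipping and fine-tuning steps.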
Extensive experiments on CIFAR-10 and CIFAR-100 datasets demonstrate the effectiveness and superiority of PAD-FT against various backdoor attack strategies, including BadNets, Blended, and WaNet, across different poison rates. PAD-FT maintains a strong balance between classification accuracy and attack success rate, outperforming state-of-the-art defense mechanisms.
Statistics
The main text of the paper gives no specific numerical results. Results are presented in tables that report classification accuracy (ACC) and attack success rate (ASR) for each defense mechanism and attack scenario.
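For readers unfamiliar with the two metrics, the sketch below shows how ACC and ASR are commonly computed. It follows the usual convention in the backdoor literature (ASR is measured on triggered inputs whose true class is not the attack target); the paper's exact evaluation protocol is not stated here, so treat the function names and this convention as assumptions.

```python
def accuracy(preds, labels):
    """ACC: fraction of clean test samples classified correctly."""
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


def attack_success_rate(triggered_preds, true_labels, target_label):
    """ASR: fraction of triggered samples classified as the attack target.

    Samples whose true class already equals the target are excluded,
    which is a common convention and an assumption in this sketch.
    """
    pairs = [
        (p, y) for p, y in zip(triggered_preds, true_labels) if y != target_label
    ]
    if not pairs:
        return 0.0
    hits = sum(1 for p, _ in pairs if p == target_label)
    return hits / len(pairs)
```

A successful defense keeps ACC high on clean inputs while pushing ASR toward zero on triggered inputs.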
Quotes
The paper does not contain any direct quotes that are particularly striking or support the key arguments.