Core Concepts
The authors propose Adversarial Training on Purification (AToP), a defense technique intended to improve both the robustness and the generalization of deep neural networks.
Summary
The paper introduces AToP, a defense method combining adversarial training and purification to improve robustness. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNette show state-of-the-art results against various attacks. AToP significantly enhances the performance of the purifier model for robust classification.
Key points include:
- Vulnerability of deep neural networks to adversarial attacks.
- Limitations of existing defense techniques like adversarial training (AT) and adversarial purification (AP).
- Proposal of AToP with perturbation destruction by random transforms and purifier model fine-tuning.
- Empirical evaluation showing improved robustness and generalization against unseen attacks.
- Comparison with state-of-the-art methods across different datasets, classifiers, and attack benchmarks.
- Ablation studies demonstrating the effectiveness of AToP in enhancing the purifier model's performance for robust classification.
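The two components named in the key points (perturbation destruction by random transforms, then fine-tuning the purifier) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the random pixel mask as the transform, the linear `purify` and `classify_logits` toy models, the reconstruction-plus-classification objective, the weight `lam`, and the random-sign perturbation standing in for a real attack.

```python
import numpy as np

def random_mask(x, drop_prob, rng):
    """Random transform: zero each pixel independently with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def purify(x_t, W_g):
    """Toy linear purifier (hypothetical): maps a transformed image back to image space."""
    return x_t @ W_g

def classify_logits(x, W_f):
    """Toy linear classifier (hypothetical) returning one logit per class."""
    return x @ W_f

def cross_entropy(logits, labels):
    """Mean cross-entropy, computed with a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def purifier_finetune_loss(x_adv, x_clean, y, W_g, W_f, lam, drop_prob, rng):
    """Assumed objective: reconstruction error plus lam times classification loss."""
    x_t = random_mask(x_adv, drop_prob, rng)      # destroy part of the perturbation
    x_hat = purify(x_t, W_g)                      # purifier restores the image
    recon = np.mean((x_hat - x_clean) ** 2)       # stay close to the clean image
    cls = cross_entropy(classify_logits(x_hat, W_f), y)  # stay correctly classified
    return recon + lam * cls

rng = np.random.default_rng(0)
x_clean = rng.random((4, 16))                     # 4 flattened toy "images"
# Stand-in perturbation (random signs), NOT a real adversarial attack.
x_adv = np.clip(x_clean + 0.03 * np.sign(rng.standard_normal((4, 16))), 0.0, 1.0)
y = np.array([0, 1, 2, 1])
W_g = np.eye(16)                                  # purifier starts as the identity
W_f = 0.1 * rng.standard_normal((16, 3))          # fixed toy classifier weights

loss = purifier_finetune_loss(x_adv, x_clean, y, W_g, W_f,
                              lam=0.1, drop_prob=0.25, rng=rng)
print(f"loss = {loss:.4f}")
```

In a real setting the purifier would be a generative model updated by gradient descent on such a loss. The sketch only shows how the transform, the purifier, and the classifier compose into a single training signal.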
Statistics
The method is evaluated against AutoAttack under the ℓ∞ and ℓ2 threat models (Croce & Hein, 2020).
Compared to the second-best method, AToP improves robust accuracy by 2.21% on WideResNet-28-10 and by 5.08% on WideResNet-70-16.
Quotes
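For readers unfamiliar with the terms used in these statistics, the sketch below shows what an ℓ∞ or ℓ2 threat model constrains and how robust accuracy is computed. The helper names, the toy arrays, and the budgets `8/255` and `0.5` are illustrative assumptions. The summary does not state which budgets the paper uses.

```python
import numpy as np

def within_linf(x, x_adv, eps):
    """True per example if the largest per-pixel change is at most eps."""
    delta = (x_adv - x).reshape(len(x), -1)
    return np.abs(delta).max(axis=1) <= eps

def within_l2(x, x_adv, eps):
    """True per example if the Euclidean norm of the change is at most eps."""
    delta = (x_adv - x).reshape(len(x), -1)
    return np.linalg.norm(delta, axis=1) <= eps

def robust_accuracy(preds_on_adv, labels):
    """Fraction of adversarial examples that are still classified correctly."""
    return float(np.mean(np.asarray(preds_on_adv) == np.asarray(labels)))

x = np.zeros((2, 4))
x_adv = x + 0.03                         # every pixel shifted by 0.03

print(within_linf(x, x_adv, 8 / 255))    # assumed budget, for illustration only
print(within_l2(x, x_adv, 0.5))          # assumed budget, for illustration only
print(robust_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # 3 of 4 correct -> 0.75
```

An attack is valid under a threat model only if its perturbation stays inside the corresponding ball, and robust accuracy is measured on the worst valid perturbations the attack can find.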
"Our method achieves state-of-the-art results and exhibits generalization ability against unseen attacks."
"Our method significantly improves the performance of the purifier model in robust classification."