Regularizing the norm of natural input gradients can achieve near state-of-the-art adversarial robustness on ImageNet, with significantly lower computational cost than adversarial training. The effectiveness of this approach critically depends on the smoothness of the activation functions used in the model architecture.
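As a minimal illustration of the idea, the sketch below adds an input-gradient-norm penalty to the cross-entropy loss of a logistic-regression model, where the input gradient has a closed form and no autodiff is needed. The function name, weighting `lam`, and the logistic setting are illustrative assumptions, not the paper's actual training setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_norm_regularized_loss(w, x, y, lam=0.1):
    """Cross-entropy loss plus a penalty on the norm of the input gradient.

    Illustrative sketch: for logistic regression the gradient of the
    cross-entropy loss with respect to the input x has the closed form
    (sigmoid(w @ x) - y) * w, so the penalty is cheap to compute here.
    In a deep network it would be obtained via a double-backward pass.
    """
    p = sigmoid(w @ x)
    eps = 1e-12  # numerical safeguard inside the logs
    ce = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    input_grad = (p - y) * w              # d(ce)/dx in closed form
    penalty = np.linalg.norm(input_grad)  # L2 norm of the natural input gradient
    return ce + lam * penalty
```

Setting `lam = 0` recovers the plain cross-entropy loss; increasing it trades clean accuracy for a smoother, more robust decision boundary.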
The core message of this paper is that mixing the output probabilities of a standard neural network classifier with those of a robust neural network classifier significantly alleviates the accuracy-robustness trade-off, achieving high clean accuracy while maintaining strong adversarial robustness.
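A minimal sketch of such mixing is a convex combination of the two classifiers' predicted class probabilities. The function name and the single mixing weight `alpha` are assumptions for illustration; the paper's actual mixing scheme may be more elaborate (e.g. input-adaptive weights).

```python
import numpy as np

def mix_probs(p_standard, p_robust, alpha=0.5):
    """Convex combination of two classifiers' output probabilities.

    alpha = 0 recovers the standard (high clean accuracy) model and
    alpha = 1 the robust model; intermediate values interpolate between
    the two ends of the accuracy-robustness trade-off.
    """
    p_standard = np.asarray(p_standard, dtype=float)
    p_robust = np.asarray(p_robust, dtype=float)
    return (1.0 - alpha) * p_standard + alpha * p_robust
```

Because both inputs are probability vectors and the weights sum to one, the mixed output is again a valid probability vector, so the prediction is simply its argmax.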