Vulnerability of Neural Network Interpretations to Universal Adversarial Perturbations
Gradient-based saliency-map interpretations of neural networks are susceptible to universal adversarial perturbations: a single, input-agnostic perturbation can substantially alter the interpretation across a large fraction of input samples.
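The claim can be illustrated with a minimal sketch. The toy MLP, the random weights, the L2 budget `eps`, and the random-search attack below are all illustrative assumptions standing in for the paper's setup (a real attack would optimize the perturbation, e.g. by gradient ascent on an interpretation-dissimilarity loss); the point is only that one shared perturbation, chosen once, can reduce the similarity between clean and perturbed saliency maps averaged over a whole batch of inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": a fixed two-layer MLP with a tanh hidden layer.
# The weights are random stand-ins, not a trained model.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 1))

def saliency(x):
    """Gradient of the scalar output w.r.t. the input: the gradient saliency map."""
    h = x @ W1
    # d(tanh(h) @ W2)/dx = W1 @ diag(1 - tanh(h)^2) @ W2, written row-wise:
    return ((1.0 - np.tanh(h) ** 2) * W2.ravel()) @ W1.T

def cosine(a, b):
    """Cosine similarity between two saliency maps."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Batch of inputs that the single (universal) perturbation must cover.
X = rng.normal(size=(32, 8))

def mean_saliency_sim(delta):
    """Average similarity of clean vs. perturbed saliency maps over the batch."""
    return float(np.mean([cosine(saliency(x), saliency(x + delta)) for x in X]))

# Crude universal attack: random search within an L2 ball of radius eps for one
# delta that most degrades interpretation similarity across the whole batch.
eps = 0.5
best_delta = np.zeros(8)
best_sim = mean_saliency_sim(best_delta)  # baseline: ~1.0 (identical maps)
for _ in range(300):
    cand = rng.normal(size=8)
    cand *= eps / max(np.linalg.norm(cand), 1e-12)  # project onto the L2 ball
    sim = mean_saliency_sim(cand)
    if sim < best_sim:
        best_sim, best_delta = sim, cand

print(f"mean saliency similarity, no perturbation:    {mean_saliency_sim(np.zeros(8)):.4f}")
print(f"mean saliency similarity, universal delta:    {best_sim:.4f}")
```

Even this weak random search finds a single perturbation that lowers the batch-averaged saliency similarity, which is the qualitative effect the abstract describes; the paper's attack optimizes the perturbation directly and is correspondingly stronger.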