The paper begins by critiquing the Sabre defense, which claims to be 3x more robust to adversarial attacks than the current state of the art. The authors identify several issues with the evaluation in the original Sabre paper.
The authors then demonstrate two attacks that completely break the Sabre defense. The first attack removes an unnecessary BPDA (Backward Pass Differentiable Approximation) wrapper, which reduces the robust accuracy to 0% on both the MNIST and CIFAR-10 datasets. In response, the Sabre authors modified the defense to include a new component that discretizes the input. However, the paper shows that this modified defense contains a second bug, and that a simple one-line change further reduces the robust accuracy to below baseline levels.
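To make the role of the BPDA wrapper concrete, the following is a minimal toy sketch, not Sabre's code and not the paper's attack. It assumes a hypothetical two-dimensional linear classifier with a differentiable preprocessing step, and compares one signed-gradient attack step using the exact gradient against the same step using a BPDA-style gradient that treats the preprocessing as the identity on the backward pass. All names (`preprocess`, `score`, `W`, `fgsm_step`) and all numbers are illustrative assumptions.

```python
# Toy illustration only: a hypothetical linear "defense", not Sabre's actual code.

W = [1.0, 1.0]  # hypothetical classifier weights


def preprocess(x):
    """Differentiable preprocessing g(x) = (x0, x1 - 3*x0)."""
    return [x[0], x[1] - 3.0 * x[0]]


def score(x):
    """Classifier score w . g(x); a positive score means the correct class."""
    g = preprocess(x)
    return W[0] * g[0] + W[1] * g[1]


def true_gradient():
    """Exact gradient of score w.r.t. x via the chain rule, J^T w.

    The Jacobian of preprocess is J = [[1, 0], [-3, 1]],
    so J^T w = [w0 - 3*w1, w1].
    """
    return [W[0] - 3.0 * W[1], W[1]]


def bpda_gradient():
    """BPDA-style gradient: pretend preprocess is the identity when
    differentiating, so the gradient w.r.t. x is just w."""
    return list(W)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_step(x, grad, eps):
    """One signed-gradient step that tries to lower the score."""
    return [xi - eps * sign(gi) for xi, gi in zip(x, grad)]


x_clean = [0.0, 1.0]
eps = 0.5

x_true = fgsm_step(x_clean, true_gradient(), eps)
x_bpda = fgsm_step(x_clean, bpda_gradient(), eps)

print("clean score:        ", score(x_clean))  # 1.0
print("exact-gradient step:", score(x_true))   # -0.5 (misclassified)
print("BPDA-gradient step: ", score(x_bpda))   # 1.5 (still correct)
```

In this toy setting the identity approximation points the attack in the wrong direction, so the model looks robust only because the attack is weak. When the preprocessing is already differentiable, using its exact gradient is the stronger evaluation, which is the general idea behind removing an unnecessary wrapper.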
The paper concludes by discussing the broader implications of these findings, emphasizing the importance of thorough and rigorous evaluations of adversarial defenses, especially as they are being deployed in real-world production systems.
Key Insights Distilled From
by Nicholas Car... at arxiv.org 05-07-2024
https://arxiv.org/pdf/2405.03672.pdf