The paper proposes AttackBench, a comprehensive evaluation framework for gradient-based adversarial attacks on machine learning models. It makes the following key contributions:
Categorization of Gradient-based Attacks: The paper presents an original categorization of gradient-based attacks, identifying their main components and differences. This unifies the various formulations in the literature.
AttackBench Framework: The authors introduce the AttackBench framework, which evaluates the effectiveness and efficiency of adversarial attacks in a fair and consistent manner. It tests the attacks against the same models, data, and computational budget.
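The idea of evaluating every attack under identical conditions can be sketched as follows (a minimal illustration with a hypothetical interface and a toy 1-D classifier; AttackBench's actual API and models differ):

```python
from dataclasses import dataclass

@dataclass
class Result:
    name: str
    success_rate: float
    median_norm: float

def run_benchmark(attacks, model, inputs, labels, budget):
    """Evaluate every attack on the same model, data, and query budget."""
    results = []
    for name, attack in attacks.items():
        norms, successes = [], 0
        for x, y in zip(inputs, labels):
            x_adv, queries = attack(model, x, y, budget)  # same budget for all
            if queries <= budget and model(x_adv) != y:
                successes += 1
                norms.append(abs(x_adv - x))  # size of the perturbation found
        norms.sort()
        median = norms[len(norms) // 2] if norms else float("inf")
        results.append(Result(name, successes / len(inputs), median))
    return results

# Toy 1-D "model": predicts class 1 for positive inputs (illustration only).
model = lambda x: 1 if x > 0 else 0

def toy_attack(model, x, y, budget):
    # Hypothetical attack: step just past the decision boundary at 0.
    return (-0.01 if x > 0 else 0.01), 1

results = run_benchmark({"toy": toy_attack}, model, [0.5, 1.0], [1, 1], budget=10)
```

Holding the model, data, and budget fixed is what makes success rates and perturbation sizes directly comparable across attacks.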
Optimality Metric: The paper proposes a novel "optimality" metric that quantifies how close an attack is to the optimal solution. This enables a fair ranking of the attacks.
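As a loose illustration of the idea (not the paper's exact definition, which aggregates results across perturbation budgets), one could score an attack by how its per-sample perturbation sizes compare to the best solution found by any attack:

```python
def optimality(attack_norms, best_norms):
    """Illustrative optimality score (an assumed formula, not the paper's):
    per-sample ratio of the best-known perturbation size to this attack's,
    averaged over samples. A score of 1.0 means the attack matches the best
    known solution on every sample; lower values mean it lags behind."""
    ratios = [best / own if own > 0 else 1.0
              for own, best in zip(attack_norms, best_norms)]
    return sum(ratios) / len(ratios)

# An attack needing twice the best-known perturbation on one of two samples:
score = optimality([2.0, 1.0], [1.0, 1.0])
```

Normalizing against the best-known solutions is what lets attacks with different formulations and norms be ranked on a common scale.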
Extensive Evaluation: The authors test over 100 attack implementations in more than 800 configurations against CIFAR-10 and ImageNet models. The analysis shows that only a handful of attacks consistently outperform the competing approaches.
Insights on Implementation Issues: The evaluation uncovers several implementation issues that prevent many attacks from finding better solutions or from running correctly. This could prompt a re-evaluation of the state of the art in adversarial attacks.
Open-source Benchmark: AttackBench is released as a publicly available benchmark to continuously evaluate novel gradient-based attacks for optimizing adversarial examples.
Source: https://arxiv.org/pdf/2404.19460.pdf