Bayesian Neural Networks exhibit robustness to gradient-based adversarial attacks due to the averaging effect of the posterior distribution.
Adversarial examples can be characterized by lower persistence than natural examples, indicating instability near decision boundaries. This connects to the geometry of those boundaries: they tend to meet the linear interpolation path between a natural example and its adversarial counterpart at oblique angles.
Defenses against adversarial examples should look beyond robustness against single attack types and instead focus on achieving robustness against multiple attacks simultaneously, handling unforeseen attacks, and enabling continual adaptation to new attacks.
MEANSPARSE, a post-processing technique for adversarially trained neural networks, enhances robustness by sparsifying mean-centered feature vectors, effectively blocking non-robust features without significantly impacting clean accuracy.
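As a rough sketch of the idea (the exact operator and where it is inserted in the network are assumptions here, not details taken from the paper), a mean-centering sparsifier can be written as:

```python
import numpy as np

def mean_sparse(features, threshold):
    """Sketch of a MeanSparse-style operator (assumed form): feature
    deviations from the per-feature mean that fall below a threshold are
    snapped back to the mean, blocking small non-robust variations while
    letting large, presumably robust, deviations pass unchanged."""
    mu = features.mean(axis=0, keepdims=True)   # per-feature mean
    centered = features - mu
    keep = np.abs(centered) >= threshold        # only large deviations survive
    return mu + centered * keep

x = np.array([[0.9, 1.1],
              [1.05, 0.95],
              [3.0, 1.0]])
out = mean_sparse(x, threshold=0.5)
# Column 1 varies only slightly around its mean, so it collapses to the mean;
# column 0 has large deviations, so it is left untouched.
```

In the paper's setting the statistics would come from the trained network's activations rather than a raw batch, and the threshold trades off robustness against clean accuracy.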
This paper introduces a novel regularization method for convolutional neural networks (CNNs) that leverages pixel similarities to enhance robustness against adversarial attacks, drawing inspiration from a biologically inspired approach that originally relied on neural recordings. Unlike that earlier method, inspired by the brain's visual processing mechanisms, it requires no large-scale neural data: it regularizes the learned representations using image pixel similarity instead, improving computational efficiency while still achieving strong performance.
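A minimal sketch of what such a regularizer could look like (the cosine-similarity form and the squared-error penalty are my assumptions, not details from the paper): the pairwise similarity structure of learned representations is pushed toward the pairwise similarity of the raw pixels, standing in for recorded neural-response similarities.

```python
import numpy as np

def similarity_regularizer(images, reps):
    """Hypothetical pixel-similarity regularizer: penalize mismatch between
    the pairwise cosine-similarity matrix of the input pixels and that of
    the learned representations. Added to the task loss during training."""
    def cos_sim_matrix(z):
        z = z.reshape(len(z), -1)
        z = z / np.linalg.norm(z, axis=1, keepdims=True)
        return z @ z.T

    s_pix = cos_sim_matrix(images)   # similarity structure of raw pixels
    s_rep = cos_sim_matrix(reps)     # similarity structure of representations
    return np.mean((s_pix - s_rep) ** 2)
```

The penalty is zero whenever the representation preserves the inputs' similarity structure exactly, and grows as the structures diverge.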
Hyper Adversarial Tuning (HyperAT) leverages shared defensive knowledge among different adversarial training methods to efficiently improve the robustness of pretrained large vision models against adversarial attacks: a hypernetwork generates LoRA weights tailored to each defense method, and multiple LoRA modules are then combined, strengthening adversarial robustness while keeping computational cost low.
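A sketch of the LoRA-merging step as described (shapes, ranks, and mixing weights here are illustrative assumptions; the hypernetwork that generates the per-method LoRA weights is omitted): each defense method contributes a low-rank update B·A, and the effective weight is the frozen base weight plus their weighted sum.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_methods = 8, 2, 3                 # hidden size, LoRA rank, #defenses
W = rng.standard_normal((d, d))           # frozen pretrained weight

# One (B, A) low-rank pair per defense method (in HyperAT these would be
# produced by the hypernetwork, not sampled randomly).
loras = [(rng.standard_normal((d, r)), rng.standard_normal((r, d)))
         for _ in range(n_methods)]
alphas = np.array([0.5, 0.3, 0.2])        # mixing weights (assumed values)

# Merged weight: base plus the weighted sum of low-rank updates.
W_eff = W + sum(a * (B @ A) for a, (B, A) in zip(alphas, loras))
```

Because each update has rank at most r, the total deviation from the pretrained weight has rank at most n_methods · r, which is what keeps the tuning cheap relative to full fine-tuning.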
Cleanly (i.e., standardly) trained neural networks can be adversarially vulnerable when the data are inseparable along high-variance (on-manifold) directions: gradient descent then converges slowly along low-variance (off-manifold) directions, leaving a suboptimal classifier that is susceptible to attack. Second-order optimization methods, which rescale updates by curvature, can mitigate this issue.
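A toy quadratic illustrates the convergence gap (this is a constructed example, not the paper's setup): gradient descent, whose stable step size is limited by the high-curvature direction, barely moves along the low-curvature one, while a Newton step rescales by the inverse Hessian and converges at once.

```python
import numpy as np

# Loss L(w) = 0.5 * w^T H w with very different curvatures per direction,
# standing in for a high-variance (on-manifold) and a low-variance
# (off-manifold) direction.
H = np.diag([100.0, 0.01])

w_gd = np.array([1.0, 1.0])
lr = 1.0 / 100.0                          # largest stable step for GD
for _ in range(1000):
    w_gd = w_gd - lr * (H @ w_gd)         # slow progress along the 0.01 axis

w_newton = np.array([1.0, 1.0])
w_newton = w_newton - np.linalg.inv(H) @ (H @ w_newton)   # one Newton step
```

After a thousand gradient steps the low-curvature coordinate has barely decayed, whereas Newton's method reaches the optimum in a single update, which is the intuition behind using second-order methods to escape the slow off-manifold convergence.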
TRADES, a widely used adversarial training method, can exhibit overestimated robustness due to gradient masking, particularly in multi-class classification tasks, highlighting the need for careful hyperparameter tuning and robust evaluation methods.
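For reference, the TRADES objective combines clean cross-entropy with a β-weighted KL divergence between predictions on natural and adversarial inputs; driving the KL term toward zero with a large β can flatten the gradients seen by attacks, producing the gradient masking noted above. A minimal single-example sketch (numpy; batching and the inner adversarial maximization are omitted):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def trades_loss(logits_nat, logits_adv, label, beta):
    """TRADES objective for one example: cross-entropy on the natural input
    plus beta * KL(p_nat || p_adv). In full TRADES, logits_adv come from an
    inner maximization of the KL term over a perturbation ball."""
    p_nat = softmax(logits_nat)
    p_adv = softmax(logits_adv)
    ce = -np.log(p_nat[label])
    kl = np.sum(p_nat * (np.log(p_nat) - np.log(p_adv)))
    return ce + beta * kl
```

When the natural and adversarial predictions coincide the KL term vanishes and only the clean loss remains; any divergence between them is penalized in proportion to β.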