The paper starts by discussing the importance of shift invariance in visual recognition models and how standard strided pooling operators in CNNs, such as max pooling and average pooling, break shift invariance. The authors introduce the concept of maximum-sampling bias (MSB), which quantifies a pooling operator's bias toward selecting and propagating the maximum value of the signal.
Through a large-scale correlation analysis, the authors show that MSB is negatively correlated with shift invariance across multiple CNN architectures, datasets, and pooling methods. Based on this insight, they propose a novel pooling operator called Translation Invariant Polyphase Sampling (TIPS) that discourages MSB and improves shift invariance.
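To make the MSB idea concrete, here is a minimal sketch of one plausible way to measure it: the fraction of pooled outputs that equal the maximum of their window. This is an illustrative proxy, not the paper's exact formulation, and the `msb` helper is a hypothetical name.

```python
def pool(window, mode):
    """Pool a 1-D window with either max or average pooling."""
    return max(window) if mode == "max" else sum(window) / len(window)

def msb(signal, k, mode):
    """Illustrative proxy for maximum-sampling bias: the fraction of
    non-overlapping windows whose pooled output equals the window max."""
    windows = [signal[i:i + k] for i in range(0, len(signal) - k + 1, k)]
    hits = sum(pool(w, mode) == max(w) for w in windows)
    return hits / len(windows)

sig = [3, 1, 4, 1, 5, 9, 2, 6]
print(msb(sig, 2, "max"))  # 1.0: max pooling always propagates the maximum
print(msb(sig, 2, "avg"))  # 0.0 here: averages rarely coincide with the max
```

Under this toy measure, max pooling has maximal MSB while average pooling has low MSB, matching the direction of the correlation the authors report.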
TIPS learns to sample polyphase decompositions of the input feature maps using a learnable mixing function. To further improve shift invariance, the authors introduce two regularization losses: L_FM, which discourages known failure modes of shift invariance, and L_undo, which trains the network to undo standard shift transformations.
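The core mechanism can be sketched in one dimension: split the signal into its stride-s phases, then combine them with softmax weights rather than hard-selecting one phase. This is a simplified stand-in for TIPS; the `logits` argument plays the role of the learnable mixing function's output and is a hypothetical placeholder.

```python
import math

def polyphase(signal, stride):
    """Split a 1-D signal into `stride` phases (polyphase decomposition)."""
    return [signal[p::stride] for p in range(stride)]

def tips_like_pool(signal, stride, logits):
    """Illustrative stand-in for TIPS: downsample by mixing the phases
    with softmax weights instead of picking a single phase.
    `logits` stands in for the learnable mixing function (hypothetical)."""
    phases = polyphase(signal, stride)
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    n = len(phases[0])
    return [sum(w * ph[i] for w, ph in zip(weights, phases)) for i in range(n)]

sig = [1.0, 2.0, 3.0, 4.0]
print(polyphase(sig, 2))                   # [[1.0, 3.0], [2.0, 4.0]]
print(tips_like_pool(sig, 2, [0.0, 0.0]))  # equal mix: [1.5, 3.5]
```

Because a circular shift of the input permutes the phases rather than discarding information, a soft mixture of phases degrades more gracefully under shifts than a hard max over windows.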
The authors evaluate TIPS on multiple image classification and semantic segmentation benchmarks, and show that it consistently outperforms previous shift-invariant pooling operators in terms of accuracy, shift consistency, and shift fidelity, while also exhibiting greater robustness to adversarial attacks and natural corruptions.
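Shift consistency, one of the evaluation metrics above, is commonly computed as the fraction of shifted copies of an input that receive the same prediction as the original. A minimal sketch with a toy classifier (the `predict` function is a hypothetical stand-in, not the paper's model):

```python
def shift(signal, s):
    """Circularly shift a 1-D signal by s positions."""
    return signal[-s:] + signal[:-s]

def predict(signal):
    # Toy stand-in classifier: thresholds the signal's mean.
    # A global mean is shift-invariant by construction.
    return 1 if sum(signal) / len(signal) > 0 else 0

def shift_consistency(signal, shifts):
    """Fraction of shifted copies whose prediction matches the
    unshifted prediction (a simplified form of the metric)."""
    base = predict(signal)
    agree = sum(predict(shift(signal, s)) == base for s in shifts)
    return agree / len(shifts)

sig = [0.2, -0.1, 0.4, 0.3]
print(shift_consistency(sig, [1, 2, 3]))  # 1.0: the toy model is shift-invariant
```

A real CNN with strided max pooling would score below 1.0 on this metric; the paper's claim is that TIPS pushes it closer to 1.0 than prior shift-invariant pooling operators.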
Key Insights Extracted From
by Sourajit Sah... at arxiv.org 04-12-2024
https://arxiv.org/pdf/2404.07410.pdf