Core Concepts
Designing new convolution operations that are inherently invariant to arbitrary rotations, without relying on data augmentation, so that deep neural networks acquire rotation invariance by construction.
Abstract
The paper addresses the challenge of achieving rotation invariance in deep neural networks (DNNs), particularly convolutional neural networks (CNNs), which lack inherent rotation invariance. It proposes a set of new rotation-invariant convolution (RIConv) operations that are designed using various non-learnable operators, such as gradient, sorting, local binary pattern, and maximum operations.
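To make the sorting idea concrete, here is a hedged toy sketch (not the paper's exact RIConv formulation; the function name `sorted_ring_conv` and the weight layout are assumptions): in a 3x3 window, the 8 surrounding pixels form a ring, and sorting them before applying the learnable weights discards the ring's orientation while preserving its values, so the response cannot depend on how the neighborhood is rotated.

```python
import numpy as np

def sorted_ring_conv(img, center_w, ring_w):
    """Toy rotation-invariant 3x3 'convolution' (hypothetical sketch):
    the 8 ring neighbours are sorted before the learnable weights ring_w
    are applied, so the output is independent of the ring's orientation."""
    H, W = img.shape
    p = np.pad(img, 1, mode="edge")  # replicate borders
    # 8-neighbour ring offsets, listed clockwise
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    out = np.empty_like(img, dtype=float)
    for i in range(H):
        for j in range(W):
            ring = sorted(p[1 + i + di, 1 + j + dj] for di, dj in offs)
            out[i, j] = center_w * p[1 + i, 1 + j] + float(np.dot(ring_w, ring))
    return out
```

Because a 90° rotation merely permutes the ring, `sorted_ring_conv(np.rot90(img), ...)` equals `np.rot90(sorted_ring_conv(img, ...))` exactly, and the parameter count (1 center weight + 8 ring weights) matches a plain 3x3 kernel, consistent with the interchangeability claim below.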
The key highlights are:
The proposed RIConvs have the same number of learnable parameters as their corresponding traditional convolution operations, and their computational processes are similar, allowing them to be interchangeable.
Experiments on the MNIST-Rot dataset show that two types of RIConvs based on gradient operators (SB-Conv and GD-Conv) achieve state-of-the-art results in terms of rotation invariance, outperforming previous rotation-invariant CNN models.
Integrating the proposed RIConvs with classical CNN backbones (VGG, Inception, ResNet, DenseNet) significantly improves their performance on texture recognition, aircraft classification, and remote sensing image classification tasks, especially when the training data is limited.
Even with data augmentation, the RIConvs can further enhance the performance of CNN models, demonstrating the importance of mechanism-assured rotation invariance in feature learning.
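To see why gradient operators are natural building blocks for rotation-invariant features: the gradient magnitude at a pixel is a scalar that does not change when the image is rotated (the gradient vector rotates, but its length does not). A minimal check of this property, using NumPy's central-difference `np.gradient` and an exact 90° rotation to avoid interpolation error (this is an illustration of the general principle, not the paper's SB-Conv/GD-Conv definitions):

```python
import numpy as np

def gradient_magnitude(img):
    """Per-pixel gradient magnitude; the magnitude field rotates with the
    image, so it serves as a rotation-invariant local descriptor."""
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows, cols
    return np.hypot(gx, gy)                  # sqrt(gx**2 + gy**2)
```

For a 90° rotation, `gradient_magnitude(np.rot90(img))` equals `np.rot90(gradient_magnitude(img))` exactly; for arbitrary angles the identity holds up to interpolation error.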
Stats
The MNIST-Rot dataset contains 360,000 rotated test images, with rotation angles from 0° to 360° in 10° increments.
The Outex_TC_00012 dataset has 1,440 training images and 7,680 test images of 24 different texture categories.
The MTARSI dataset has 4,000 training images and 5,385 test images of 20 different aircraft types.
The NWPU-RESISC45 dataset has 18,000 training images and 13,500 test images of 45 remote sensing scene categories.
Quotes
"Achieving rotation invariance in deep neural networks without relying on data augmentation has always been a hot research topic."
"Intrinsic rotation invariance can enhance the model's feature representation capability, enabling better performance in tasks such as multi-orientation object recognition and detection."
"The most ideal scenario would be to design new convolution operations that can be interchanged with conventional convolution operation without increasing the number of learnable parameters and naturally exhibits invariance to arbitrary rotations."