Core Concepts
A spatially optimized, compact deep metric learning model that uses a single involution layer as a feature extractor alongside a compact convolutional network, significantly improving similarity-search performance while maintaining a small model size.
Abstract
The paper presents a deep metric learning model that combines involution and convolution layers to improve performance on similarity search tasks. The key highlights are:
The proposed model consists of a single involution layer followed by four convolution layers. The involution layer captures global spatial relations, while the convolution layers enrich the feature representation.
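To make the involution half of the hybrid concrete, here is a minimal NumPy sketch of an involution layer's forward pass (stride 1, "same" padding). Unlike a convolution, the K×K kernel is not a fixed learned weight: it is generated per spatial position from the input itself via two 1×1 projections, then shared across the channels of each group. The function name, weight shapes, and the omission of batching, bias, and normalization are all simplifications for illustration, not the authors' implementation.

```python
import numpy as np

def involution2d(x, w_reduce, w_span, kernel_size=3, groups=1):
    """Minimal involution forward pass (stride 1, 'same' padding).

    x:        input feature map, shape (C, H, W)
    w_reduce: 1x1 kernel-generation weights, shape (C_mid, C)
    w_span:   1x1 expansion weights, shape (K*K*G, C_mid)
    """
    c, h, w = x.shape
    k, g = kernel_size, groups
    pad = k // 2

    # Kernel generation: a distinct K*K kernel per spatial position per group,
    # produced from the input itself -- this is what makes involution dynamic.
    mid = np.einsum('mc,chw->mhw', w_reduce, x)        # (C_mid, H, W)
    kernels = np.einsum('km,mhw->khw', w_span, mid)    # (K*K*G, H, W)
    kernels = kernels.reshape(g, k * k, h, w)

    # Unfold the input into K*K patches centred on each position.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    patches = np.empty((c, k * k, h, w))
    for i in range(k):
        for j in range(k):
            patches[:, i * k + j] = xp[:, i:i + h, j:j + w]
    patches = patches.reshape(g, c // g, k * k, h, w)

    # Position-specific weighted sum over the neighbourhood; the generated
    # kernel is shared across the channels within each group.
    out = (kernels[:, None] * patches).sum(axis=2)     # (G, C//G, H, W)
    return out.reshape(c, h, w)
```

Because the kernel-generation weights are only two 1×1 projections, the layer stays lightweight while still adapting its receptive weighting to each spatial location, which matches the paper's motivation for using involution as a spatial feature extractor.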
The model uses Gaussian Error Linear Unit (GELU) activation function instead of ReLU to provide a more gradual transition between activation states and better retain image distance metrics.
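The difference the paper relies on can be seen numerically: ReLU zeroes all negative pre-activations, while GELU weights inputs by (approximately) the Gaussian CDF, so small negative values pass through attenuated rather than being discarded. A short sketch using the standard tanh approximation of GELU:

```python
import numpy as np
from math import sqrt, pi

def relu(x):
    return np.maximum(x, 0.0)

def gelu(x):
    # tanh approximation of GELU(x) = x * Phi(x),
    # where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + np.tanh(sqrt(2.0 / pi) * (x + 0.044715 * x ** 3)))

# A small negative input is zeroed by ReLU but only attenuated by GELU,
# so relative distances between nearby activations are better preserved.
print(relu(-0.5))   # -> 0.0
print(gelu(-0.5))   # -> approx -0.154
```

This gradual transition around zero is the property the authors cite for retaining distance information between embeddings.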
The model is trained using Categorical Cross-Entropy (CE) loss and Multi-Similarity (MS) loss to optimize for both classification and pair-wise similarity.
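For the pair-wise objective, a minimal NumPy sketch of the Multi-Similarity loss (Wang et al., 2019) is below. It omits the pair-mining step of the full MS loss and uses illustrative hyperparameter values (alpha, beta, lam); these defaults and the function signature are assumptions, not the paper's exact training configuration.

```python
import numpy as np

def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Multi-Similarity loss without the pair-mining step (a simplification).

    embeddings: (N, D) L2-normalised embedding vectors
    labels:     (N,) integer class labels
    """
    sim = embeddings @ embeddings.T                # cosine similarities
    n = len(labels)
    idx = np.arange(n)
    total = 0.0
    for i in range(n):
        same = labels == labels[i]
        pos = sim[i, same & (idx != i)]            # positive pairs (excl. self)
        neg = sim[i, ~same]                        # negative pairs
        # Penalise positives with low similarity and negatives with high
        # similarity, each soft-weighted around the margin lam.
        pos_term = np.log1p(np.exp(-alpha * (pos - lam)).sum()) / alpha
        neg_term = np.log1p(np.exp(beta * (neg - lam)).sum()) / beta
        total += pos_term + neg_term
    return total / n
```

In training, this term would be combined with the categorical cross-entropy loss so the embedding space is shaped for both classification and pair-wise similarity.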
Experiments on CIFAR-10, FashionMNIST, and MNIST datasets show that the proposed hybrid model outperforms vanilla convolution and involution-based models, while being significantly smaller in size (less than 1 MB).
Compared to larger, deeper models such as ResNet50V2, the proposed model achieves similar performance with far fewer parameters, making it more suitable for real-world deployment.
The authors discuss how multiple involution layers can lead to information loss and redundancy, especially for more diverse datasets like CIFAR-10, and a single involution layer is found to be optimal.
Stats
The model has around 116,000 weight parameters and a size of less than 1 MB.
Quotes
"Involution addresses this challenge by employing a dynamic kernel while remaining lightweight. Moreover, in applications where spatial context is necessary, involution often performs well even as an addition to convolution."
"Our proposed method is simple to implement yet effective unlike other hybrid models of involution and convolution."
"Only ResNet50V2 performs well here but with 23 Million weight parameters; ours performs similarly with around 100 thousand weight parameters."