Core Concepts
Calibrating the covariance matrices of new few-shot classes based on their semantic similarity to the many-shot base classes significantly improves the classification performance in few-shot class-incremental learning settings.
Abstract
The content discusses the problem of few-shot class-incremental learning (FSCIL), where the model must adapt to new classes from very few samples (e.g., 5 samples per class) without forgetting the previously learned classes.
The authors explore the use of pre-trained Vision Transformer (ViT) models for FSCIL, as opposed to the ResNet architectures conventionally used in prior FSCIL methods. They identify that while classification approaches based on higher-order feature statistics, such as FeCAM and RanPAC, work well in many-shot class-incremental learning (MSCIL) settings, they perform poorly on the few-shot new classes because the feature statistics cannot be reliably estimated from so few samples.
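The following is a minimal sketch of feature-statistics-based classification in the style described above: each class is summarized by a prototype (mean feature) and a covariance matrix, and a test feature is assigned to the class with the smallest Mahalanobis distance. The shrinkage constant and exact normalization are illustrative assumptions, not details from the summary.

```python
import numpy as np

def class_statistics(features):
    """Prototype and (shrunk) covariance from a class's feature matrix of shape [n, d]."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    # Shrinkage toward the identity keeps the matrix invertible; the 1e-3
    # value is an assumption for illustration, not taken from the paper.
    cov += 1e-3 * np.eye(cov.shape[0])
    return mu, cov

def mahalanobis_classify(x, prototypes, covariances):
    """Assign x to the class whose Mahalanobis distance is smallest."""
    dists = []
    for mu, cov in zip(prototypes, covariances):
        diff = x - mu
        dists.append(diff @ np.linalg.inv(cov) @ diff)
    return int(np.argmin(dists))
```

With only a handful of samples per new class, the covariance estimate fed into this kind of classifier is badly conditioned, which is the failure mode the summary attributes to FeCAM and RanPAC on few-shot classes.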
To address this, the authors propose to calibrate the covariance matrices of the new few-shot classes based on their semantic similarity to the many-shot base classes. They observe that classes with similar prototypes (higher cosine similarity) also tend to have similar covariance matrices. Exploiting this relationship, they calibrate the covariance matrix of each new class as a weighted average of the base class covariances, where the weights are computed from the prototype similarities, as sketched below.
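A minimal sketch of the described calibration follows: the covariance of a few-shot class is formed as a similarity-weighted average of the many-shot base class covariances. Using all base classes and normalizing raw cosine similarities into weights are assumptions for illustration; the paper may restrict to the most similar base classes or blend in the few-shot estimate.

```python
import numpy as np

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def calibrate_covariance(new_prototype, base_prototypes, base_covariances):
    """Calibrated covariance for a few-shot class: a weighted average of the
    base class covariances, weighted by prototype cosine similarity."""
    sims = np.array([cosine_similarity(new_prototype, p) for p in base_prototypes])
    weights = sims / sims.sum()  # normalize similarities into weights (assumption)
    return sum(w * cov for w, cov in zip(weights, base_covariances))
```

The calibrated covariance can then replace the unreliable few-shot estimate inside a statistics-based classifier such as the Mahalanobis sketch above.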
The proposed covariance calibration, when used in combination with FeCAM and RanPAC, significantly improves the classification performance on the new few-shot classes, leading to better overall accuracy and harmonic mean accuracy across multiple FSCIL benchmarks.
Stats
The summarized content does not quote specific numerical results or metrics. The performance improvements of the proposed covariance calibration over the baseline approaches are reported in the paper's figures and tables.