Core Concepts
The core message of this paper is that incrementally tuning a shared adapter, without imposing any parameter-update constraints, is an effective continual learning strategy for pre-trained vision transformers, and that further gains come from retraining a unified classifier on prototypes compensated for semantic shift.
Abstract
The paper proposes a continual learning framework for pre-trained vision transformers (ViTs) that addresses the challenge of class-incremental learning (CIL). The key insights are:
Incremental adapter tuning: The authors observe that incrementally tuning a shared adapter outperforms other parameter-efficient tuning (PET) methods such as prompt tuning and SSF tuning in the CIL setting. Adapter tuning balances learning new classes with maintaining performance on old ones, without needing to construct an adapter pool.
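As a concrete reference, here is a minimal sketch of the kind of bottleneck adapter typically inserted into each transformer block. The module structure, bottleneck width, and zero-initialized up-projection are common conventions assumed for illustration, not necessarily the authors' exact implementation.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.

    Illustrative sketch; dimensions and initialization are assumptions.
    """
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)
        # Zero-init the up-projection so the adapter starts as an identity
        # mapping and does not perturb the pre-trained features initially.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```

The same shared adapter is then re-tuned in every session, rather than allocating a new adapter per task.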
Unconstrained adapter tuning: The authors' analysis shows that constraining parameter updates to mitigate forgetting hinders the plasticity needed to learn new classes, because parameter sensitivity is similar across tasks. They therefore tune the shared adapter incrementally without any parameter-update constraints.
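A minimal sketch of one incremental session under this strategy, assuming a PyTorch ViT `backbone` whose blocks contain the shared adapters; all names and hyperparameters here are illustrative.

```python
import torch
import torch.nn as nn

def tune_on_task(backbone, adapters, classifier, loader, epochs=10, lr=1e-2):
    """Tune the shared adapters on the current task with plain cross-entropy.

    The pre-trained backbone stays frozen; only the adapters and the
    classifier are updated. Crucially, no regularizer (e.g. an EWC-style
    penalty anchoring adapter weights to their previous values) is added,
    so the same adapter parameters are freely re-tuned session after session.
    """
    for p in backbone.parameters():
        p.requires_grad_(False)       # freeze pre-trained ViT weights
    for p in adapters.parameters():
        p.requires_grad_(True)        # re-enable the shared adapters
    params = list(adapters.parameters()) + list(classifier.parameters())
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            logits = classifier(backbone(images))  # adapters run inside the backbone
            loss = ce(logits, labels)              # no parameter-update constraint term
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```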
Semantic shift estimation and unified classifier retraining: Because incremental training shifts the feature distribution of old classes, the authors estimate the semantic shift of old-class prototypes without access to past samples and update the stored prototypes session by session. The updated prototypes are then used to retrain a unified classifier, further improving performance.
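The shift can be estimated from current-task embeddings alone: each old prototype is moved by a similarity-weighted average of how nearby new-task features drift during tuning. Below is a minimal sketch of this idea in the style of semantic drift compensation; the Gaussian kernel, `sigma`, and the function name are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def compensate_prototypes(old_prototypes, feats_before, feats_after, sigma=1.0):
    """Estimate the semantic shift of old-class prototypes without past samples.

    old_prototypes: (K, D) class prototypes stored in earlier sessions
    feats_before:   (N, D) current-task embeddings under the previous model
    feats_after:    (N, D) the same samples' embeddings after tuning

    Each prototype is shifted by a similarity-weighted average of the
    displacement of the new-task embeddings that lie close to it.
    """
    delta = feats_after - feats_before                    # (N, D) feature drift
    d2 = torch.cdist(old_prototypes, feats_before) ** 2   # (K, N) squared distances
    w = torch.exp(-d2 / (2 * sigma ** 2))                 # (K, N) Gaussian weights
    shift = (w @ delta) / (w.sum(dim=1, keepdim=True) + 1e-8)
    return old_prototypes + shift                         # compensated prototypes
```

The compensated prototypes can then serve as class means for retraining the unified classifier head, e.g. by sampling pseudo-features around each prototype, so no stored images are needed.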
The proposed method, called SSIAT, eliminates the need for model expansion and for retaining any image samples. Extensive experiments on five CIL benchmarks demonstrate the superiority of SSIAT, which achieves state-of-the-art performance.
Stats
On ImageNet-R, accuracy after the last incremental session is 79.38%, and average accuracy across all incremental sessions is 83.63%.
On ImageNet-A, accuracy after the last incremental session is 62.43%, and average accuracy across all incremental sessions is 70.83%.
Quotes
"Confining the change of parameters of previous tasks hinders the plasticity of new classes due to the similarity of parameter sensitivity among tasks."
"We estimate the semantic shift of old prototypes without access to past samples and update stored prototypes session by session."