Continual Learning (CL) models tend to forget previous knowledge, leading to unreliable predictions. Calibration is crucial for building CL models that provide trustworthy confidence estimates about their predictions.
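As an illustration of how calibration is typically quantified, the sketch below computes expected calibration error (ECE) over confidence bins; the bin count and equal-width binning are generic assumptions, not details taken from the paper.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """ECE: accuracy-vs-confidence gap per bin, weighted by bin size."""
    confidences = np.asarray(confidences)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```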
The Remembering Transformer employs a mixture-of-adapters architecture with a generative-model-based routing mechanism that dynamically routes task data to relevant adapters, alleviating catastrophic forgetting in continual learning.
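A minimal sketch of the general idea, assuming one lightweight adapter per task paired with a small autoencoder whose reconstruction error drives routing; the module shapes, residual placement, and batch-level routing are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class AdapterWithRouter(nn.Module):
    """One low-rank adapter paired with a tiny autoencoder used for routing."""
    def __init__(self, dim, rank=8, code=16):
        super().__init__()
        self.adapter = nn.Sequential(nn.Linear(dim, rank), nn.ReLU(), nn.Linear(rank, dim))
        self.autoencoder = nn.Sequential(nn.Linear(dim, code), nn.ReLU(), nn.Linear(code, dim))

    def recon_error(self, x):
        # Lower reconstruction error means the input resembles the task
        # this adapter was trained on.
        return ((self.autoencoder(x) - x) ** 2).mean(dim=-1)

class MixtureOfAdapters(nn.Module):
    def __init__(self, dim, num_tasks):
        super().__init__()
        self.experts = nn.ModuleList([AdapterWithRouter(dim) for _ in range(num_tasks)])

    def forward(self, x):
        # Route the batch to the adapter whose autoencoder reconstructs it best.
        errors = torch.stack([e.recon_error(x).mean() for e in self.experts])
        chosen = self.experts[int(errors.argmin())]
        return x + chosen.adapter(x)  # residual adapter update
```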
All the hyperparameter optimization (HPO) frameworks tested, including the commonly used but unrealistic end-of-training HPO, perform similarly in terms of predictive performance. The simplest and most computationally efficient method, first-task HPO, is recommended as the preferred HPO framework for continual learning.
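A minimal sketch of first-task HPO, assuming hyperparameters are searched on the first task only and then frozen for the rest of the sequence; `train_on_task`, `validate_on_task`, and the search space are hypothetical placeholders, not the study's actual protocol.

```python
def first_task_hpo(tasks, search_space, train_on_task, validate_on_task):
    """Tune on task 0, then reuse the best config for every later task."""
    best_cfg, best_score = None, float("-inf")
    for cfg in search_space:                      # e.g. a grid or random samples
        model = train_on_task(tasks[0], cfg)
        score = validate_on_task(model, tasks[0])
        if score > best_score:
            best_cfg, best_score = cfg, score

    # Continual phase: a single pass over all tasks with the fixed config.
    model = train_on_task(tasks[0], best_cfg)
    for task in tasks[1:]:
        model = train_on_task(task, best_cfg, init=model)  # continue from previous model
    return model, best_cfg
```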
Bayesian Adaptive Moment Regularization (BAdam) is a novel continual learning method that unifies desirable properties of the Adam optimizer and Bayesian Gradient Descent, yielding a fast-converging approach that effectively mitigates catastrophic forgetting without relying on task labels.
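The sketch below conveys the general flavour of combining Adam-style moment estimates with a Bayesian Gradient Descent-style per-parameter Gaussian posterior; it is explicitly not BAdam's actual update rule, and the mean/variance update equations here are illustrative assumptions.

```python
import torch

class DiagonalGaussianOptimizerSketch:
    """Illustrative only: per-parameter posterior (mu, sigma) with Adam-style
    moments for the mean update. NOT the exact BAdam algorithm."""

    def __init__(self, shape, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, init_sigma=0.05):
        self.mu = torch.zeros(shape)
        self.sigma = torch.full(shape, init_sigma)
        self.m = torch.zeros(shape)   # first moment of the gradient
        self.v = torch.zeros(shape)   # second moment of the gradient
        self.t, self.lr, self.betas, self.eps = 0, lr, betas, eps

    def sample_weights(self):
        # Reparameterised sample used for the forward/backward pass.
        return self.mu + self.sigma * torch.randn_like(self.sigma)

    def step(self, grad):
        self.t += 1
        b1, b2 = self.betas
        self.m = b1 * self.m + (1 - b1) * grad
        self.v = b2 * self.v + (1 - b2) * grad ** 2
        m_hat = self.m / (1 - b1 ** self.t)
        v_hat = self.v / (1 - b2 ** self.t)
        # Mean step scaled by per-parameter variance: uncertain weights move
        # more, well-determined weights are implicitly protected.
        self.mu -= self.lr * self.sigma ** 2 * m_hat / (v_hat.sqrt() + self.eps)
        # Shrink sigma where gradients are consistently informative.
        self.sigma = self.sigma / (1 + self.sigma ** 2 * v_hat).sqrt()
```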
Continual learning algorithms can effectively learn and adapt to a large number of tasks drawn from long-tail task distributions by maintaining and reusing optimizer states, particularly the second moments, across tasks.
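A minimal sketch of carrying optimizer state across task boundaries, assuming Adam and a model whose parameters persist between tasks; the choice to reset first moments while keeping `exp_avg_sq` is an illustrative assumption.

```python
import torch

def make_optimizer_for_next_task(model, prev_optimizer=None, lr=1e-3):
    """Create an Adam optimizer for the next task, carrying over second-moment
    statistics (exp_avg_sq) from the previous task instead of resetting them."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    if prev_optimizer is not None:
        old_state = prev_optimizer.state
        for group in opt.param_groups:
            for p in group["params"]:
                if p in old_state and "exp_avg_sq" in old_state[p]:
                    opt.state[p] = {
                        "step": old_state[p]["step"],
                        "exp_avg": torch.zeros_like(p),                    # reset first moment
                        "exp_avg_sq": old_state[p]["exp_avg_sq"].clone(),  # keep curvature info
                    }
    return opt
```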
A novel convolutional prompt generation mechanism coupled with a task-similarity-based expansion strategy for efficient and effective rehearsal-free continual learning.
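A minimal sketch of the two ingredients, assuming prompts are generated by convolving over the input token sequence and that expansion is gated by cosine similarity to stored task features; the prompt length, kernel size, similarity threshold, and feature representation are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvPromptGenerator(nn.Module):
    """Generate prompt tokens by convolving over the input token sequence,
    so prompts are conditioned on the sample rather than fixed."""
    def __init__(self, dim, prompt_len=8, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.pool = nn.AdaptiveAvgPool1d(prompt_len)

    def forward(self, tokens):                  # tokens: (batch, seq_len, dim)
        h = self.conv(tokens.transpose(1, 2))   # convolve along the sequence
        prompts = self.pool(h).transpose(1, 2)  # (batch, prompt_len, dim)
        return torch.cat([prompts, tokens], dim=1)  # prepend generated prompts

def should_expand(new_task_feats, stored_task_feats, threshold=0.7):
    """Task-similarity gate: add a new prompt generator only if the new task
    is sufficiently dissimilar from previously seen tasks."""
    sims = [F.cosine_similarity(new_task_feats, f, dim=0) for f in stored_task_feats]
    return (max(sims) if sims else torch.tensor(0.0)) < threshold
```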