The authors propose two novel approaches, parameter-wise mGEM (p-mGEM) and data-wise mGEM (d-mGEM), that more flexibly constrain the update direction in Gradient Episodic Memory (GEM) to achieve a better trade-off between remembering old and learning new knowledge.
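As a reference point, the sketch below shows the single-constraint gradient projection used in A-GEM, the simplest form of the GEM-style update-direction constraint that p-mGEM and d-mGEM relax. The per-parameter and per-sample variants themselves are not reproduced; `project_grad` is an illustrative helper name, not code from the paper.

```python
import torch

def project_grad(grad: torch.Tensor, mem_grad: torch.Tensor) -> torch.Tensor:
    # If the current-task gradient conflicts with the gradient computed on the
    # episodic memory (negative inner product), project it onto the half-space
    # in which the memory loss does not increase.
    dot = torch.dot(grad, mem_grad)
    if dot < 0:
        grad = grad - (dot / mem_grad.pow(2).sum()) * mem_grad
    return grad
```

In practice the model gradients are flattened into `grad`, the memory gradients into `mem_grad`, and the projected vector is copied back into the model before `optimizer.step()`. p-mGEM and d-mGEM constrain the update direction in a more flexible parameter-wise or data-wise fashion, which this single global projection does not capture.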
The authors propose four optimization techniques for the autoregressive decoder to address the problems that arise when a multilingual speech recognition model is continually trained on new languages.
The core message of this article is that by modeling the parameter transitions along sequential tasks with a low-rank weight matrix transformation and leveraging Hessian information to automatically determine the perturbation ranks, the proposed Hessian Aware Low-Rank Perturbation (HALRP) algorithm can effectively mitigate catastrophic forgetting, control model growth, and achieve superior performance in continual learning scenarios.
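The sketch below illustrates the general idea of a frozen base weight plus a task-specific low-rank perturbation. It is only a minimal illustration: the exact weight transformation HALRP uses and its Hessian-based rank selection are not reproduced, so `LowRankPerturbedLinear` and the hand-chosen rank `r` are assumptions for exposition.

```python
import torch
import torch.nn as nn

class LowRankPerturbedLinear(nn.Module):
    """Frozen base weight plus a trainable low-rank perturbation for the current task.
    HALRP additionally uses Hessian (curvature) information to choose the rank per
    layer; here the rank r is simply passed in by hand."""

    def __init__(self, base: nn.Linear, r: int):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # previous-task weights stay fixed
        out_f, in_f = base.weight.shape
        self.U = nn.Parameter(torch.zeros(out_f, r))
        self.V = nn.Parameter(torch.randn(r, in_f) * 0.01)

    def forward(self, x):
        # effective weight: W_base + U @ V (an additive low-rank transition)
        return self.base(x) + x @ (self.U @ self.V).t()
```

Only the low-rank factors are stored per task, which is why model growth stays controlled as tasks accumulate.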
Collaborative learning can significantly improve the plasticity of continual learners in online settings, thereby enhancing their overall performance.
A simple yet flexible continual flatness optimization method, C-Flat, is proposed to improve the generalization ability of continual learning models by inducing flatter loss landscapes.
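To make "inducing flatter loss landscapes" concrete, here is a generic sharpness-aware (SAM-style) update step: perturb the weights toward the locally worst direction, take the gradient there, then descend. This is not the C-Flat procedure itself; `flat_minima_step`, `compute_loss`, and `rho` are illustrative names and parameters.

```python
import torch

def flat_minima_step(model, compute_loss, optimizer, rho=0.05):
    """One sharpness-aware update that favors flat minima (generic sketch)."""
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer.zero_grad()

    # gradient at the current weights
    compute_loss().backward()
    grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
             for p in params]
    scale = rho / (torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12)

    # climb to the nearby worst-case point, take the gradient there, climb back down
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(scale * g)
    optimizer.zero_grad()
    compute_loss().backward()
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(scale * g)

    # descend at the original weights using the perturbed-point gradient
    optimizer.step()
    optimizer.zero_grad()
```

Here `compute_loss` is a closure that runs the forward pass on the current batch and returns the loss.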
InfLoRA injects a small number of parameters to reparameterize the pre-trained weights and shows that fine-tuning these injected parameters is equivalent to fine-tuning the pre-trained weights within a subspace. Furthermore, InfLoRA designs this subspace to eliminate the interference of the new task on the old tasks, achieving a good trade-off between stability and plasticity.
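A minimal sketch of this kind of LoRA-style reparameterization follows, assuming a fixed projection matrix `A` that defines the update subspace. How InfLoRA actually constructs the interference-free subspace from old-task information is not shown here, and `SubspaceLoRALinear` is a hypothetical wrapper, not the authors' code.

```python
import torch
import torch.nn as nn

class SubspaceLoRALinear(nn.Module):
    """LoRA-style reparameterization W_eff = W + B @ A with the projection A held fixed,
    so all updates to the effective weight lie in the row space spanned by A."""

    def __init__(self, base: nn.Linear, A: torch.Tensor):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                       # pre-trained weights stay frozen
        r, in_f = A.shape
        assert in_f == base.in_features
        self.register_buffer("A", A)                      # fixed, pre-designed subspace
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # the only trainable part

    def forward(self, x):
        return self.base(x) + x @ (self.B @ self.A).t()
```

Because only `B` is trained, fine-tuning moves the effective weights strictly within the subspace defined by `A`; choosing that subspace carefully is what controls interference with previously learned tasks.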