The paper systematically investigates the phenomenon of loss of plasticity in deep learning methods applied to continual learning problems. The key findings are:
Deep learning methods exhibit a severe and robust loss of plasticity when applied to continual learning tasks, even in simple supervised learning problems like Permuted MNIST and a continual version of ImageNet. Performance on later tasks often drops to the level of a linear network.
The loss of plasticity is accompanied by three key correlates: 1) an increase in the magnitude of network weights, 2) an increase in the fraction of "dead" units, and 3) a decrease in the effective rank of the network representation.
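These three correlates are straightforward to measure. Below is a minimal PyTorch sketch, assuming a ReLU network and a (batch × units) matrix of hidden activations; the function names and the small stabilizer constant are illustrative choices, not taken from the paper's code.

```python
import torch

def average_weight_magnitude(model: torch.nn.Module) -> float:
    """Correlate 1: mean absolute value of all network weights."""
    flat = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    return flat.mean().item()

def fraction_dead_units(hidden_acts: torch.Tensor) -> float:
    """Correlate 2: fraction of ReLU units that output zero for every
    example in the batch (hidden_acts has shape batch x units)."""
    dead = (hidden_acts <= 0).all(dim=0)
    return dead.float().mean().item()

def effective_rank(hidden_acts: torch.Tensor) -> float:
    """Correlate 3: effective rank of the representation matrix
    (Roy & Vetterli, 2007): exp of the entropy of the normalized
    singular values."""
    s = torch.linalg.svdvals(hidden_acts)
    p = s / s.sum()
    entropy = -(p * torch.log(p + 1e-12)).sum()
    return torch.exp(entropy).item()
```

Tracking these quantities across tasks is how the paper connects degrading performance to the state of the network itself.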
Existing deep-learning techniques such as L2 regularization, dropout, and Adam either exacerbate the loss of plasticity or mitigate it only partially. Only the shrink-and-perturb method substantially reduces it, but it is sensitive to its hyperparameter settings.
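For reference, shrink-and-perturb (Ash & Adams, 2020) is simple to state: scale every weight toward zero, then add a small amount of noise. A hedged PyTorch sketch follows; the Gaussian noise and the hyperparameter values shown are illustrative placeholders, and, as noted above, the method's effectiveness depends strongly on them.

```python
import torch

@torch.no_grad()
def shrink_and_perturb(model: torch.nn.Module,
                       shrink: float = 0.8,
                       noise_std: float = 0.01) -> None:
    """Shrink every parameter toward zero, then perturb it with
    Gaussian noise. `shrink` and `noise_std` are illustrative values."""
    for p in model.parameters():
        p.mul_(shrink).add_(torch.randn_like(p), alpha=noise_std)
```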
The authors propose a new algorithm called "continual backpropagation" that robustly solves the problem of loss of plasticity by selectively reinitializing a small fraction of less-used hidden units after each example. This maintains plasticity indefinitely.
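The following PyTorch sketch illustrates the idea for a single hidden layer, under simplifying assumptions: utility is taken to be a running average of |activation| × |outgoing weight|, and the replacement rate, maturity threshold, and decay constants are illustrative values, not the paper's exact algorithm or code.

```python
import torch
import torch.nn as nn

class ContinualBackpropMLP(nn.Module):
    """One-hidden-layer sketch of continual backpropagation: track a
    running utility per hidden unit and reinitialize the least-used
    mature units at a small replacement rate."""

    def __init__(self, in_dim, hidden_dim, out_dim,
                 replacement_rate=1e-4, maturity_threshold=100, decay=0.99):
        super().__init__()
        self.fc_in = nn.Linear(in_dim, hidden_dim)
        self.fc_out = nn.Linear(hidden_dim, out_dim)
        self.replacement_rate = replacement_rate
        self.maturity_threshold = maturity_threshold
        self.decay = decay
        self.register_buffer("utility", torch.zeros(hidden_dim))
        self.register_buffer("age", torch.zeros(hidden_dim))
        self.register_buffer("replace_credit", torch.zeros(()))

    def forward(self, x):
        h = torch.relu(self.fc_in(x))
        # Running "contribution utility": |activation| * |outgoing weight|.
        contrib = (h.detach().abs().mean(0)
                   * self.fc_out.weight.detach().abs().mean(0))
        self.utility.mul_(self.decay).add_(contrib, alpha=1 - self.decay)
        self.age += 1
        return self.fc_out(h)

    @torch.no_grad()
    def reinit_low_utility_units(self):
        """Call after each optimizer step: accumulate fractional
        replacement credit and reset the lowest-utility mature units
        whenever the credit reaches a whole unit."""
        mature = self.age > self.maturity_threshold
        self.replace_credit += self.replacement_rate * mature.sum()
        n = int(self.replace_credit.item())
        if n == 0:
            return
        self.replace_credit -= n
        util = self.utility.clone()
        util[~mature] = float("inf")        # never replace immature units
        idx = torch.topk(util, n, largest=False).indices
        fresh = torch.empty_like(self.fc_in.weight)
        nn.init.kaiming_uniform_(fresh, a=5 ** 0.5)  # default Linear init
        self.fc_in.weight[idx] = fresh[idx]  # fresh random input weights
        self.fc_in.bias[idx] = 0.0
        self.fc_out.weight[:, idx] = 0.0     # zeroed so output is unchanged
        self.utility[idx] = 0.0
        self.age[idx] = 0.0
```

In a training loop, one would call `reinit_low_utility_units()` after each `optimizer.step()`; a full implementation would also reset any optimizer state (e.g. Adam moments) associated with the reinitialized weights.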
The paper provides a comprehensive and systematic demonstration of the fundamental issue of loss of plasticity in deep learning, and introduces a new algorithm that can effectively address this challenge in continual learning problems.
Key insights distilled from the source paper by Shibhansh Dohare et al., arxiv.org, 04-11-2024: https://arxiv.org/pdf/2306.13812.pdf