The paper systematically investigates the phenomenon of loss of plasticity in deep learning methods applied to continual learning problems. The key findings are:
Deep learning methods exhibit a severe and robust loss of plasticity when applied to continual learning tasks, even in simple supervised learning problems like Permuted MNIST and a continual version of ImageNet. Performance on later tasks often drops to the level of a linear network.
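To make the continual setting concrete, the following is a minimal sketch of a Permuted MNIST task stream; the function name, seeding, and protocol details are illustrative assumptions, not the paper's exact code.

```python
import numpy as np

def permuted_mnist_tasks(images, labels, num_tasks, seed=0):
    """Yield one (images, labels) pair per task.

    images: array of shape (N, 784), MNIST images flattened to pixels.
    Each task applies a fixed random pixel permutation, so the input
    distribution changes between tasks while difficulty stays constant.
    """
    rng = np.random.default_rng(seed)
    for _ in range(num_tasks):
        perm = rng.permutation(images.shape[1])
        yield images[:, perm], labels
```

Training a single network across such a stream and plotting per-task accuracy is how the decline toward linear-network performance becomes visible.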
The loss of plasticity is accompanied by three key correlates: 1) an increase in the magnitude of network weights, 2) an increase in the fraction of "dead" units, and 3) a decrease in the effective rank of the network representation.
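These correlates can be tracked with simple diagnostics along the lines of the sketch below (PyTorch). The function names are assumptions, and the entropy-based effective-rank definition is a common convention that may differ in detail from the paper's exact measurements.

```python
import torch

def weight_magnitude(model):
    """Mean absolute weight across all parameters."""
    return torch.cat([p.abs().flatten() for p in model.parameters()]).mean()

def dead_unit_fraction(acts):
    """Fraction of ReLU units that never activate on a batch.
    acts: post-ReLU activations of shape (batch, units)."""
    return (acts.max(dim=0).values <= 0).float().mean()

def effective_rank(acts):
    """Entropy-based effective rank: exp of the entropy of the
    normalized singular values of the activation matrix."""
    s = torch.linalg.svdvals(acts)
    p = s / s.sum()
    return torch.exp(-(p * torch.log(p + 1e-12)).sum())
```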
Standard deep learning techniques such as L2 regularization, dropout, and Adam either exacerbate the loss of plasticity or mitigate it only partially. Only shrink-and-perturb substantially reduces it, but the method is sensitive to its hyperparameter settings.
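For reference, shrink-and-perturb (Ash and Adams, 2020) amounts to the following update, applied periodically (e.g., at task boundaries). The shrink factor and noise scale shown are illustrative defaults, and they are precisely the hyperparameters the method is sensitive to.

```python
import torch

@torch.no_grad()
def shrink_and_perturb(model, shrink=0.8, noise_std=0.01):
    """Shrink all weights toward zero, then add small Gaussian noise."""
    for p in model.parameters():
        p.mul_(shrink)                            # shrink
        p.add_(noise_std * torch.randn_like(p))   # perturb
```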
The authors propose a new algorithm called "continual backpropagation" that robustly solves the problem of loss of plasticity by selectively reinitializing a small fraction of less-used hidden units after each example. This maintains plasticity indefinitely.
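A simplified sketch of the selective-reinitialization step for one hidden layer follows. The full algorithm maintains a running "contribution utility" per unit and a maturity threshold before a unit becomes eligible for replacement; here the utility vector, the replacement count, and the initializer are simplified assumptions.

```python
import math
import torch

@torch.no_grad()
def reinit_low_utility_units(w_in, b_in, w_out, utility, n_replace=1):
    """Reinitialize the lowest-utility hidden units of one layer.

    w_in:    (hidden, inputs) incoming weights
    b_in:    (hidden,) biases
    w_out:   (outputs, hidden) outgoing weights
    utility: (hidden,) running utility estimate per unit
    """
    idx = torch.argsort(utility)[:n_replace]  # least-used units
    # Fresh incoming weights, as at network initialization.
    new_rows = torch.empty(n_replace, w_in.shape[1])
    torch.nn.init.kaiming_uniform_(new_rows, a=math.sqrt(5))
    w_in[idx] = new_rows
    b_in[idx] = 0.0
    # Zero outgoing weights so the reset does not disturb the layer's
    # current output; the reinitialized unit relearns from scratch.
    w_out[:, idx] = 0.0
    utility[idx] = 0.0
```

Applied at a small replacement rate after each example, this keeps a steady trickle of fresh units available without destabilizing what the rest of the network has learned.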
The paper provides a comprehensive and systematic demonstration of the fundamental issue of loss of plasticity in deep learning, and introduces a new algorithm that can effectively address this challenge in continual learning problems.