Core Concepts
Bayesian Adaptive Moment Regularization (BAdam) is a novel continual learning method that unifies desirable properties of the Adam optimizer and Bayesian Gradient Descent, yielding a fast-converging approach that effectively mitigates catastrophic forgetting without relying on task labels.
Abstract
The paper introduces Bayesian Adaptive Moment Regularization (BAdam), a novel continual learning method that combines the closed-form update rule of Bayesian Gradient Descent (BGD) with the adaptive per-parameter learning rates of the Adam optimizer.
The key insights are:
BAdam's update rule for the mean parameter (μ) incorporates Adam's adaptive, per-parameter moment estimates, which yields faster convergence than BGD's plain gradient update. Faster convergence of μ in turn reduces plasticity, helping to protect previously learned knowledge when new tasks are learned.
Each parameter's posterior standard deviation (σ) shrinks as μ approaches an optimum. Because σ controls a parameter's plasticity (in BGD the effective learning rate of μ scales with σ²), better optimization of μ yields smaller update rates for well-learned parameters, reducing catastrophic forgetting, as sketched below.
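The sketch below shows how these two ideas might combine in a single step: the μ update uses Adam-style bias-corrected moments scaled by σ², while σ follows BGD's closed-form update. Pairing the moment estimates with BGD's equations in this way is an assumption for illustration, not the paper's exact rule, and `grad_fn` is a hypothetical callback supplying gradients of the loss with respect to sampled weights.

```python
import numpy as np

# Illustrative sketch of a BAdam-style update for one parameter tensor.
# Assumption (not the paper's exact equations): the mean update replaces
# BGD's raw expected gradient with Adam's bias-corrected moment ratio,
# while the closed-form sigma update follows BGD (Zeno et al., 2018).

def badam_step(mu, sigma, grad_fn, m, v, t,
               eta=0.01, beta1=0.9, beta2=0.999, eps=1e-8, n_samples=4):
    """One hypothetical BAdam step; grad_fn(theta) returns dL/dtheta."""
    # Monte Carlo estimates over the posterior theta = mu + noise * sigma
    grads, grad_eps = [], []
    for _ in range(n_samples):
        noise = np.random.randn(*mu.shape)
        g = grad_fn(mu + noise * sigma)
        grads.append(g)
        grad_eps.append(g * noise)
    g_mean = np.mean(grads, axis=0)       # E[dL/dtheta]
    ge_mean = np.mean(grad_eps, axis=0)   # E[dL/dtheta * noise]

    # Adam-style moment estimates of the expected gradient (the "BAdam" part)
    m = beta1 * m + (1 - beta1) * g_mean
    v = beta2 * v + (1 - beta2) * g_mean ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Mean update: plasticity is gated by sigma**2, so well-learned
    # (low-sigma) parameters barely move, which limits forgetting
    mu = mu - eta * sigma ** 2 * m_hat / (np.sqrt(v_hat) + eps)

    # BGD's closed-form variance update: sigma shrinks as mu nears an optimum
    sigma = (sigma * np.sqrt(1 + (0.5 * sigma * ge_mean) ** 2)
             - 0.5 * sigma ** 2 * ge_mean)
    return mu, sigma, m, v
```

The key design point the sketch tries to convey is the coupling: the Adam-style moments drive μ toward an optimum quickly, the σ update then contracts, and the σ² factor in the μ update automatically freezes parameters that previous tasks rely on.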
The authors evaluate BAdam on standard continual learning benchmarks like Split MNIST and Split FashionMNIST, as well as a novel "graduated" formulation that features gradually changing task boundaries, single-epoch training, and no task labels: conditions more reflective of real-world continual learning scenarios.
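As an illustration of what such a graduated setting could look like, the hypothetical sketch below streams data in a single pass, ramps the sampling probability from one task to the next over a short transition window, and never exposes task identities to the learner. The linear ramp and the `steps_per_task` and `transition` parameters are illustrative choices, not the paper's protocol.

```python
import numpy as np

# Hypothetical illustration of a "graduated" task stream: instead of hard
# switches, the probability of drawing from the next task ramps up gradually,
# each example is seen once (single epoch), and no task labels are emitted.

def graduated_stream(task_datasets, steps_per_task=1000, transition=200):
    """Yield (x, y) pairs with soft boundaries between consecutive tasks."""
    rng = np.random.default_rng(0)
    for i in range(len(task_datasets)):
        for step in range(steps_per_task):
            # During the last `transition` steps, blend in the next task
            blend = max(0, step - (steps_per_task - transition)) / transition
            use_next = (i + 1 < len(task_datasets)) and rng.random() < blend
            xs, ys = task_datasets[i + 1] if use_next else task_datasets[i]
            j = rng.integers(len(xs))
            yield xs[j], ys[j]   # no task identifier is ever yielded
```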
Results show that BAdam achieves state-of-the-art performance among prior-based continual learning methods on the standard benchmarks, more than doubling the accuracy of previous approaches. On the more challenging graduated experiments, BAdam also outperforms all other methods, demonstrating its robustness to the additional constraints.
The authors conclude that BAdam takes important steps towards solving challenging class-incremental continual learning problems using prior-based methods, paving the way for future work in this direction.
Stats
The paper does not contain any key metrics or figures to extract.
Quotes
The paper does not contain any striking quotes to extract.