Sparse Activations and Adaptive Optimizers Enable Effective Continual Learning in a Simple MLP
Combining a sparse activation function such as Hard Adaptive SwisH (Hard ASH) with an adaptive learning-rate optimizer such as Adagrad can enable a simple MLP to perform well on class-incremental continual learning tasks, without requiring specialized continual learning algorithms.
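
To make the setup concrete, below is a minimal PyTorch sketch of the idea, not the paper's implementation. The `HardASH` layer here is an assumption: it hard-thresholds each layer's pre-activations so that only the top-k fraction of units per sample stay active, passing their values through unchanged and zeroing the rest. The layer widths, the value of k, the learning rate, and the hypothetical `train_class_incremental` helper are illustrative choices; only the use of a plain MLP with sparse activations and `torch.optim.Adagrad` follows the statement above.

```python
import torch
import torch.nn as nn


class HardASH(nn.Module):
    """Hard sparse activation (assumed form): keep the top-k fraction of units
    per sample, zero the rest."""

    def __init__(self, k: float = 0.1):
        super().__init__()
        self.k = k  # fraction of units allowed to stay active

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n_keep = max(1, int(self.k * x.shape[-1]))
        # Threshold at the value of the n_keep-th largest pre-activation per sample.
        thresholds = torch.topk(x, n_keep, dim=-1).values[..., -1:].detach()
        return torch.where(x >= thresholds, x, torch.zeros_like(x))


class SparseMLP(nn.Module):
    """Plain MLP with sparse hidden activations; no continual-learning machinery."""

    def __init__(self, in_dim=784, hidden=2000, n_classes=10, k=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), HardASH(k),
            nn.Linear(hidden, hidden), HardASH(k),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)


def train_class_incremental(model, task_loaders, lr=0.05, epochs=1):
    """Train on a sequence of tasks (each introducing new classes) with Adagrad.

    Adagrad's accumulated per-parameter step-size decay means weights that were
    updated heavily on earlier tasks receive smaller updates later, which,
    together with the sparse activations, limits interference between tasks.
    """
    opt = torch.optim.Adagrad(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for loader in task_loaders:  # tasks arrive one after another
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss = loss_fn(model(x.flatten(1)), y)
                loss.backward()
                opt.step()
    return model
```

The training loop deliberately contains no replay buffer, regularization penalty, or task labels at test time; any resistance to forgetting in this sketch comes only from the sparse activations and the optimizer's per-parameter learning-rate adaptation.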