This paper introduces a new class of adaptive learning rate optimizers, called Nlar optimizers, that dynamically estimate both learning rates and momentum using non-linear autoregressive time-series models; they show robust convergence and strong initial adaptability relative to traditional methods such as Adam.
Optimization of the adaptive learning rate for Follow-the-Regularized-Leader to achieve optimal results.
Prodigy is an algorithm that improves the convergence rate of D-Adaptation, providing a state-of-the-art solution for learning rate adaptation.