Automated Discovery of Powerful Deep Learning Optimizers, Decay Functions, and Learning Rate Schedules
The authors propose a new dual-joint search space for neural optimizer search (NOS) that simultaneously optimizes the weight update equation, internal decay functions, and learning rate schedules. They discover multiple optimizers, learning rate schedules, and Adam variants that outperform standard deep learning optimizers across image classification tasks.
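To make the idea of a joint search space concrete, here is a minimal toy sketch in which a candidate optimizer is a triple of (weight update equation, internal decay function, learning rate schedule), and every triple is scored on a one-dimensional quadratic. It is an illustration under stated assumptions, not the authors' search space or search algorithm: the function names, the three options per component, the exhaustive enumeration, and the toy objective f(w) = w^2 are all hypothetical choices made for this example.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Callable, List

# Decay functions and schedules both map training progress p in [0, 1]
# to a scalar multiplier. (Hypothetical options, chosen for illustration.)
def constant(p: float) -> float:
    return 1.0

def linear_decay(p: float) -> float:
    return 1.0 - p

def cosine_decay(p: float) -> float:
    return 0.5 * (1.0 + math.cos(math.pi * p))

# Update equations take the gradient g, a momentum estimate m, and the
# current value d of the internal decay function.
def sgd_update(g: float, m: float, d: float) -> float:
    return d * g

def momentum_update(g: float, m: float, d: float) -> float:
    return d * m

def sign_update(g: float, m: float, d: float) -> float:
    return 0.0 if g == 0.0 else d * math.copysign(1.0, g)

@dataclass(frozen=True)
class Candidate:
    update: Callable[[float, float, float], float]
    decay: Callable[[float], float]
    schedule: Callable[[float], float]

def train(c: Candidate, w0: float = 5.0, lr: float = 0.1,
          steps: int = 100, beta: float = 0.9) -> float:
    """Minimize the toy loss f(w) = w^2 and return the final loss."""
    w, m = w0, 0.0
    for t in range(steps):
        p = t / steps                    # training progress in [0, 1)
        g = 2.0 * w                      # gradient of w^2
        m = beta * m + (1.0 - beta) * g  # exponential moving average
        w -= lr * c.schedule(p) * c.update(g, m, c.decay(p))
    return w * w

def enumerate_candidates() -> List[Candidate]:
    """All (update, decay, schedule) triples in this toy search space."""
    updates = [sgd_update, momentum_update, sign_update]
    decays = [constant, linear_decay, cosine_decay]
    schedules = [constant, linear_decay, cosine_decay]
    return [Candidate(u, d, s)
            for u, d, s in itertools.product(updates, decays, schedules)]

def best_candidate() -> Candidate:
    """Exhaustive search; a stand-in for a real search procedure."""
    return min(enumerate_candidates(), key=train)
```

Calling `best_candidate()` returns the triple with the lowest final loss on the toy problem. The point of the sketch is only that the three components are chosen together rather than one at a time; a realistic search would score candidates on actual training runs and would not enumerate the space exhaustively.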