The authors propose a new optimization algorithm called CG-like-Adam for training deep neural networks. The key aspects are:
The first- and second-order moment estimates of the generic Adam algorithm are replaced with conjugate-gradient-like updates: the conjugate coefficient is scaled by a decreasing sequence over time to form the conjugate-gradient-like search direction (a sketch of one step appears below).
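To make the mechanism concrete, here is a minimal, hypothetical sketch of one CG-like-Adam step in NumPy. The damping exponent `lam`, the Fletcher-Reeves-style choice of conjugate coefficient, and all variable names are illustrative assumptions, not the authors' reference implementation:

```python
import numpy as np

def cg_like_adam_step(theta, grad, state, lr=1e-3, beta1=0.9,
                      beta2=0.999, eps=1e-8, lam=1.001):
    """One illustrative CG-like-Adam step (a sketch, not the paper's code).

    The raw gradient fed into Adam's moment estimates is replaced by a
    conjugate-gradient-like direction d_t = g_t + gamma_t * d_{t-1},
    with the conjugate coefficient gamma_t damped by the decreasing
    sequence 1/t**lam so that d_t tends toward the plain gradient.
    """
    t = state["t"] = state.get("t", 0) + 1
    d_prev = state.get("d", np.zeros_like(grad))
    g_prev = state.get("g", grad)

    # Fletcher-Reeves-style coefficient (one common CG choice; the
    # paper's exact formula may differ), damped by 1/t**lam.
    gamma = (grad @ grad) / max(g_prev @ g_prev, eps) / t ** lam

    d = grad + gamma * d_prev                             # CG-like direction
    m = beta1 * state.get("m", 0.0) + (1 - beta1) * d     # 1st-moment EMA
    v = beta2 * state.get("v", 0.0) + (1 - beta2) * d**2  # 2nd-moment EMA
    m_hat = m / (1 - beta1 ** t)   # bias corrections, as in standard Adam
    v_hat = v / (1 - beta2 ** t)

    state.update(d=d, g=grad, m=m, v=v)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)
```

Because the damping factor shrinks the conjugate term as training proceeds, the step reduces to a standard Adam update in the limit, which is what makes the convergence analysis tractable.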
The convergence of the proposed CG-like-Adam algorithm is analyzed, including the case where the exponential-moving-average coefficient of the first-order moment estimate is constant and the estimate itself is unbiased, which extends previous convergence results.
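For orientation, non-convex convergence guarantees for Adam-type methods are usually stated as a bound on the smallest (or average) expected squared gradient norm over T iterations; the display below shows only the generic shape such bounds take, not the paper's exact theorem:

```latex
% Illustrative shape only -- not the paper's precise statement.
\min_{1 \le t \le T} \mathbb{E}\left[\lVert \nabla f(\theta_t) \rVert^2\right]
  \;\le\; \mathcal{O}\!\left(\frac{\log T}{\sqrt{T}}\right)
```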
Experiments training VGG-19 and ResNet-34 models on the CIFAR-10 and CIFAR-100 datasets show that CG-like-Adam outperforms the standard Adam optimizer in training loss, training accuracy, and testing accuracy. CG-like-Adam reaches 100% training accuracy sooner and exhibits more stable performance throughout training.
The authors conclude that CG-like-Adam is an effective optimization algorithm that can improve the performance of deep learning models.