The authors propose a new optimization algorithm called CG-like-Adam for training deep neural networks. The key aspects are:
The first-order and second-order moment estimates in the generic Adam algorithm are replaced with conjugate-gradient-like updates: the conjugate coefficient is scaled by a sequence that decreases over time, yielding a conjugate-gradient-like search direction.
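To make the idea concrete, here is a minimal sketch of such an update loop. It is not the paper's exact algorithm: it assumes a Fletcher-Reeves-style conjugate coefficient and a simple geometric decay for the scaling sequence, and it only replaces the first-order moment with the conjugate-gradient-like direction while keeping Adam's usual second moment. The function name cg_like_adam_sketch, the hyperparameters, and grad_fn are illustrative.

```python
import numpy as np

def cg_like_adam_sketch(grad_fn, x0, steps=500, lr=1e-3,
                        beta2=0.999, eps=1e-8, scale_decay=0.9):
    """Illustrative sketch of an Adam-style loop whose first-order direction
    is a conjugate-gradient-like combination of the current gradient and the
    previous direction. The Fletcher-Reeves-style coefficient and the geometric
    scaling are assumptions, not the paper's exact formulas."""
    x = np.asarray(x0, dtype=float).copy()
    d = np.zeros_like(x)   # conjugate-gradient-like direction (stands in for m_t)
    v = np.zeros_like(x)   # second-order moment estimate, as in standard Adam
    prev_g_sq = None       # ||g_{t-1}||^2, used by the conjugate coefficient
    for t in range(1, steps + 1):
        g = grad_fn(x)
        g_sq = float(g @ g)
        # Conjugate coefficient (Fletcher-Reeves form, assumed for illustration),
        # scaled by a decreasing sequence so the direction tends toward the
        # plain negative gradient as t grows.
        gamma = 0.0 if prev_g_sq is None else (g_sq / (prev_g_sq + eps)) * scale_decay ** t
        d = -g + gamma * d                       # CG-like first-order direction
        v = beta2 * v + (1.0 - beta2) * g * g    # standard second-moment EMA
        v_hat = v / (1.0 - beta2 ** t)           # bias correction
        x = x + lr * d / (np.sqrt(v_hat) + eps)  # d already points downhill
        prev_g_sq = g_sq
    return x

# Usage on a toy quadratic 0.5 * ||x||^2, whose gradient is x itself.
x_min = cg_like_adam_sketch(lambda x: x, np.array([1.0, -2.0, 3.0]), steps=2000, lr=0.05)
print(x_min)  # approaches the zero vector
```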
The convergence of CG-like-Adam is analyzed, covering the cases where the exponential moving average coefficient of the first-order moment estimate is constant and where the first-order moment estimate is unbiased, which extends previous convergence results.
Experiments training VGG-19 and ResNet-34 models on the CIFAR-10 and CIFAR-100 datasets show that CG-like-Adam outperforms the standard Adam optimizer in training loss, training accuracy, and testing accuracy. CG-like-Adam reaches 100% training accuracy sooner and exhibits more stable performance overall.
The authors conclude that CG-like-Adam is an effective optimization algorithm that can improve the performance of deep learning models.
Source: Jiawu Tian et al., arxiv.org, 04-03-2024. https://arxiv.org/pdf/2404.01714.pdf