The paper introduces a new logarithmic step size for the stochastic gradient descent (SGD) algorithm with warm restarts. The key highlights are:
The new logarithmic step size converges to zero more slowly than many existing step sizes, yet faster than the cosine step size. This gives points from the final iterations a higher probability of being selected than under the cosine step size.
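The exact formula of the proposed step size is not reproduced in this summary, so the sketch below is only illustrative: it assumes a per-cycle decay of the form eta0 * (1 - ln t / ln T) with a warm restart every T iterations, and shows how an output iterate could be sampled with probability proportional to its step size. The helper names `log_step_size` and `warm_restart_schedule` are hypothetical.

```python
import math

def log_step_size(t, T, eta0=0.1):
    """Illustrative logarithmic decay within one restart cycle of length T.

    Assumes a decay of the form eta0 * (1 - ln(t) / ln(T)) for t = 1..T;
    the paper's exact formula may differ.
    """
    t = max(1, min(t, T))                      # clamp t to the current cycle
    return eta0 * (1.0 - math.log(t) / math.log(T))

def warm_restart_schedule(total_iters, cycle_len, eta0=0.1):
    """Restart the decay from eta0 every `cycle_len` iterations."""
    return [log_step_size((k % cycle_len) + 1, cycle_len, eta0)
            for k in range(total_iters)]

# When the output iterate is sampled with probability proportional to its
# step size, schedules that stay larger late in training give the final
# iterations more weight.
etas = warm_restart_schedule(total_iters=200, cycle_len=50)
probs = [e / sum(etas) for e in etas]
print(probs[-5:])                              # selection weights of the last iterations
```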
For the new logarithmic step size, the authors establish a convergence rate of O(1/√T) for smooth non-convex functions, which matches the best-known convergence rate for such functions.
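For context, such guarantees for smooth non-convex objectives are typically stated in terms of the expected gradient norm of a sampled output iterate. The following is only a sketch of that standard form; the paper's exact theorem, constants, and assumptions may differ.

```latex
% Typical shape of an O(1/sqrt(T)) guarantee for a smooth non-convex f;
% x_out is an iterate chosen from {x_1, ..., x_T}, often with probability
% proportional to the step sizes eta_t (details depend on the paper).
\[
  \mathbb{E}\!\left[\,\|\nabla f(x_{\mathrm{out}})\|^{2}\,\right]
  \;\leq\; \mathcal{O}\!\left(\tfrac{1}{\sqrt{T}}\right)
\]
```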
Extensive experiments are conducted on the FashionMNIST, CIFAR10, and CIFAR100 datasets, comparing the new logarithmic step size with 9 other popular step size methods. The results demonstrate the effectiveness of the new step size, particularly on the CIFAR100 dataset where it achieves a 0.9% improvement in test accuracy over the cosine step size when using a convolutional neural network model.
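For readers who want to try a schedule of this kind in a setup like the one described, here is a minimal, hypothetical PyTorch sketch that plugs a logarithmic decay with warm restarts into `torch.optim.lr_scheduler.LambdaLR`. The decay form, cycle length, and learning rate are placeholder choices for illustration, not the paper's settings.

```python
import math
import torch

# Hypothetical wiring of a logarithmic decay with warm restarts into a
# PyTorch run; the model and hyperparameters are stand-ins only.
model = torch.nn.Linear(10, 2)                      # stand-in for a CNN
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

cycle_len = 50
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    # LambdaLR scales the initial lr by this factor after every step().
    lr_lambda=lambda e: 1.0 - math.log((e % cycle_len) + 1) / math.log(cycle_len),
)

for epoch in range(100):
    # ... one pass over FashionMNIST / CIFAR10 / CIFAR100 would go here ...
    optimizer.step()
    scheduler.step()
```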
Key insights distilled from: M. Soheil Sh..., arxiv.org, 04-02-2024, https://arxiv.org/pdf/2404.01257.pdf