Variational Stochastic Gradient Descent: A Probabilistic Approach to Optimizing Deep Neural Networks
This paper proposes a novel optimizer, Variational Stochastic Gradient Descent (VSGD), which combines gradient descent with a probabilistic model that treats the true gradients as latent random variables. This formulation allows gradient noise and uncertainty to be modeled in a principled way, leading to improved optimization performance over existing methods such as Adam and SGD.
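To make the idea concrete, the following is a minimal sketch of this style of optimizer, not the paper's actual VSGD updates (whose variational inference scheme is more involved). It assumes a toy quadratic objective, a known observation-noise variance `obs_var`, and a simple per-parameter conjugate Gaussian update of a belief over the latent true gradient; the descent step then uses the posterior mean rather than the raw noisy gradient. All names here are illustrative, not from the paper.

```python
import numpy as np

def noisy_gradient(w, rng, noise_std=0.5):
    # Toy objective f(w) = 0.5 * ||w||^2, so the true gradient is w;
    # additive Gaussian noise stands in for minibatch noise.
    return w + noise_std * rng.standard_normal(w.shape)

def vsgd_like_optimize(w, steps=200, lr=0.1, obs_var=0.25, prior_var=1.0, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.zeros_like(w)             # posterior mean of the latent true gradient
    var = np.full_like(w, prior_var)  # posterior variance (per parameter)
    for _ in range(steps):
        g_obs = noisy_gradient(w, rng)
        # Conjugate Gaussian update: combine the prior belief with the noisy
        # observation, weighted by their precisions (inverse variances).
        precision = 1.0 / var + 1.0 / obs_var
        mu = (mu / var + g_obs / obs_var) / precision
        var = 1.0 / precision
        # Descend along the inferred (denoised) gradient estimate.
        w = w - lr * mu
        # Inflate the variance slightly so the belief can track the changing
        # gradient as the parameters move (a simple forgetting mechanism).
        var = var + 0.05
    return w

w_final = vsgd_like_optimize(np.array([3.0, -2.0]))
print(w_final)  # should end up close to the minimizer [0, 0]
```

The design choice illustrated here is the one the summary highlights: instead of plugging the raw stochastic gradient into the update, the optimizer maintains an explicit belief about the true gradient and uses its inferred value, which naturally down-weights noisy observations.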