XGrad: Boosting Gradient-Based Optimizers with Weight Prediction
XGrad introduces weight prediction into popular gradient-based optimizers such as SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3 to boost their convergence and generalization when training deep neural network models. Ahead of each mini-batch, the future weights implied by the optimizer's own update rule are predicted and used for both the forward pass and backward propagation, while the resulting gradients are applied to the current (cached) weights.
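
Below is a minimal PyTorch sketch of the weight-prediction idea for SGD with momentum. The helper functions, the one-step prediction rule, and the training-loop wiring are illustrative assumptions for exposition, not the actual API of this repository.

```python
# Sketch: weight prediction for SGD with momentum (illustrative only).
import torch

def predict_weights_sgdm(params, optimizer, lr, momentum):
    """Cache current weights, then replace each weight with a one-step-ahead
    prediction derived from the optimizer's momentum buffer."""
    cached = []
    for p in params:
        cached.append(p.detach().clone())
        buf = optimizer.state.get(p, {}).get("momentum_buffer")
        if buf is not None:
            # Assumed prediction rule: w_hat = w - lr * momentum * buf
            p.data.add_(buf, alpha=-lr * momentum)
    return cached

def restore_weights(params, cached):
    """Copy the cached (unpredicted) weights back into the model."""
    for p, w in zip(params, cached):
        p.data.copy_(w)

# Usage inside a training loop (toy model and data for illustration).
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

for _ in range(5):
    params = list(model.parameters())
    cached = predict_weights_sgdm(params, optimizer, lr=0.1, momentum=0.9)
    loss = loss_fn(model(x), y)        # forward pass on predicted weights
    optimizer.zero_grad()
    loss.backward()                    # backward pass on predicted weights
    restore_weights(params, cached)    # update is applied to the cached weights
    optimizer.step()
```

The same pattern extends to adaptive optimizers (Adam, AdamW, AdaBelief, AdaM3) by deriving the prediction from their respective update rules instead of the momentum buffer.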