Using the Projected Forward Gradient in the Frank-Wolfe algorithm improves memory efficiency and convergence rates in deep neural network training.
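To make the combination concrete, the following is a minimal sketch, not the authors' implementation. It assumes the forward gradient is the usual estimator g = (∇f(x)·v) v with a random Gaussian direction v, so only a directional derivative (a Jacobian-vector product) is needed and no backward pass is stored. The toy quadratic objective, the L1-ball constraint, the running average of past estimates, the step-size schedules, and all function names are assumptions chosen for illustration.

```python
import random

def directional_derivative(x, v, b):
    # JVP of the toy objective f(x) = 0.5 * ||x - b||^2 along v: <x - b, v>.
    # Closed form here (assumption); forward-mode AD would return the same
    # scalar in a single forward pass, without a stored backward graph.
    return sum((xi - bi) * vi for xi, bi, vi in zip(x, b, v))

def forward_gradient(x, b, rng):
    # Projected forward gradient: scale a random direction v ~ N(0, I)
    # by the directional derivative of f along v.
    v = [rng.gauss(0.0, 1.0) for _ in x]
    d = directional_derivative(x, v, b)
    return [d * vi for vi in v]

def lmo_l1(g, radius):
    # Linear minimization oracle for the L1 ball: the minimizer of <g, s>
    # is the signed vertex at the coordinate with the largest |g_i|.
    i = max(range(len(g)), key=lambda k: abs(g[k]))
    s = [0.0] * len(g)
    s[i] = -radius if g[i] >= 0 else radius
    return s

def frank_wolfe_fg(b, radius=1.0, steps=200, seed=0):
    rng = random.Random(seed)
    x = [0.0] * len(b)   # feasible starting point
    d = [0.0] * len(b)   # running average of forward-gradient estimates
    for t in range(steps):
        g = forward_gradient(x, b, rng)
        rho = 2.0 / (t + 2)    # averaging weight (assumed schedule)
        d = [(1 - rho) * di + rho * gi for di, gi in zip(d, g)]
        s = lmo_l1(d, radius)
        gamma = 2.0 / (t + 2)  # classic Frank-Wolfe step size
        # Convex combination of feasible points, so no projection is needed.
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x

if __name__ == "__main__":
    print(frank_wolfe_fg([0.3, -0.2, 0.1]))
```

Every iterate is a convex combination of points in the constraint set, so it stays feasible without a projection step. The averaging of past estimates is included only because a single forward-gradient sample is noisy; the exact variance-reduction scheme here is an assumption of this sketch.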