Sophia is a second-order optimizer that achieves a 2x speed-up over Adam in language model training, cutting time, cost, and compute.
The authors introduce Sophia, a second-order optimizer for language model pre-training that achieves significant speed-ups over Adam. Sophia adapts efficiently to heterogeneous curvatures by preconditioning with a light-weight diagonal Hessian estimate and clipping the resulting update.
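The core idea can be sketched as a single parameter update: maintain a momentum (EMA) of gradients, divide element-wise by a diagonal Hessian estimate, and clip the result before stepping. The sketch below is a minimal illustration, not the paper's exact algorithm; the function name, hyperparameter names, and default values are assumptions, and the Hessian estimate `h` is assumed to be refreshed periodically elsewhere.

```python
import numpy as np

def sophia_step(theta, m, h, grad, lr=0.1, beta1=0.96, gamma=0.04, eps=1e-12):
    """One Sophia-style update (illustrative sketch, names are assumptions).

    theta: parameters
    m:     EMA of gradients (momentum)
    h:     diagonal Hessian estimate (assumed maintained elsewhere)
    grad:  current gradient
    """
    # Momentum: exponential moving average of gradients.
    m = beta1 * m + (1 - beta1) * grad
    # Precondition by the diagonal Hessian estimate, then clip
    # element-wise so each coordinate moves at most lr per step.
    update = np.clip(m / np.maximum(gamma * h, eps), -1.0, 1.0)
    theta = theta - lr * update
    return theta, m
```

The element-wise clipping is what makes the method robust to inaccurate or rapidly changing curvature estimates: where the Hessian estimate is small or stale, the update degenerates to a bounded, sign-like step rather than blowing up.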