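In the full-batch setting, AdamW implicitly performs constrained optimization under an ℓ∞-norm constraint. If AdamW converges under a non-increasing learning-rate schedule whose partial sums diverge, its limit must be a KKT point of the original loss minimized subject to ‖x‖∞ ≤ 1/λ, where λ is the decoupled weight-decay coefficient.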
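Some intuition for why the limit sits on the ℓ∞ ball (a heuristic, not a proof): if a gradient coordinate settles at a nonzero value g, then the normalized step m̂/√v̂ tends to sign(g). The AdamW update can vanish only when λx = −sign(g), that is, when |x| = 1/λ. Coordinates strictly inside the box must therefore have zero gradient, which matches the KKT conditions for a box constraint. The sketch below checks this numerically. It assumes a toy loss f(x) = ½‖x − a‖² with a = (3, 0.2) and λ = 1, so the constrained minimizer is the clipped point (1, 0.2). The helper name `adamw_full_batch`, the schedule η_t = η₀/√t, and all hyperparameter values are illustrative choices, not taken from the source.

```python
import math


def adamw_full_batch(grad, x0, lam, steps=20000, lr0=0.1,
                     beta1=0.9, beta2=0.999, eps=1e-12):
    """Full-batch AdamW with decoupled weight decay (illustrative sketch).

    Uses the schedule lr_t = lr0 / sqrt(t), which is non-increasing and has
    divergent partial sums. `grad` maps a parameter list to a gradient list.
    """
    x = list(x0)
    m = [0.0] * len(x)  # first-moment EMA
    v = [0.0] * len(x)  # second-moment EMA
    for t in range(1, steps + 1):
        g = grad(x)
        lr = lr0 / math.sqrt(t)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias-corrected moments
            v_hat = v[i] / (1 - beta2 ** t)
            # Adam direction plus decoupled weight decay lam * x
            x[i] -= lr * (m_hat / (math.sqrt(v_hat) + eps) + lam * x[i])
    return x


if __name__ == "__main__":
    a = [3.0, 0.2]
    # Gradient of f(x) = 0.5 * ||x - a||^2
    x = adamw_full_batch(lambda z: [z[0] - a[0], z[1] - a[1]], [0.0, 0.0], lam=1.0)
    print(x)
```

With λ = 1 the first coordinate is pushed onto the boundary |x₀| = 1/λ even though the unconstrained minimizer is at 3, while the interior coordinate should end up near 0.2, where its gradient vanishes.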