The authors propose two novel online algorithms, Sword and Sword++, that achieve problem-dependent dynamic regret bounds in non-stationary environments. The bounds scale with the gradient variation of the loss functions and the cumulative loss of the comparator sequence; both quantities are at most O(T) but can be much smaller in benign environments, in which case the bounds improve on the minimax-optimal rate.
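To make the problem-dependent quantity concrete, the following is a minimal sketch (a hypothetical quadratic setup, not the paper's algorithm) of the gradient-variation measure, which is small when consecutive losses change slowly and of order T under abrupt changes:

```python
import numpy as np

def gradient_variation(centers):
    """V_T = sum_{t>=2} sup_x ||grad f_t(x) - grad f_{t-1}(x)||^2 for the
    hypothetical losses f_t(x) = ||x - c_t||^2 / 2.  Here the gradient
    difference c_{t-1} - c_t does not depend on x, so the sup is exact."""
    c = np.asarray(centers, dtype=float)
    return float(np.sum(np.linalg.norm(np.diff(c, axis=0), axis=1) ** 2))

T = 1000
rng = np.random.default_rng(0)
# benign environment: loss minimizers drift very slowly
benign = np.cumsum(rng.normal(scale=1e-3, size=(T, 2)), axis=0)
# adversarial environment: minimizers jump between corners every round
adversarial = rng.choice([-1.0, 1.0], size=(T, 2))

v_benign = gradient_variation(benign)   # far smaller than T
v_adv = gradient_variation(adversarial) # of order T
```

Both variations are trivially O(T), but only the benign sequence yields a small value, which is exactly the regime where the problem-dependent bounds beat the minimax rate.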
This work introduces a generalization of the online convex optimization (OCO) framework that allows the loss in the current round to depend on the entire history of past decisions. It provides matching upper and lower bounds on the policy regret in terms of the time horizon and a quantitative measure of the influence of past decisions on present losses.
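A small sketch may clarify how policy regret differs from standard regret: the comparator is charged for the history its own fixed action would have induced, not the learner's history. The memory-1 loss below is a hypothetical toy example, not taken from the paper:

```python
def policy_regret(losses, actions, candidates):
    """Policy regret for history-dependent losses.

    losses[t](history) receives the full prefix of decisions x_1..x_t;
    the best fixed comparator is evaluated on the constant history it
    would itself generate.  (Illustrative toy setup.)
    """
    T = len(losses)
    learner = sum(losses[t](actions[: t + 1]) for t in range(T))
    best_fixed = min(
        sum(losses[t]([c] * (t + 1)) for t in range(T)) for c in candidates
    )
    return learner - best_fixed

# toy memory-1 loss: quadratic action cost plus a switching penalty
def loss_t(history):
    cost = history[-1] ** 2
    if len(history) >= 2:
        cost += abs(history[-1] - history[-2])  # past decision matters
    return cost

losses = [loss_t] * 5
alternating = [1.0, -1.0, 1.0, -1.0, 1.0]  # switches every round
regret = policy_regret(losses, alternating, candidates=[0.0, 1.0, -1.0])
```

The alternating learner pays both action and switching costs, while the fixed action 0.0 pays nothing, so its policy regret is strictly positive; a standard regret notion that feeds the learner's own history to the comparator would understate this gap.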
The paper presents feasible variants of online gradient descent (AdaOGD) and online Newton step (AdaONS) that achieve optimal regret in the single-agent setting and optimal last-iterate convergence to the unique Nash equilibrium in the multi-agent setting, without requiring any prior knowledge of problem parameters.
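For context, AdaOGD builds on vanilla online gradient descent; the sketch below shows only the standard base method with a 1/sqrt(t) step size and ball projection (the paper's adaptive, parameter-free tuning is omitted), on a hypothetical quadratic loss:

```python
import numpy as np

def ogd(grad_fn, x0, T, radius=1.0, eta=0.1):
    """Vanilla online gradient descent with projection onto an l2 ball.

    Sketch only: AdaOGD additionally adapts the step size online
    without prior knowledge of problem parameters.
    """
    x = np.array(x0, dtype=float)
    iterates = []
    for t in range(1, T + 1):
        iterates.append(x.copy())
        g = grad_fn(t, x)                # gradient of the round-t loss at x
        x = x - (eta / np.sqrt(t)) * g   # standard decaying step size
        norm = np.linalg.norm(x)
        if norm > radius:                # project back onto the feasible ball
            x = x * (radius / norm)
    return iterates

# usage: a fixed quadratic loss f_t(x) = ||x - target||^2 / 2
target = np.array([0.5, -0.3])
its = ogd(lambda t, x: x - target, x0=[1.0, 1.0], T=500)
# the final iterate approaches the minimizer `target`
```

With time-varying losses the same loop applies round by round; the adaptive variants in the paper remove the need to hand-tune `eta` against unknown problem constants.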