Core Concepts
The paper presents feasible variants of online gradient descent (AdaOGD) and online Newton step (AdaONS) that achieve optimal regret in the single-agent setting and optimal last-iterate convergence to the unique Nash equilibrium in the multi-agent setting, without requiring any prior knowledge of problem parameters.
Abstract
The paper focuses on the problem of online learning with gradient feedback, where an agent interacts with an environment by choosing an action in each period and then receiving the cost it incurs together with gradient feedback on the cost function. The standard metric for judging the performance of an online learning algorithm is regret, which measures the difference between the total cost incurred by the algorithm and the total cost incurred by the best fixed action in hindsight.
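In symbols, writing f_t for the cost function revealed at period t, x_t for the algorithm's action, and \mathcal{X} for the feasible action set (this notation is introduced here only for illustration), the regret after T periods is

\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x).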
The paper presents two main contributions:
Feasible Variant of OGD (AdaOGD):
AdaOGD is a variant of online gradient descent (OGD) that does not require knowing the strong convexity or strong monotonicity parameter, which classical OGD needs in order to set its step size (see the sketch after this list).
In the single-agent setting with strongly convex cost functions, AdaOGD achieves a near-optimal regret of O(log^2(T)).
In the multi-agent setting of strongly monotone games, if each agent employs AdaOGD, the joint action converges to the unique Nash equilibrium at a near-optimal last-iterate rate of O(log^3(T)/T).
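For context, the following is a minimal sketch of classical projected OGD for strongly convex costs, whose step size eta_t = 1/(mu*t) explicitly uses the strong convexity parameter mu; this is precisely the prior knowledge AdaOGD is designed to avoid. The function names grad and project and their signatures are illustrative assumptions, not the paper's interface.

```python
import numpy as np

def ogd_strongly_convex(grad, project, x0, mu, T):
    """Classical projected OGD for mu-strongly convex costs (illustrative sketch).

    grad(t, x): gradient feedback received at period t for action x.
    project(x): Euclidean projection onto the feasible action set.
    The step size eta_t = 1/(mu * t) requires knowing mu in advance,
    which is the dependence that AdaOGD removes.
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for t in range(1, T + 1):
        g = grad(t, x)
        eta = 1.0 / (mu * t)        # step size depends on the unknown mu
        x = project(x - eta * g)    # projected gradient step
        iterates.append(x.copy())
    return iterates
```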
Feasible Variant of ONS (AdaONS):
AdaONS is a variant of online Newton step (ONS) that does not require knowing the exp-concavity parameter, which classical ONS needs in order to set its step size (see the sketch after this list).
In the single-agent setting with exp-concave cost functions, AdaONS achieves a near-optimal regret of O(d log^2(T)), where d is the dimension of the action set.
The paper also introduces a new class of exp-concave games and shows that if each agent employs AdaONS, the time-average of the joint action converges to the unique Nash equilibrium at a near-optimal rate of O(d log^2(T)/T).
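For comparison, the following is a minimal sketch of classical ONS, whose parameter gamma depends on the exp-concavity constant alpha (together with a gradient bound G and a diameter bound D); this is the prior knowledge AdaONS removes. The names grad and project_A and their signatures are illustrative assumptions, not the paper's interface.

```python
import numpy as np

def ons_exp_concave(grad, project_A, x0, alpha, G, D, T):
    """Classical Online Newton Step for alpha-exp-concave costs (illustrative sketch).

    grad(t, x): gradient feedback received at period t for action x.
    project_A(y, A): projection onto the feasible set in the norm induced by A.
    gamma below depends on the exp-concavity parameter alpha, which is
    exactly the knowledge AdaONS does not require.
    """
    d = len(x0)
    gamma = 0.5 * min(1.0 / (4.0 * G * D), alpha)   # requires knowing alpha
    A = (1.0 / (gamma ** 2 * D ** 2)) * np.eye(d)   # initial regularization
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for t in range(1, T + 1):
        g = grad(t, x)
        A += np.outer(g, g)                           # rank-one curvature update
        y = x - (1.0 / gamma) * np.linalg.solve(A, g)
        x = project_A(y, A)                           # generalized projection
        iterates.append(x.copy())
    return iterates
```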
The key to the adaptivity of both AdaOGD and AdaONS is a simple and unifying randomized strategy that selects the step size based on a set of independent and identically distributed geometric random variables, illustrated loosely below. This makes the algorithms feasible (they need no prior knowledge of problem parameters) and doubly optimal (near-optimal in both the single-agent and multi-agent settings), in contrast to previous work that required knowing the problem parameters.
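As a purely illustrative sketch: the geometric draws themselves are easy to generate, but the rule that maps them to the actual step-size schedule of AdaOGD/AdaONS is not reproduced here; the scaling below is a hypothetical placeholder, not the paper's rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw a set of i.i.d. Geometric(1/2) random variables, one per candidate
# "guess" of the unknown problem parameter. How these draws are converted
# into step sizes is specific to AdaOGD/AdaONS; the 2**(-K) scaling below
# is only a hypothetical placeholder for illustration.
num_guesses = 10
K = rng.geometric(p=0.5, size=num_guesses)   # i.i.d. geometric random variables
candidate_scales = 2.0 ** (-K)               # hypothetical mapping to step-size scales
print(K, candidate_scales)
```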