Online Convex Optimization with Unbounded Memory: Regret Bounds and Applications
This work introduces a generalization of the online convex optimization (OCO) framework that allows the loss in the current round to depend on the entire history of past decisions. It provides matching upper and lower bounds on the policy regret in terms of the time horizon and a quantitative measure of the influence of past decisions on present losses.
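To make the notion of policy regret concrete, the following is a minimal sketch on a toy problem, not the framework's actual construction. It assumes scalar decisions, a history that accumulates past decisions with a geometric discount `rho`, quadratic losses around hypothetical targets, and a grid search for the best fixed decision. All of these names and modeling choices are illustrative assumptions and do not come from the text above.

```python
def history(decisions, rho):
    """Discounted running history h_t = rho * h_{t-1} + x_t (toy assumption)."""
    h = 0.0
    out = []
    for x in decisions:
        h = rho * h + x
        out.append(h)
    return out


def total_loss(decisions, targets, rho):
    """Sum of quadratic losses (h_t - c_t)^2, where h_t depends on all past decisions."""
    return sum((h - c) ** 2 for h, c in zip(history(decisions, rho), targets))


def policy_regret(decisions, targets, rho, grid):
    """Loss of the played sequence minus the loss of the best fixed decision.

    The comparator plays the same decision x in every round, so its history
    is the one that x itself would have generated, not the learner's history.
    The minimum is approximated by searching over the candidate values in grid.
    """
    T = len(decisions)
    best_fixed = min(total_loss([x] * T, targets, rho) for x in grid)
    return total_loss(decisions, targets, rho) - best_fixed


if __name__ == "__main__":
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    # With rho = 0 the past has no influence; a larger rho lets early decisions
    # keep contributing to later losses.
    print(policy_regret([1.0, 0.0], [0.0, 0.0], 0.0, grid))
    print(policy_regret([1.0, 0.0], [0.0, 0.0], 0.5, grid))
```

In this sketch, `rho` plays the role of a crude stand-in for the influence of past decisions on present losses: when it is zero the problem reduces to memoryless online optimization, and as it grows a single early mistake is charged again in later rounds.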