Key Concepts
This work introduces a generalization of the online convex optimization (OCO) framework in which the loss in the current round may depend on the entire history of past decisions. It proves matching upper and lower bounds on the policy regret in terms of the time horizon T and a quantitative measure of how strongly past decisions influence present losses (the p-effective memory capacity H_p).
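To make the setting concrete, here is a minimal Python sketch of one run of the protocol. It assumes scalar decisions, a hypothetical linear history update h_t = a·h_{t-1} + x_t with decay factor a and h_0 = 0, and squared losses f_t(h_t) = (h_t - c_t)² with hypothetical targets c_t. These modelling choices are illustrative only and are not taken from the paper.

```python
from typing import List, Sequence


def history_losses(decisions: Sequence[float], targets: Sequence[float],
                   a: float) -> List[float]:
    """Per-round losses f_t(h_t) = (h_t - c_t)**2 for the ASSUMED scalar
    history h_t = a * h_{t-1} + x_t, starting from h_0 = 0."""
    h = 0.0
    losses = []
    for x, c in zip(decisions, targets):
        h = a * h + x  # the history folds in every past decision
        losses.append((h - c) ** 2)
    return losses


# With a = 0.5, the decision x_1 = 1 still costs something in round 2,
# even though the learner plays x_2 = 0 there.
print(history_losses([1.0, 0.0], [0.0, 0.0], a=0.5))  # [1.0, 0.25]
```

Setting a = 0 removes the memory (h_t = x_t) and recovers a standard OCO loss that depends only on the current decision; values of |a| close to 1 let old decisions matter for a long time.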
Summary
The paper introduces a generalization of the online convex optimization (OCO) framework, called "Online Convex Optimization with Unbounded Memory", that captures long-term dependence of the current loss on past decisions.
Key highlights:
- Defines the notion of p-effective memory capacity (H_p) that quantifies the maximum influence of past decisions on present losses.
- Proves an O(√(H_p T)) upper bound on the policy regret and a matching (worst-case) lower bound.
- As a special case, proves the first non-trivial lower bound for OCO with finite memory, and improves existing upper bounds.
- Demonstrates the broad applicability of the framework by deriving regret bounds for online linear control and an online variant of performative prediction.
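Policy regret compares the learner's cumulative loss with that of the best fixed decision x, where the comparator's history is recomputed as if x had been played in every round (a counterfactual history, not the learner's actual one). The sketch below assumes a hypothetical toy model (scalar history h_t = a·h_{t-1} + x_t with h_0 = 0 and squared losses (h_t - c_t)²) and approximates the minimum over x by a finite candidate grid; the paper minimizes over the whole convex decision set.

```python
from typing import Sequence


def total_loss(decisions: Sequence[float], targets: Sequence[float],
               a: float) -> float:
    """Cumulative loss sum_t (h_t - c_t)**2 with the ASSUMED history
    h_t = a * h_{t-1} + x_t, h_0 = 0."""
    h, total = 0.0, 0.0
    for x, c in zip(decisions, targets):
        h = a * h + x
        total += (h - c) ** 2
    return total


def policy_regret(decisions: Sequence[float], targets: Sequence[float],
                  a: float, candidates: Sequence[float]) -> float:
    """Learner's cumulative loss minus the best cumulative loss obtained by
    replaying one fixed decision x in every round (min over `candidates`,
    a grid approximation of the minimum over the decision set)."""
    T = len(targets)
    learner = total_loss(decisions, targets, a)
    best_fixed = min(total_loss([x] * T, targets, a) for x in candidates)
    return learner - best_fixed


# Playing x = 1 in every round matches the targets exactly; switching to 0
# in round 2 leaves h_2 = 0.5 instead of 1.5.
print(policy_regret([1.0, 0.0], [1.0, 1.5], a=0.5,
                    candidates=[0.0, 0.5, 1.0]))  # 1.0
```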
The paper first formalizes the problem setup, stating its assumptions on the feedback model, the loss functions, and the dynamics of the history space. It then presents two algorithms: one based on follow-the-regularized-leader (FTRL), and one that combines FTRL with mini-batching.
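As a rough illustration of the two algorithmic ideas, the sketch below runs FTRL with a quadratic regularizer x²/(2·eta) on linearized losses over a one-dimensional interval, and a mini-batched variant that holds each decision fixed for a block of rounds before updating. The step size eta, the interval [lo, hi], the block length and the gradient oracle grad(t, x) are all hypothetical choices for illustration; this is a generic sketch, not the paper's exact algorithms or parameter settings.

```python
from typing import Callable, List


def ftrl_decision(grad_sum: float, eta: float, lo: float, hi: float) -> float:
    # argmin over [lo, hi] of grad_sum * x + x**2 / (2 * eta):
    # the unconstrained minimizer is -eta * grad_sum, and in 1-D the
    # constrained minimizer of this quadratic is that point clipped.
    return min(hi, max(lo, -eta * grad_sum))


def minibatch_ftrl(grad: Callable[[int, float], float], T: int, batch: int,
                   eta: float, lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """FTRL on linearized losses that updates only every `batch` rounds.

    grad(t, x) is a HYPOTHETICAL oracle returning a (sub)gradient of the
    round-t loss at x. batch=1 recovers plain FTRL on linearized losses.
    """
    grad_sum = 0.0
    x = ftrl_decision(grad_sum, eta, lo, hi)
    decisions = []
    for t in range(T):
        decisions.append(x)
        grad_sum += grad(t, x)  # accumulate feedback during the block
        if (t + 1) % batch == 0:  # block finished: recompute the decision
            x = ftrl_decision(grad_sum, eta, lo, hi)
    return decisions


# Constant gradient 1.0: the decision moves only at block boundaries.
print(minibatch_ftrl(lambda t, x: 1.0, T=4, batch=2, eta=0.1))
```

With batch > 1 the decision changes less often, which limits how far the current decision can drift from recent past decisions; how the paper chooses the block length is not reproduced here.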
The key technical contributions are:
- Defining the notion of p-effective memory capacity (H_p) and using it to derive tight upper and lower bounds on the policy regret.
- Specializing the results to the case of OCO with finite memory, proving the first non-trivial lower bound and improving existing upper bounds.
- Applying the framework to two diverse problems (online linear control and online performative prediction) and deriving improved regret bounds.
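The finite-memory special case can be pictured as a history that keeps only the last m decisions, so anything older stops affecting the loss. The sketch below assumes the history is a length-m window padded with zeros and a hypothetical loss that sees only that window; this encoding is illustrative rather than the paper's exact construction.

```python
from collections import deque
from typing import Callable, List, Sequence


def finite_memory_losses(decisions: Sequence[float], m: int,
                         loss: Callable[[int, List[float]], float]) -> List[float]:
    """Losses when round t's loss sees only the last m decisions.

    The ASSUMED history is a length-m window (oldest first), initialised
    with zeros; appending a new decision shifts the oldest one out.
    """
    window = deque([0.0] * m, maxlen=m)
    losses = []
    for t, x in enumerate(decisions):
        window.append(x)  # shift: drop x_{t-m}, insert x_t
        losses.append(loss(t, list(window)))
    return losses


# Hypothetical loss: squared sum of the window, ignoring t.
print(finite_memory_losses([5.0, 0.0, 0.0], m=2, loss=lambda t, w: sum(w) ** 2))
# [25.0, 25.0, 0.0]: the round-1 decision leaves the window in round 3.
```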
The paper concludes by discussing future research directions, including extensions to unknown dynamics and bandit feedback settings.
Stats
The paper does not contain any explicit numerical data or statistics. It focuses on theoretical analysis and regret bounds.
Quotes
"In many applications the loss of the learner depends not only on the current decisions but on the entire history of decisions until that point."
"We introduce the notion of p-effective memory capacity, H_p, that quantifies the maximum influence of past decisions on present losses."
"We prove an O(√(H_p T)) upper bound on the policy regret and a matching (worst-case) lower bound."