Core Concept
This paper proposes AdaFTRL-C, an adaptive online non-stochastic control algorithm that achieves sublinear policy regret bounds which scale with the difficulty of the controlled environment, as measured by the gradients of the observed cost functions.
Summary
The paper tackles the problem of Online Non-stochastic Control (NSC), where the goal is to find a policy that minimizes the cost of controlling a dynamical system with unknown disturbances and cost functions.
The key contributions are:
The authors propose the AdaFTRL-C algorithm, which uses the Follow-The-Regularized-Leader (FTRL) framework with adaptive regularizers. This allows the algorithm to achieve policy regret bounds that adapt to the difficulty of the encountered costs and disturbances, in contrast to previous non-adaptive approaches like Gradient Perturbation Controller (GPC).
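To make the adaptive-regularization idea concrete, here is a minimal sketch of an FTRL update with an AdaGrad-style regularizer whose strength grows with the cumulative squared gradient norms. This is not the paper's exact controller (AdaFTRL-C optimizes over a disturbance-action policy parameterization); the function name `adaptive_ftrl` and the parameters `eta` and `radius` are illustrative assumptions.

```python
import numpy as np

def adaptive_ftrl(gradients, radius=1.0, eta=1.0):
    """FTRL iterates with an adaptive quadratic regularizer.

    The regularizer strength scales with sqrt(sum of squared gradient
    norms), so the iterates move aggressively when the observed
    gradients are small ("easy" rounds) and conservatively otherwise.
    """
    d = len(gradients[0])
    grad_sum = np.zeros(d)      # running sum of observed gradients
    sq_norm_sum = 0.0           # running sum of squared gradient norms
    iterates = []
    for g in gradients:
        grad_sum += g
        sq_norm_sum += float(np.dot(g, g))
        # Unconstrained minimizer of <grad_sum, x> + (sqrt(S)/eta) * ||x||^2 / 2,
        # where S is the cumulative squared gradient norm so far.
        x = -eta * grad_sum / (np.sqrt(sq_norm_sum) + 1e-12)
        # Project back onto the Euclidean ball of the given radius.
        norm = np.linalg.norm(x)
        if norm > radius:
            x = x * (radius / norm)
        iterates.append(x)
    return iterates
```

On a sequence of tiny gradients the denominator stays small, so the iterates track the leader closely; on large gradients the implicit step size shrinks automatically, which is the mechanism behind the adaptive regret bound.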
The analysis of AdaFTRL-C requires new techniques to handle the coupling between the learner's actions and the system's state, which is a challenge specific to integrating FTRL with NSC. The authors show that the learner's cost can still approximate the counterfactual cost of a stationary policy, up to a diminishing error.
The adaptive regret bound of AdaFTRL-C is of the form O(sqrt(Σ_t ||∇_t||²)), i.e., it scales with the square root of the cumulative squared gradient norms. This is much tighter than the O(sqrt(T)) bound of GPC when the environment is "easy" (i.e., has small gradients), and in the worst case it degrades by only a constant factor compared to GPC.
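The two regimes of the bound can be checked directly (a sketch, writing G for an upper bound on the gradient norms; the 1/sqrt(t) decay in the easy case is an illustrative assumption, not a claim from the paper):

```latex
% Worst case: the adaptive bound recovers the O(sqrt(T)) rate of GPC
% up to constants, since each squared norm is at most G^2:
\sqrt{\sum_{t=1}^{T} \|\nabla_t\|^2}
  \;\le\; \sqrt{T \cdot G^2}
  \;=\; G\sqrt{T},
\qquad G = \max_{t} \|\nabla_t\|.

% Easy case: if the gradients decay, e.g. \|\nabla_t\| = O(1/\sqrt{t}), then
\sum_{t=1}^{T} \|\nabla_t\|^2 = O(\log T)
\quad\Longrightarrow\quad
\text{policy regret} = O\!\left(\sqrt{\log T}\right),
% which is exponentially smaller than the non-adaptive sqrt(T) rate.
```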
Numerical experiments demonstrate the benefits of adaptivity, with AdaFTRL-C significantly outperforming GPC in easy environments, while maintaining comparable performance in the worst case.