Core Concepts
This article proposes Continuous Dynamic Tuning (CDT), an efficient online framework for continuous hyperparameter tuning in contextual bandits. CDT can automatically learn the optimal hyperparameter configuration in practice without requiring a pre-defined candidate set. The authors also introduce a novel Zooming TS algorithm with Restarts to handle the non-stationary Lipschitz bandit problem underlying the hyperparameter tuning task, and they provide theoretical regret bounds.
Abstract
The article presents an online continuous hyperparameter optimization framework called Continuous Dynamic Tuning (CDT) for contextual bandits. The key contributions are:
Formulating the hyperparameter optimization as a non-stationary Lipschitz continuum-armed bandit problem, where each arm represents a hyperparameter configuration and the corresponding reward is the algorithmic performance.
Proposing the Zooming TS algorithm with Restarts to efficiently solve this Lipschitz bandit problem under the switching environment, which adaptively refines the hyperparameter space and utilizes Thompson Sampling for exploration. A restart technique is introduced to handle the piecewise changes in the bandit environment.
Providing theoretical regret bounds for the proposed CDT framework, showing that it can achieve sublinear regret under mild assumptions.
Demonstrating through experiments on both synthetic and real datasets that the CDT framework consistently outperforms existing hyperparameter tuning methods for various contextual bandit algorithms.
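To make the second contribution more concrete, the following is a minimal sketch of a zooming-style Thompson Sampling tuner with restarts over a one-dimensional hyperparameter in [0, 1]. It is a simplified illustration, not the authors' algorithm: the class name, the confidence radius formula, the Gaussian sampling variance, the random-candidate activation rule, and the fixed restart period `epoch_length` are all assumptions made here for readability.

```python
import math
import random


class ZoomingTSRestartSketch:
    """Simplified 1-D illustration of zooming + Thompson-style sampling
    with periodic restarts. Hypothetical; not the paper's exact method."""

    def __init__(self, horizon, epoch_length, seed=0):
        self.horizon = horizon            # used only inside the radius formula
        self.epoch_length = epoch_length  # assumed fixed restart period
        self.rng = random.Random(seed)
        self.t = 0
        self.last = None
        self._restart()

    def _restart(self):
        # Discard all statistics so rewards from an earlier stationary
        # segment cannot bias estimates in the current one.
        self.arms = []    # active hyperparameter values in [0, 1]
        self.counts = []  # number of pulls per active arm
        self.means = []   # empirical mean reward per active arm

    def _radius(self, i):
        # Assumed confidence radius; it shrinks as an arm is pulled more.
        return math.sqrt(2.0 * math.log(self.horizon) / (self.counts[i] + 1))

    def select(self):
        if self.t > 0 and self.t % self.epoch_length == 0:
            self._restart()
        self.t += 1

        # Activation (simplified): draw a random candidate and activate it
        # only if no active arm's confidence ball already covers it.
        candidate = self.rng.random()
        if all(abs(candidate - a) > self._radius(i)
               for i, a in enumerate(self.arms)):
            self.arms.append(candidate)
            self.counts.append(0)
            self.means.append(0.0)

        # Thompson-style selection: sample a score for each active arm
        # and play the arm with the largest sample.
        def sample(i):
            std = 1.0 / math.sqrt(self.counts[i] + 1)
            return self.rng.gauss(self.means[i], std)

        self.last = max(range(len(self.arms)), key=sample)
        return self.arms[self.last]

    def update(self, reward):
        # Incremental mean update for the most recently selected arm.
        i = self.last
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]
```

In this sketch, regions of the hyperparameter space that are pulled often get smaller confidence balls, so new arms can be activated nearby; that is the "adaptive refinement" idea. The restart simply clears the active set every `epoch_length` rounds, which is one simple way to cope with piecewise changes in the reward function.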
The article first reviews the problem setting of contextual bandits and the limitations of existing hyperparameter tuning approaches. It then introduces the Zooming TS algorithm with Restarts for the Lipschitz bandit problem under the switching environment, and integrates it into the CDT framework for online hyperparameter optimization of contextual bandits. Rigorous theoretical analysis and extensive empirical evaluations are provided to validate the effectiveness of the proposed method.
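The two-layer structure described above can be sketched as a loop in which a hyperparameter-level tuner picks a value each round and the contextual bandit then acts with it. The example below is an illustrative assumption, not the paper's implementation: it uses a one-dimensional LinUCB-style rule whose exploration rate `alpha` is the tuned hyperparameter, feeds the observed reward back as the performance signal, and uses a hypothetical `UniformTuner` placeholder where a learning tuner (for instance a zooming-style one) would normally sit.

```python
import math
import random


class UniformTuner:
    """Hypothetical stand-in for the hyperparameter-level bandit."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng

    def select(self):
        # Draw a hyperparameter value uniformly from the continuous range.
        return self.rng.uniform(self.low, self.high)

    def update(self, value, reward):
        # A learning tuner would use (value, reward) here; this one does not.
        pass


def run_cdt_style_loop(tuner, rounds=500, n_arms=5, theta=0.8, seed=0):
    """1-D LinUCB-style bandit whose exploration rate is re-chosen each round.

    Returns the cumulative regret against the best arm in each round.
    """
    rng = random.Random(seed)
    a_stat, b_stat = 1.0, 0.0  # ridge statistics: 1 + sum x^2 and sum x*r
    cumulative_regret = 0.0
    for _ in range(rounds):
        alpha = tuner.select()  # top layer: choose the hyperparameter
        contexts = [rng.uniform(-1.0, 1.0) for _ in range(n_arms)]
        theta_hat = b_stat / a_stat

        # Bottom layer: upper-confidence score with exploration rate alpha.
        def score(x):
            return theta_hat * x + alpha * abs(x) / math.sqrt(a_stat)

        chosen = max(contexts, key=score)
        reward = theta * chosen + rng.gauss(0.0, 0.1)
        a_stat += chosen * chosen
        b_stat += chosen * reward
        tuner.update(alpha, reward)  # feed performance back to the tuner
        # theta > 0, so the largest context is the best arm this round.
        cumulative_regret += theta * (max(contexts) - chosen)
    return cumulative_regret
```

A call such as `run_cdt_style_loop(UniformTuner(0.0, 2.0, random.Random(1)))` returns the cumulative regret of the run. Any object exposing the same `select` and `update` methods could replace `UniformTuner`, which is the design point of the sketch: the bandit algorithm and the tuner only communicate through the chosen hyperparameter and the observed reward.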
Stats
The article does not contain any explicit numerical data or statistics. The experimental results are presented in the form of cumulative regret curves.