Core Concepts
The Hellinger-UCB algorithm uses the squared Hellinger distance to construct an upper confidence bound whose regret matches the theoretical lower bound for stochastic multi-armed bandit problems. It also admits a closed-form solution for binomial reward distributions, making it suitable for low-latency applications such as cold-start recommendation systems.
Abstract
The paper presents the Hellinger-UCB algorithm, a novel variant of the Upper Confidence Bound (UCB) algorithm for the stochastic multi-armed bandit (MAB) problem. The key idea is to use the squared Hellinger distance to build the upper confidence bound, instead of the commonly used Kullback-Leibler (KL) divergence.
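For reference, the standard definition of the squared Hellinger distance, together with one plausible form of the resulting index, can be written as follows. The definition is standard; the index is only a sketch by analogy with KL-UCB, and the constant c, the log t exploration term, and the symbols \hat{p}_a and N_a(t) are assumptions, since this summary does not state the paper's exact confidence level.

```latex
% Squared Hellinger distance between discrete distributions P and Q
% (normalization chosen so that 0 <= H^2 <= 1).
H^2(P, Q) \;=\; \frac{1}{2} \sum_{x} \left( \sqrt{P(x)} - \sqrt{Q(x)} \right)^2
          \;=\; 1 - \sum_{x} \sqrt{P(x)\, Q(x)}

% Assumed index for arm a at round t, by analogy with KL-UCB:
% \hat{p}_a is the empirical reward distribution of arm a,
% N_a(t) its number of pulls so far, and c > 0 a hypothetical constant.
U_a(t) \;=\; \max \left\{ q \;:\; H^2(\hat{p}_a, q) \,\le\, \frac{c \, \log t}{N_a(t)} \right\}
```

Unlike the KL divergence, the squared Hellinger distance is symmetric and bounded, which is one reason the confidence region above is easy to characterize.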
The authors prove that Hellinger-UCB achieves the theoretical lower bound for regret in the stochastic MAB problem. They also show that Hellinger-UCB has a solid statistical interpretation and admits a closed-form solution for binomial reward distributions.
The paper includes numerical experiments comparing Hellinger-UCB with other UCB variants, demonstrating its superior performance in finite time horizons. As a real-world application, the authors apply Hellinger-UCB to solve the cold-start problem in a content recommender system, where it outperforms both KL-UCB and UCB1 in terms of click-through rate (CTR).
The key highlights and insights from the paper are:
Hellinger-UCB uses the squared Hellinger distance to construct the upper confidence bound; compared with the KL divergence, this distance has favorable mathematical properties and a clearer statistical interpretation.
Hellinger-UCB achieves the theoretical lower bound for regret in the stochastic MAB problem.
For binomial reward distributions, Hellinger-UCB has a closed-form solution, which is a desirable property for low-latency applications such as cold-start recommendation systems.
Numerical experiments show that Hellinger-UCB outperforms other UCB variants in both simulated and real-world settings, particularly on the cold-start recommendation problem.
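To illustrate why a closed form is available in the binary-reward case, here is a minimal Python sketch for Bernoulli rewards. It is derived here from the definition of the squared Hellinger distance and is not taken from the paper: writing p = sin^2(a) and q = sin^2(b) gives H^2(p, q) = 1 - cos(b - a), so the largest q inside the confidence region can be read off directly. The function names, the constant c, and the log(t) / n exploration term are assumptions for illustration.

```python
import math


def hellinger_sq_bernoulli(p: float, q: float) -> float:
    """Squared Hellinger distance between Bernoulli(p) and Bernoulli(q)."""
    bhattacharyya = math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q))
    return max(0.0, 1.0 - bhattacharyya)


def hellinger_ucb_index(p_hat: float, n_pulls: int, t: int, c: float = 1.0) -> float:
    """Largest q in [0, 1] with H^2(p_hat, q) <= c * log(t) / n_pulls.

    Assumption: the exploration term c * log(t) / n_pulls is a placeholder;
    the paper's exact confidence level may differ.
    """
    if n_pulls == 0:
        return 1.0  # an unexplored arm gets the most optimistic index
    p_hat = min(1.0, max(0.0, p_hat))
    delta = c * math.log(max(t, 1)) / n_pulls
    if delta >= 1.0:
        return 1.0  # the confidence region covers all of [0, 1]
    # With p = sin^2(a) and q = sin^2(b), H^2 = 1 - cos(b - a),
    # so the constraint becomes b <= a + arccos(1 - delta).
    a = math.asin(math.sqrt(p_hat))
    b = min(math.pi / 2.0, a + math.acos(1.0 - delta))
    return math.sin(b) ** 2


def select_arm(successes: list[int], pulls: list[int], t: int, c: float = 1.0) -> int:
    """Return the index of the arm with the highest index value at round t."""
    scores = [
        hellinger_ucb_index(s / n if n else 0.0, n, t, c)
        for s, n in zip(successes, pulls)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

Because the index needs only a square root and a few trigonometric calls per arm, it avoids the iterative root-finding that a KL-based index typically requires, which is the low-latency property highlighted above.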
Stats
The paper does not provide any specific numerical data or statistics to support the key claims. The results are presented through plots and high-level comparisons.
Quotes
"Hellinger-UCB reaches the theoretical lower bound."
"Hellinger-UCB has a solid statistical interpretation."
"Hellinger-UCB outperforms both KL-UCB and UCB1 in the sense of a higher click-through rate (CTR)."