Hellinger-UCB: An Optimal Algorithm for Stochastic Multi-Armed Bandit Problems and Cold Start Recommendation Systems
The Hellinger-UCB algorithm uses the squared Hellinger distance to construct an upper confidence bound whose regret attains the theoretical lower bound for stochastic multi-armed bandit problems. For binomial reward distributions, this upper confidence bound has a closed-form solution, so each arm's index can be computed cheaply. This makes the algorithm suitable for low-latency applications such as cold start recommendation systems.
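To show why a closed form is possible for binomial rewards, here is a minimal sketch for the Bernoulli (single-trial binomial) case. It rests on a few assumptions. The squared Hellinger distance is taken as H^2(p, q) = 1 - sqrt(pq) - sqrt((1 - p)(1 - q)). The index is the largest q >= p_hat with H^2(p_hat, q) <= delta. The threshold delta = c * log(t) / n is a hypothetical exploration radius, since the exact threshold and constants used by Hellinger-UCB are not stated in this section. Substituting p = sin^2(a) and q = sin^2(b) turns 1 - H^2 into cos(b - a), which gives the index sin^2(min(a + arccos(1 - delta), pi/2)). The names `hellinger_sq`, `hellinger_ucb_index` and `run_bandit` are illustrative only.

```python
import math
import random


def hellinger_sq(p: float, q: float) -> float:
    """Squared Hellinger distance between Bernoulli(p) and Bernoulli(q),
    using the convention H^2 = 1 - (Bhattacharyya coefficient)."""
    return 1.0 - math.sqrt(p * q) - math.sqrt((1.0 - p) * (1.0 - q))


def hellinger_ucb_index(p_hat: float, delta: float) -> float:
    """Closed-form largest q >= p_hat with hellinger_sq(p_hat, q) <= delta."""
    # With p = sin^2(a) and q = sin^2(b): 1 - H^2 = cos(b - a), increasing gap -> larger H^2.
    bc = min(max(1.0 - delta, 0.0), 1.0)  # target Bhattacharyya coefficient, clamped to [0, 1]
    a = math.asin(math.sqrt(p_hat))
    return math.sin(min(a + math.acos(bc), math.pi / 2)) ** 2


def run_bandit(true_means, horizon, c=1.0, seed=0):
    """Simulate a UCB loop on Bernoulli arms; returns the pull count per arm.
    ASSUMPTION: delta = c * log(t) / n is a hypothetical threshold, not the paper's."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    successes = [0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so each empirical mean is defined
        else:
            indices = [
                hellinger_ucb_index(successes[i] / pulls[i], c * math.log(t) / pulls[i])
                for i in range(k)
            ]
            arm = max(range(k), key=indices.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward
    return pulls


pulls = run_bandit([0.2, 0.5, 0.7], horizon=5000)
print(pulls)  # the 0.7 arm should receive most of the pulls
```

The index needs only one `asin`, one `acos` and one `sin` per arm, with no root-finding. That is the property that matters for low-latency serving. By contrast, an index built on KL divergence generally needs a numerical solver for the binomial case.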