Upper Counterfactual Confidence Bounds: Optimistic Algorithms for Contextual Bandits
The paper introduces Upper Counterfactual Confidence Bounds (UCCB), a new optimism principle for contextual bandits that yields provably optimal and computationally efficient algorithms. Rather than building confidence bounds in action space, the approach builds them in policy space, which addresses the challenges posed by general function classes and large context spaces.