The author establishes regret lower bounds for contextual bandits with graph feedback and introduces a graph-theoretical quantity that characterizes the learning limit. Algorithms achieving near-optimal regret are provided for different context sequences and feedback graphs.
In contextual bandits with graph feedback, the fundamental learning limit is characterized by a graph-theoretical quantity.