The author studies the efficiency of slowly changing adversarial bandit algorithms in discounted Markov decision processes (MDPs), showing that they achieve optimal regret. The approach rests on a reduction from tabular reinforcement learning to multi-armed bandits.
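To make the reduction concrete, here is a hedged sketch, not the paper's exact construction: each state of a tabular discounted MDP owns its own adversarial bandit instance (an Exp3-style learner with exploration mixing is assumed here), and a small learning rate keeps each per-state policy slowly changing, so the effective environment seen by every other bandit drifts slowly as well. All class and variable names (`Exp3`, `T`, `Q`) are illustrative, not from the paper.

```python
import math
import random

class Exp3:
    """Exp3-style adversarial bandit with uniform exploration mixing."""

    def __init__(self, n_actions, lr=0.05, eps=0.1):
        self.w = [0.0] * n_actions   # cumulative importance-weighted rewards
        self.lr = lr                 # small lr => slowly changing policy
        self.eps = eps               # exploration floor bounds probabilities
        self.n = n_actions

    def probs(self):
        m = max(self.w)
        e = [math.exp(x - m) for x in self.w]
        s = sum(e)
        return [(1 - self.eps) * x / s + self.eps / self.n for x in e]

    def sample(self, rng):
        r, acc = rng.random(), 0.0
        for a, p in enumerate(self.probs()):
            acc += p
            if r <= acc:
                return a
        return self.n - 1

    def update(self, action, reward):
        # importance-weighted update; reward must lie in [0, 1]
        self.w[action] += self.lr * reward / self.probs()[action]

# Toy 2-state, 2-action MDP: T[s][a] = (reward, next_state).
# In state 0, action 0 earns 1.0 and stays put, so it is optimal there.
T = {0: {0: (1.0, 0), 1: (0.0, 1)},
     1: {0: (0.5, 0), 1: (0.0, 1)}}
gamma, rng = 0.9, random.Random(0)
bandits = [Exp3(2) for _ in range(2)]    # one bandit per state
Q = [[0.0, 0.0], [0.0, 0.0]]             # tabular action-value estimates
s = 0
for _ in range(20000):
    a = bandits[s].sample(rng)
    r, s2 = T[s][a]
    target = r + gamma * max(Q[s2])      # bootstrapped discounted return
    Q[s][a] += 0.1 * (target - Q[s][a])
    # rescale the target into [0, 1] (max return is 1 / (1 - gamma) = 10)
    bandits[s].update(a, target * (1 - gamma))
    s = s2
```

Under this toy setup, the state-0 bandit gradually concentrates on the optimal action, illustrating how per-state bandit learners can drive a tabular RL loop; the actual regret analysis in the work summarized above is far more delicate.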