The author proposes an improved algorithm for adversarial linear mixture MDPs, focusing on unknown transitions and bandit feedback, achieving a regret bound that surpasses previous results.
Prioritized League Reinforcement Learning addresses challenges in large-scale heterogeneous multiagent systems by promoting cooperation and resolving sample inequality.
Effiziente Exploration in Low-Rank MDPs durch das VoX-Algorithmus.
Linear mixture MDPs algorithm improvement for adversarial settings.
Verbesserter Algorithmus für adversative lineare Misch-MDPs mit Bandit-Feedback und unbekanntem Übergang.
Slowly changing adversarial bandit algorithms can efficiently handle discounted Markov decision processes.