The paper introduces Stochastic Execution Delay Markov Decision Processes (SED-MDPs), a framework for modeling environments in which actions take effect after random delays. Its key theoretical result is that, when the delay realizations are observed, optimizing over Markov policies is sufficient to achieve optimal performance, so there is no need to search over the larger class of history-dependent policies.
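To make the delay mechanism concrete, here is a minimal toy simulation of stochastic execution delay. It is an illustrative sketch under stated assumptions, not the paper's code. The class `StochasticDelayExecutor` and its `step` method are hypothetical names. The execution rule is also an assumption: an action chosen at step t' with delay z takes effect at step t' + z, and at step t the agent executes the most recent decision whose delay has already elapsed (a default action is used until the first decision arrives).

```python
class StochasticDelayExecutor:
    """Toy stochastic-execution-delay mechanism (hypothetical sketch).

    Assumed rule: at step t the executed action is the one chosen at
    tau_t = max{t' <= t : t' + z_{t'} <= t}, i.e. the most recent decision
    whose delay has elapsed. Until any decision arrives, a default action runs.
    """

    def __init__(self, default_action: int = 0) -> None:
        self.default_action = default_action
        # decisions[i] = (action, due_step) for the decision chosen at step i
        self.decisions: list[tuple[int, int]] = []
        self.t = 0

    def step(self, action: int, delay: int) -> int:
        """Record `action` chosen now with observed `delay`; return the executed action."""
        self.decisions.append((action, self.t + delay))
        executed = self.default_action
        # Scan from the newest decision backwards; the first one that is due wins.
        for chosen_step in range(self.t, -1, -1):
            a, due = self.decisions[chosen_step]
            if due <= self.t:
                executed = a
                break
        self.t += 1
        return executed


executor = StochasticDelayExecutor(default_action=0)
actions = [10, 11, 12, 13, 14]
delays = [2, 0, 0, 3, 0]  # observed delay realizations
print([executor.step(a, d) for a, d in zip(actions, delays)])
# [0, 11, 12, 12, 14]
```

Under this rule, action 10 never executes because action 11 overtakes it, and at step 3 action 12 repeats while action 13 is still in flight. Because the delays are observed, the agent can reconstruct exactly which action is in effect, which is the kind of information the Markov-sufficiency result relies on.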
Building on this insight, the authors propose Delayed EfficientZero (DEZ), a model-based algorithm that extends the EfficientZero framework. DEZ maintains separate queues for past actions and their delays, and uses them to predict future states and choose actions accordingly.
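The role of the two queues can be illustrated with a small sketch: from stored past actions and their observed delays, forecast which actions will be executed over the next few steps, then roll a dynamics model forward through them to estimate a future state. This is not the DEZ implementation. The functions `forecast_executed_actions` and `predict_future_state` are hypothetical, `dynamics` stands in for a learned model (here just a plain Python function), and two rules are assumed: at step t the executed action is the most recent decision whose delay has elapsed by t, and no decision made after the current step lands inside the forecast horizon.

```python
from collections import deque
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")


def forecast_executed_actions(actions: Sequence[int], delays: Sequence[int],
                              horizon: int, default_action: int = 0) -> List[int]:
    """Forecast the actions executed at steps now, ..., now + horizon - 1.

    actions[i] was chosen at step i with observed delay delays[i]; the current
    step is now = len(actions) - 1. Assumes (hypothetically) that no decision
    made after `now` lands inside the horizon.
    """
    now = len(actions) - 1
    forecast = []
    for t in range(now, now + horizon):
        a_t = default_action
        # Executed action = most recent decision whose delay has elapsed by t.
        for chosen in range(now, -1, -1):
            if chosen + delays[chosen] <= t:
                a_t = actions[chosen]
                break
        forecast.append(a_t)
    return forecast


def predict_future_state(state: S, actions: Sequence[int], delays: Sequence[int],
                         horizon: int, dynamics: Callable[[S, int], S]) -> S:
    """Roll a (hypothetical) dynamics model through the forecast actions."""
    for a in forecast_executed_actions(actions, delays, horizon):
        state = dynamics(state, a)
    return state


# Toy usage: integer state, each action adds its value to the state.
past_actions = deque([1, 2, 3])
past_delays = deque([0, 2, 1])
print(forecast_executed_actions(past_actions, past_delays, horizon=3))  # [1, 3, 3]
print(predict_future_state(0, past_actions, past_delays, 3, lambda s, a: s + a))  # 7
```

Keeping actions and delays in separate queues, as the summary describes, lets the forecast be recomputed whenever a new delay is observed, without re-encoding the whole history.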
The authors evaluate DEZ on the Atari suite under both constant and stochastic delays. In both settings, DEZ significantly outperforms the baseline methods, including the previous state-of-the-art algorithm, 'Delayed-Q'.
Key insights extracted from the paper by David Valens... (arxiv.org, 04-09-2024)
https://arxiv.org/pdf/2404.05440.pdf