The paper introduces Stochastic Execution Delay Markov Decision Processes (SED-MDPs), a framework for modeling environments where actions are executed after random delays. Its key theoretical result is that, when the realized delays are observed, optimizing over Markov policies is sufficient to achieve optimal performance; there is no need to search the larger class of history-dependent policies.
Based on this insight, the authors devise Delayed EfficientZero (DEZ), a model-based algorithm that builds upon the EfficientZero framework. DEZ maintains separate queues to track past actions and their delays, using them to accurately predict future states and make decisions accordingly.
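To make the queue idea concrete, here is a minimal Python sketch of two parallel queues holding pending actions and their remaining delays, plus a rollout that estimates the state in which a newly chosen action would execute. It is an illustration under stated assumptions, not the authors' implementation: the names `DelayedActionBuffer`, `predict_state_at_execution`, and the one-step `model` callable are hypothetical, and the rules "the newest due action wins" and "repeat the last executed action when nothing is due" are choices made for this sketch rather than details taken from the paper. The rollout also ignores actions that might be chosen later with shorter delays.

```python
import copy
from collections import deque


class DelayedActionBuffer:
    """Toy buffer: parallel queues of pending actions and their remaining delays."""

    def __init__(self, default_action=0):
        self.actions = deque()    # actions chosen but not yet executed
        self.remaining = deque()  # steps left before each pending action executes
        self.last_executed = default_action

    def push(self, action, delay):
        """Register an action chosen now that should execute `delay` steps from now."""
        self.actions.append(action)
        self.remaining.append(delay)

    def step(self):
        """Advance one time step and return the action executed at this step."""
        pending = list(zip(self.actions, self.remaining))
        due = [a for a, r in pending if r <= 0]
        keep = [(a, r - 1) for a, r in pending if r > 0]
        self.actions = deque(a for a, _ in keep)
        self.remaining = deque(r for _, r in keep)
        if due:
            # Assumption for this sketch: if several actions fall due, the newest wins.
            self.last_executed = due[-1]
        # Assumption for this sketch: if nothing is due, repeat the last executed action.
        return self.last_executed


def predict_state_at_execution(state, buffer, delay, model):
    """Roll a hypothetical one-step model forward `delay` steps using pending actions.

    The buffer is copied, so the real queues are left untouched.
    """
    sim = copy.deepcopy(buffer)
    for _ in range(delay):
        state = model(state, sim.step())
    return state
```

In use, an agent would call `predict_state_at_execution` with the observed delay before choosing an action, then `push` the chosen action and call `step` to obtain what the environment actually executes at the current time step.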
The authors evaluate DEZ on the Atari suite under both constant and stochastic delays. They report that DEZ significantly outperforms the baseline methods, including the previous state-of-the-art 'Delayed-Q' algorithm, in both settings.
Key insights extracted from
by David Valens... at arxiv.org 04-09-2024
https://arxiv.org/pdf/2404.05440.pdf
Deeper Questions