Optimizing Reinforcement Learning Policies Under Stochastic Execution Delays
To handle stochastic execution delays in reinforcement learning, it suffices to optimize over the set of Markov policies, which is exponentially smaller than the set of history-dependent policies.
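To give a rough sense of the size gap, the sketch below counts decision points for deterministic policies in a small finite-horizon problem. It is only an illustration under assumptions not stated in the source: finite state and action sets, a fixed horizon, deterministic policies, and no delay modeling. The function names `markov_table_size` and `history_table_size` are hypothetical.

```python
def markov_table_size(num_states: int, horizon: int) -> int:
    """Decision points of a (non-stationary) Markov policy: one per (step, state)."""
    return horizon * num_states


def history_table_size(num_states: int, num_actions: int, horizon: int) -> int:
    """Decision points of a history-dependent policy.

    At step t the policy sees a history (s_0, a_0, ..., s_t), and there are
    num_states**(t + 1) * num_actions**t such histories.
    """
    return sum(
        num_states ** (t + 1) * num_actions ** t
        for t in range(horizon)
    )


if __name__ == "__main__":
    S, A, H = 2, 2, 3  # assumed toy sizes, chosen only for illustration
    m = markov_table_size(S, H)
    h = history_table_size(S, A, H)
    # A deterministic policy picks one of A actions at each decision point,
    # so the number of policies is A raised to the table size.
    print(f"Markov: {m} decision points, {A ** m} policies")
    print(f"History-dependent: {h} decision points, {A ** h} policies")
```

Under these assumptions the Markov table grows linearly with the horizon, while the history-dependent table grows exponentially with it, so the count of candidate policies differs by an exponential factor in the exponent.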