Efficient Reinforcement Learning in Stochastic Environments Enabled by the Effective Horizon
Many stochastic Markov Decision Processes can be efficiently solved by performing only a few steps of value iteration on the random policy's Q-function and then acting greedily.