Q-Learning for Stochastic Control under General Information Structures and Non-Markovian Environments
This paper presents a convergence theorem for stochastic iterations, in particular Q-learning, under general and possibly non-Markovian stochastic environments.