Efficient Gauss-Newton Temporal Difference Learning for Nonlinear Function Approximation in Reinforcement Learning
The paper proposes a Gauss-Newton Temporal Difference (GNTD) learning method to solve the Q-learning problem with nonlinear function approximation. At each iteration, GNTD takes one Gauss-Newton step to optimize a variant of the Mean-Squared Bellman Error (MSBE), achieving improved sample complexity over existing temporal difference methods.
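As an illustrative sketch of the kind of update the abstract describes (not the paper's actual implementation), a single damped Gauss-Newton step on the empirical Bellman residual could look like the following. The tiny Q-network, its sizes, the finite-difference Jacobian, and the damping constant are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny nonlinear Q-network (sizes chosen only for illustration).
D, H, A = 3, 4, 2                  # state dim, hidden units, actions
N_PARAMS = D * H + H * A

def q_fn(theta, s):
    """Q-values for all actions: tanh hidden layer, linear output head."""
    W1 = theta[:D * H].reshape(D, H)
    W2 = theta[D * H:].reshape(H, A)
    return np.tanh(s @ W1) @ W2

def gauss_newton_td_step(theta, batch, gamma=0.99, damping=1e-2):
    """One damped Gauss-Newton step on the squared Bellman residual.

    The TD targets are held fixed (semi-gradient style), so the step
    minimizes the empirical MSBE over the sampled batch.
    """
    s, a, r, s2 = batch
    # Fixed targets: no gradient flows through the bootstrapped term.
    targets = r + gamma * np.max(q_fn(theta, s2), axis=1)

    def residual(th):
        q = q_fn(th, s)
        return q[np.arange(len(a)), a] - targets

    res = residual(theta)
    # Jacobian of the residual via forward finite differences
    # (illustrative only; an autodiff Jacobian would be used in practice).
    eps = 1e-6
    J = np.empty((len(res), len(theta)))
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = eps
        J[:, i] = (residual(theta + e) - res) / eps

    # Damped Gauss-Newton (Levenberg-Marquardt style) update:
    # theta <- theta - (J^T J + damping * I)^{-1} J^T res
    step = np.linalg.solve(J.T @ J + damping * np.eye(len(theta)), J.T @ res)
    return theta - step
```

A usage sketch: sample a batch of transitions `(s, a, r, s')`, initialize `theta`, and apply `gauss_newton_td_step` once per batch; the curvature term `J.T @ J` is what distinguishes this update from a plain (first-order) TD gradient step.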