Efficient Gradient-Based Optimization for the Discounted Discrete-Time Linear Quadratic Regulator with Unknown Parameters
This paper proposes a new algorithm that provably achieves ε-optimality for the discounted discrete-time Linear Quadratic Regulator (LQR) problem with unknown parameters, using only O(1/ε) function evaluations, without relying on two-point gradient estimates.
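To make the setting concrete, the following is a minimal sketch of the objects the abstract refers to: a discounted LQR cost evaluated by simulation, and a generic one-point zeroth-order gradient estimate that uses a single function evaluation per sample. It is not the algorithm proposed in the paper. The names `discounted_lqr_cost` and `one_point_gradient`, the discount factor `gamma`, the truncation `horizon`, the smoothing `radius`, and the linear feedback form u_t = -K x_t are all assumptions made for illustration.

```python
import numpy as np


def discounted_lqr_cost(K, A, B, Q, R, x0, gamma=0.9, horizon=200):
    """Truncated discounted cost sum_t gamma^t (x_t' Q x_t + u_t' R u_t).

    Assumes noiseless dynamics x_{t+1} = A x_t + B u_t and the linear
    feedback u_t = -K x_t (an illustrative choice, not taken from the paper).
    """
    x = np.asarray(x0, dtype=float)
    cost, discount = 0.0, 1.0
    for _ in range(horizon):
        u = -K @ x
        cost += discount * (x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
        discount *= gamma
    return float(cost)


def one_point_gradient(cost_fn, K, radius, rng):
    """Generic one-point zeroth-order gradient estimate.

    Draws U uniformly from the unit Frobenius sphere and returns
    (d / radius) * cost_fn(K + radius * U) * U, where d is the number of
    entries of K. Only one evaluation of cost_fn is used per estimate.
    """
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)  # normalized Gaussian is uniform on the sphere
    d = K.size
    return (d / radius) * cost_fn(K + radius * U) * U
```

A caller would typically wrap the system matrices in a closure, for example `cost_fn = lambda K: discounted_lqr_cost(K, A, B, Q, R, x0)`, so that the estimator only sees function values. This mirrors the unknown-parameter setting, in which the optimizer can query the cost but cannot read A and B directly.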