Direct Adaptive Learning of the Linear Quadratic Regulator from Online Closed-Loop Data
This paper proposes a novel data-enabled policy optimization (DeePO) method for direct adaptive learning of the linear quadratic regulator (LQR) from online closed-loop data. The method relies on a new policy parameterization based on the sample covariance of the data, which enables efficient use of the data and is shown to be equivalent to certainty-equivalence LQR. DeePO achieves global convergence via a projected gradient dominance property, and non-asymptotic guarantees show that the regret decays sublinearly over time, up to a bias that scales with the signal-to-noise ratio of the data.
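As a rough illustration of the indirect certainty-equivalence LQR baseline that the abstract says the covariance-based parameterization is equivalent to (the paper's own DeePO gradient updates are not reproduced here), the following numpy/scipy sketch estimates the dynamics from online closed-loop data and recomputes the LQR gain as data accumulates. The system matrices, noise levels, and warm-up horizon are all hypothetical choices for illustration, not values from the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

# Hypothetical 2-state, 1-input system (not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)
noise_std = 0.01   # process noise level (illustrative)
probe_std = 0.1    # exploration noise added to the input (illustrative)

def ce_lqr_gain(A_hat, B_hat, Q, R):
    """Certainty-equivalence LQR gain computed from estimated dynamics."""
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    return -np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

n, m = 2, 1
K = np.zeros((m, n))   # initial policy; the true A here is only marginally stable
x = np.zeros(n)
Z, Xp = [], []         # regressors [x; u] and the corresponding next states

for t in range(500):
    # Closed-loop input with probing noise to keep the data informative.
    u = K @ x + probe_std * rng.standard_normal(m)
    x_next = A @ x + B @ u + noise_std * rng.standard_normal(n)
    Z.append(np.concatenate([x, u]))
    Xp.append(x_next)
    x = x_next
    if t >= 50:  # after a warm-up, re-estimate the model and update the gain
        Zm, Xm = np.array(Z), np.array(Xp)
        # Least-squares estimate of [A_hat, B_hat] from the closed-loop data;
        # equivalently expressible through the sample covariances of [x; u].
        Theta = np.linalg.lstsq(Zm, Xm, rcond=None)[0].T
        A_hat, B_hat = Theta[:, :n], Theta[:, n:]
        K = ce_lqr_gain(A_hat, B_hat, Q, R)

print("adaptive gain:", K)
print("optimal gain :", ce_lqr_gain(A, B, Q, R))
```

With enough excitation the estimated gain approaches the model-based LQR gain; the paper's DeePO method instead performs direct projected-gradient updates on the covariance-parameterized policy, avoiding an explicit model-identification step while retaining this certainty-equivalence behavior.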