Core Concepts
This work provides a finite sample analysis of an active learning algorithm for identifying nonlinear dynamical systems in a control-oriented manner, achieving rates that are optimal up to logarithmic factors.
Abstract
This paper introduces and analyzes the Active Learning for Control-Oriented Identification (ALCOI) algorithm, which extends previous work on active learning for model-based reinforcement learning to handle general nonlinear dynamical systems. The key contributions are:
Finite sample analysis of ALCOI: The authors derive finite sample bounds on the excess control cost achieved by the ALCOI algorithm, which characterize the interplay between the hardness of control and the hardness of identification. These bounds are shown to be tight up to logarithmic factors in the setting of linear-in-the-parameters nonlinear systems.
Novel non-asymptotic system identification result: The authors provide a new non-asymptotic result characterizing the parameter estimation error in terms of the Fisher Information matrix, which may be of independent interest.
Two-stage exploration strategy: ALCOI explores in two stages: it first obtains a coarse parameter estimate, then uses that estimate to design a targeted exploration policy that approximately solves an optimal experiment design problem.
Numerical validation: The authors demonstrate the benefits of ALCOI over random exploration and approximate A-optimal experiment design on an illustrative nonlinear control problem.
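The two-stage strategy can be sketched as follows. This is a hedged, simplified illustration rather than the paper's exact procedure: the feature map `phi`, the noise level, and the `control_hessian` stand-in below are all assumed for the example, with the model taken to be linear in the parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([0.8, -0.3])   # unknown parameters (illustrative)
sigma = 0.1                          # observation noise level (illustrative)

def phi(u):
    # Assumed nonlinear features; observations are linear in theta given phi(u).
    return np.array([np.sin(u), u ** 2])

def observe(u):
    return phi(u) @ theta_true + sigma * rng.normal()

def fit(us, ys):
    # Least-squares parameter estimate from the data collected so far.
    Phi = np.stack([phi(u) for u in us])
    theta, *_ = np.linalg.lstsq(Phi, np.array(ys), rcond=None)
    return theta

def control_hessian(theta):
    # Hypothetical stand-in for the Hessian of the control cost with
    # respect to the model parameters, evaluated at an estimate.
    return np.diag(1.0 + theta ** 2)

# Stage 1: random exploration yields a coarse parameter estimate.
us = list(rng.uniform(-1, 1, size=50))
ys = [observe(u) for u in us]
theta_coarse = fit(us, ys)

# Stage 2: greedily pick inputs minimizing tr(H I^{-1}), an A-optimal-style
# design reweighted toward parameter directions the controller cares about.
H = control_hessian(theta_coarse)
info = sum(np.outer(phi(u), phi(u)) for u in us)   # accumulated information
candidates = np.linspace(-1.0, 1.0, 41)
for _ in range(50):
    u_star = min(candidates, key=lambda u: np.trace(
        H @ np.linalg.inv(info + np.outer(phi(u), phi(u)))))
    us.append(u_star)
    ys.append(observe(u_star))
    info = info + np.outer(phi(u_star), phi(u_star))

theta_final = fit(us, ys)
```

The baselines in the numerical comparison differ only in stage 2: random exploration ignores the information matrix altogether, and plain A-optimal design uses an unweighted criterion (H equal to the identity) rather than one derived from the coarse estimate.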
The key technical innovation is the use of recent advances in the non-asymptotic analysis of nonlinear system identification to derive end-to-end control guarantees for model-based reinforcement learning in general nonlinear settings.
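To make the Fisher information connection concrete, here is a minimal sketch (not the paper's analysis) for the linear-in-the-parameters setting: with observations y = phi(x)^T theta plus Gaussian noise, the Fisher information is I = (1/sigma^2) Phi^T Phi, and the least-squares estimator's error covariance equals I^{-1}. The feature map, parameters, and noise level are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.array([1.0, -0.5])   # unknown parameters (illustrative)
sigma = 0.1                          # Gaussian noise level (illustrative)
n = 5000

# Nonlinear features make the model linear in the parameters.
x = rng.uniform(-1, 1, size=n)
Phi = np.stack([np.sin(x), x ** 2], axis=1)        # (n, 2) feature matrix
y = Phi @ theta_true + sigma * rng.normal(size=n)

# Least-squares parameter estimate.
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Fisher information under Gaussian noise; its inverse is the estimator's
# error covariance, so tr(I^{-1}) predicts the expected squared error.
fisher = (Phi.T @ Phi) / sigma ** 2
predicted_mse = float(np.trace(np.linalg.inv(fisher)))
actual_sq_err = float(np.sum((theta_hat - theta_true) ** 2))
```

The non-asymptotic result referenced above bounds the estimation error in terms of this Fisher information matrix with explicit finite-sample constants, rather than only in the large-n limit.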