Core Concepts
The paper presents an online learning algorithm that adaptively designs a decentralized linear quadratic regulator (LQR) when the system model is unknown, with regret that scales sublinearly in the time horizon.
Abstract
The paper proposes an online learning algorithm for designing a decentralized linear quadratic regulator (LQR) when the system model is unknown. The key contributions are:
The algorithm uses a disturbance-feedback representation of the state-feedback controllers, which is coupled with an online convex optimization (OCO) algorithm that has memory and delayed feedback. This allows the algorithm to respect the prescribed information pattern in the decentralized setting.
Under the assumption of a stable system or a known stabilizing controller, the algorithm achieves an expected regret that scales as √T in the time horizon T when the information pattern is partially nested. This matches the regret bound for the centralized LQR case.
For more general information patterns, where the optimal decentralized controller is unknown even when the system model is known, the regret of the proposed controller is characterized with respect to a linear sub-optimal controller.
The theoretical findings are validated through numerical experiments.
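To make the disturbance-feedback representation in the first contribution more concrete, the following is a minimal Python sketch, not the paper's construction. It assumes the information pattern can be encoded as a binary mask on the gain matrices (entry (i, j) is 1 when controller i may use disturbance j) and ignores any communication delays between controllers. The names disturbance_feedback_control, gains, mask and past_disturbances are hypothetical and chosen only for illustration.

```python
import numpy as np

def disturbance_feedback_control(gains, mask, past_disturbances):
    """Compute u_t = sum_k (M_k * mask) @ w_{t-k} for k = 1..h.

    gains:             list of h gain matrices M_k, each of shape (m, n)
    mask:              binary (m, n) matrix; assumed encoding of who may see what
    past_disturbances: list of h vectors [w_{t-1}, ..., w_{t-h}], each of shape (n,)
    """
    u = np.zeros(mask.shape[0])
    for M_k, w in zip(gains, past_disturbances):
        # Elementwise masking zeroes the gains a controller is not allowed to use.
        u += (M_k * mask) @ w
    return u

# Two scalar subsystems: controller 0 sees only disturbance 0,
# controller 1 sees both disturbances.
mask = np.array([[1.0, 0.0],
                 [1.0, 1.0]])
gains = [np.ones((2, 2)), 0.5 * np.ones((2, 2))]
past = [np.array([1.0, 2.0]), np.array([4.0, 6.0])]  # w_{t-1}, w_{t-2}
u = disturbance_feedback_control(gains, mask, past)
```

Because the control input is linear in the gains M_k, a cost that is convex in the input stays convex in the gains, which is what makes this parameterization a natural fit for an OCO algorithm.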
The key steps of the algorithm are:
In the first phase, the algorithm uses a least squares method to estimate the unknown system matrices A and B from a single system trajectory.
In the second phase, the algorithm uses an OCO algorithm with memory and delayed feedback to adaptively design a decentralized control policy, leveraging the estimated system model from the first phase.
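The first phase above can be sketched as an ordinary least squares regression of the next state on the current state and input. This is an illustrative sketch under assumptions not taken from the paper: the dynamics are x_{t+1} = A x_t + B u_t + w_t with Gaussian noise, the exciting inputs are i.i.d. Gaussian, and the matrices A_true and B_true are made-up examples. The function name estimate_system is hypothetical.

```python
import numpy as np

def estimate_system(states, inputs):
    """Least squares estimate of (A, B) from a single trajectory.

    states: array of shape (T + 1, n) holding x_0, ..., x_T
    inputs: array of shape (T, m) holding u_0, ..., u_{T-1}
    """
    n = states.shape[1]
    Z = np.hstack([states[:-1], inputs])            # regressors [x_t, u_t], shape (T, n + m)
    Y = states[1:]                                  # targets x_{t+1}, shape (T, n)
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)   # solves Z @ theta ~ Y
    return theta[:n].T, theta[n:].T                 # A_hat (n, n), B_hat (n, m)

# Simulate one trajectory of a stable example system (assumed values).
rng = np.random.default_rng(0)
A_true = np.array([[0.6, 0.2],
                   [0.0, 0.5]])
B_true = np.eye(2)
T = 2000
x = np.zeros((T + 1, 2))
u = rng.normal(size=(T, 2))                         # random exciting inputs
for t in range(T):
    w = 0.1 * rng.normal(size=2)                    # process noise
    x[t + 1] = A_true @ x[t] + B_true @ u[t] + w

A_hat, B_hat = estimate_system(x, u)
```

With a longer trajectory the estimates typically move closer to the true matrices, which is why the length of this phase trades off against the time left for the control phase.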
The regret analysis shows that the proposed algorithm achieves √T regret scaling under a partially nested information pattern, matching the centralized LQR case despite the additional challenge of the decentralized information constraints.
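As a rough illustration of what regret under delayed feedback means, here is a generic toy example in Python. It is not the paper's algorithm: it runs projected online gradient descent on scalar quadratic losses, omits the memory component entirely, and uses assumed values for the horizon, delay and step size. The function name delayed_ogd is hypothetical.

```python
import numpy as np

def delayed_ogd(targets, delay, step):
    """Projected online gradient descent on f_t(x) = (x - c_t)^2 over [-1, 1].

    The gradient of round t only becomes available at round t + delay.
    Returns the sequence of points played.
    """
    x = 0.0
    plays = []
    for t in range(len(targets)):
        plays.append(x)
        s = t - delay                       # latest round whose feedback has arrived
        if s >= 0:
            grad = 2.0 * (plays[s] - targets[s])
            x = float(np.clip(x - step * grad, -1.0, 1.0))  # gradient step, then projection
    return np.array(plays)

rng = np.random.default_rng(1)
T, delay = 4000, 3                          # assumed horizon and feedback delay
targets = np.clip(0.5 + 0.3 * rng.normal(size=T), -1.0, 1.0)
plays = delayed_ogd(targets, delay, step=1.0 / np.sqrt(T))

# Regret: cumulative loss of the learner minus that of the best fixed point in hindsight.
best_fixed = targets.mean()                 # minimizer of the summed quadratic losses
regret = np.sum((plays - targets) ** 2) - np.sum((best_fixed - targets) ** 2)
```

A step size proportional to 1/√T is the standard choice behind √T-type regret bounds for online gradient descent on convex losses. The comparator here is a single fixed point, whereas in the paper's setting the comparator is a controller.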