This paper proposes a new class of parameterized controllers that resemble a Quadratic Programming (QP) solver for a linear Model Predictive Control (MPC) problem, with the controller parameters trained via Deep Reinforcement Learning (DRL) rather than derived from system models. The approach addresses the verifiability and performance-guarantee limitations of the Multi-Layer Perceptron (MLP) and other general-purpose neural network architectures commonly used as DRL policies, while empirically matching MPC and MLP controllers in control performance and exhibiting superior robustness and computational efficiency.
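As a rough illustration of the idea (not the paper's exact formulation), the sketch below builds a controller whose forward pass unrolls a fixed number of projected-gradient iterations on a box-constrained QP whose cost matrices and bounds are learnable parameters; because the whole map is differentiable, a standard DRL algorithm can train them from reward. All class and variable names here are assumptions for illustration.

```python
# Sketch: a controller structured as an unrolled QP solver with learnable data,
#   min_u 0.5 * u^T H u + (F x)^T u   s.t.  u_lo <= u <= u_hi,
# trained by DRL instead of being derived from a system model.
import torch
import torch.nn as nn

class QPLikeController(nn.Module):
    def __init__(self, state_dim: int, act_dim: int, n_iters: int = 10):
        super().__init__()
        self.n_iters = n_iters
        # Learnable QP data; L parameterizes H = L L^T + eps*I so H stays positive definite.
        self.L = nn.Parameter(0.1 * torch.randn(act_dim, act_dim))
        self.F = nn.Parameter(0.1 * torch.randn(act_dim, state_dim))
        self.u_lo = nn.Parameter(-torch.ones(act_dim))
        self.u_hi = nn.Parameter(torch.ones(act_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        H = self.L @ self.L.T + 1e-3 * torch.eye(self.L.shape[0])
        q = x @ self.F.T                                # linear term of the QP, (batch, act_dim)
        step = 1.0 / (torch.linalg.matrix_norm(H, ord=2) + 1e-6)  # conservative step size
        u = torch.zeros_like(q)
        for _ in range(self.n_iters):                   # unrolled projected gradient descent
            grad = u @ H.T + q
            u = torch.clamp(u - step * grad, self.u_lo, self.u_hi)
        return u

controller = QPLikeController(state_dim=4, act_dim=2)
print(controller(torch.randn(8, 4)).shape)              # -> torch.Size([8, 2])
```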
Growing Q-Networks (GQN) adaptively increases control resolution from coarse to fine within decoupled Q-learning, reconciling the exploration benefits of coarse discretization during early training with the need for smooth control at convergence.
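A minimal sketch of the coarse-to-fine mechanism follows, with assumed details rather than the GQN authors' implementation: each action dimension keeps its own decoupled Q-values over a fixed fine grid of bins, only a coarse subset of bins is active early in training, and a growth step activates finer bins later.

```python
# Sketch: decoupled per-dimension Q-values on a growing discretization of a
# continuous action space; early exploration is coarse (bang-bang-like), while
# later growth steps expose finer bins for smoother control.
import numpy as np

class GrowingDiscreteActions:
    def __init__(self, act_dim: int, max_bins: int = 9, start_bins: int = 2):
        self.act_dim = act_dim
        self.bins = np.linspace(-1.0, 1.0, max_bins)    # full (finest) resolution grid
        self.active = start_bins                         # number of bins usable right now
        self.q = np.zeros((act_dim, max_bins))           # one decoupled Q-head per dimension

    def grow(self):
        """Activate a finer set of bins (e.g. 2 -> 3 -> 5 -> 9 on the shared grid)."""
        self.active = min(2 * self.active - 1, len(self.bins))

    def active_indices(self):
        # Indices of the currently active bins, evenly spread over the full grid.
        return np.round(np.linspace(0, len(self.bins) - 1, self.active)).astype(int)

    def greedy_action(self) -> np.ndarray:
        idx = self.active_indices()
        # Each dimension independently picks its best currently-active bin.
        return np.array([self.bins[idx[np.argmax(self.q[d, idx])]] for d in range(self.act_dim)])

ctrl = GrowingDiscreteActions(act_dim=2)
print(ctrl.greedy_action())     # only the coarse bins {-1, +1} are selectable
ctrl.grow(); ctrl.grow()
print(ctrl.greedy_action())     # a finer grid is selectable after two growth steps
```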
Decision Transformer can effectively learn control policies for partially observable nonlinear dynamical systems, exhibiting remarkable zero-shot generalization and rapid adaptation to new tasks with minimal demonstrations.
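The underlying mechanism, sketched below with assumed hyperparameters and without the original training pipeline, is return-conditioned sequence modeling: (return-to-go, state, action) tokens are interleaved, passed through a causal transformer, and actions are read out at the state-token positions, so attention over the history window implicitly compensates for partial observability.

```python
# Sketch: a tiny return-conditioned causal transformer in the Decision Transformer style.
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=64, n_layers=2, n_heads=4, max_len=20):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_act = nn.Linear(act_dim, d_model)
        self.pos = nn.Embedding(3 * max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T, _ = states.shape
        tokens = torch.stack([self.embed_rtg(rtg),
                              self.embed_state(states),
                              self.embed_act(actions)], dim=2).reshape(B, 3 * T, -1)
        tokens = tokens + self.pos(torch.arange(3 * T))
        mask = torch.triu(torch.full((3 * T, 3 * T), float('-inf')), diagonal=1)  # causal mask
        h = self.encoder(tokens, mask=mask)
        return self.head(h[:, 1::3])    # predict each action from its state-token position

model = TinyDecisionTransformer(state_dim=4, act_dim=2)
a = model(torch.zeros(1, 5, 1), torch.zeros(1, 5, 4), torch.zeros(1, 5, 2))
print(a.shape)                          # -> torch.Size([1, 5, 2])
```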
Extremum-Seeking Action Selection (ESA) improves the quality of exploratory actions in policy optimization, reducing the sampling of low-value trajectories and accelerating learning.
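One way to picture the extremum-seeking idea (a hedged sketch with an assumed interface, not the ESA paper's implementation): before an exploratory action is executed, probe a learned value estimate q_fn(state, action) with small sinusoidal dithers, demodulate the probed values to estimate the local value gradient, and nudge the action uphill, so fewer low-value actions are actually rolled out.

```python
# Sketch: refining a candidate action by extremum seeking on a critic q_fn(state, action).
import numpy as np

def extremum_seeking_refine(q_fn, state, action, n_outer=30, amp=0.1, gain=0.3, n_probe=32):
    a = np.asarray(action, dtype=float).copy()
    dim = a.shape[0]
    freqs = np.arange(1, dim + 1)                          # one dither frequency per dimension
    t = np.linspace(0.0, 1.0, n_probe, endpoint=False)
    dithers = np.sin(2.0 * np.pi * np.outer(t, freqs))     # (n_probe, dim), zero-mean probes
    for _ in range(n_outer):
        values = np.array([q_fn(state, a + amp * d) for d in dithers])
        # Demodulation: correlating probe values with each dither recovers ~ (amp/2) * dQ/da.
        grad_est = 2.0 / (amp * n_probe) * dithers.T @ values
        a = np.clip(a + gain * grad_est, -1.0, 1.0)
    return a

# Toy critic with maximizer a* = [0.3, -0.2]; the refined action ends up close to it.
q_fn = lambda s, a: -np.sum((a - np.array([0.3, -0.2])) ** 2)
print(extremum_seeking_refine(q_fn, state=None, action=np.zeros(2)))
```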
Introducing an integral term into quadratic-type reward functions in reinforcement learning can effectively diminish steady-state errors without causing significant spikes in system states.
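A worked sketch of the reward shape this suggests, with assumed weights and an assumed tracking setup: the usual quadratic state and input penalties are augmented with a penalty on the running integral of the tracking error, so a persistent offset keeps accumulating cost even when the instantaneous error is small, without requiring an aggressive instantaneous state penalty.

```python
# Sketch: quadratic tracking reward augmented with an integral-of-error penalty.
import numpy as np

class IntegralQuadraticReward:
    def __init__(self, Q, R, Ki, x_ref, dt=0.02):
        self.Q, self.R, self.Ki = Q, R, Ki               # state, input, and integral weights
        self.x_ref, self.dt = x_ref, dt
        self.z = np.zeros_like(x_ref)                    # accumulated (integrated) error

    def reset(self):
        self.z[:] = 0.0                                  # call at the start of each episode

    def __call__(self, x, u):
        e = x - self.x_ref
        self.z += e * self.dt                            # discrete-time integral of the error
        return -(e @ self.Q @ e + u @ self.R @ u + self.z @ self.Ki @ self.z)

reward = IntegralQuadraticReward(Q=np.eye(2), R=0.1 * np.eye(1), Ki=0.5 * np.eye(2),
                                 x_ref=np.array([1.0, 0.0]))
print(reward(x=np.array([0.8, 0.1]), u=np.array([0.2])))
```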