Key Concepts
End-to-end learning of Koopman surrogate models using the SHAC algorithm shows superior performance in eNMPC applications.
Summary
The paper introduces a method for end-to-end learning of Koopman surrogate models aimed at optimal control performance. It contrasts standard reinforcement learning algorithms with a training algorithm that exploits the differentiability of mechanistic simulation models. The method is evaluated against other controller types on an eNMPC case study, where it demonstrates superior performance. The paper is organized into sections on the introduction, method, numerical experiments, and conclusion. Key concepts include policy optimization, Koopman theory for control, the Short-Horizon Actor-Critic (SHAC) algorithm, and task-optimal dynamic models for control.
Introduction:
Data-driven surrogate models reduce the computational burden of economic nonlinear model predictive control (eNMPC).
End-to-end reinforcement learning of dynamic surrogate models enhances controller performance.
Policy optimization algorithms leverage derivative information from simulated environments.
Method:
Policy optimization casts the control problem as a Markov Decision Process (MDP); a minimal rollout sketch follows below.
Koopman theory seeks linear representations of nonlinear dynamic systems in a lifted space of observables (see the surrogate-model sketch below).
The SHAC algorithm uses derivative information from differentiable simulation environments, shortening the learning horizon for more efficient training (sketched below).
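To make the MDP framing concrete, the following minimal rollout sketch shows how a policy interacts with an environment over a finite horizon. The names `step_fn`, `reward_fn`, and `policy` are hypothetical placeholders, not interfaces from the paper.

```python
# Minimal MDP rollout underlying policy optimization. All names here
# (step_fn, reward_fn, policy) are illustrative placeholders.
def rollout(step_fn, reward_fn, policy, s0, horizon):
    """Return the cumulative reward of running `policy` for `horizon` steps."""
    s, total_reward = s0, 0.0
    for _ in range(horizon):
        a = policy(s)                    # a_t = pi(s_t)
        total_reward += reward_fn(s, a)  # stage reward r(s_t, a_t)
        s = step_fn(s, a)                # transition s_{t+1} = f(s_t, a_t)
    return total_reward
```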
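Koopman surrogates with control inputs are commonly parameterized as a learned nonlinear lifting followed by linear latent dynamics. The PyTorch sketch below follows that generic structure; the layer sizes and the linear read-out are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class KoopmanSurrogate(nn.Module):
    """Generic Koopman surrogate: nonlinear lifting + linear latent dynamics."""

    def __init__(self, state_dim, input_dim, latent_dim):
        super().__init__()
        # Nonlinear lifting z = phi(x); architecture choices here are assumptions.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim)
        )
        # Linear latent dynamics z' = A z + B u, the defining Koopman structure.
        self.A = nn.Linear(latent_dim, latent_dim, bias=False)
        self.B = nn.Linear(input_dim, latent_dim, bias=False)
        # Linear read-out back to the original state space.
        self.C = nn.Linear(latent_dim, state_dim, bias=False)

    def forward(self, x, u):
        z = self.encoder(x)
        z_next = self.A(z) + self.B(u)
        return self.C(z_next), z_next  # predicted next state, next latent
```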
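The SHAC actor update can be sketched as a loss that backpropagates a short-horizon return through the differentiable environment and bootstraps the tail with a learned critic. This is a hedged sketch: `env_step` (a differentiable simulator step), `reward_fn`, and the network interfaces are assumed, and details such as critic training are omitted.

```python
def shac_actor_loss(env_step, reward_fn, actor, critic, s0, horizon=8, gamma=0.99):
    """Negative short-horizon return plus a bootstrapped terminal value.

    Gradients flow through `env_step`, so the actor receives analytical
    derivative information from the differentiable simulation environment.
    """
    s, loss = s0, 0.0
    for t in range(horizon):
        a = actor(s)
        loss = loss - (gamma ** t) * reward_fn(s, a)  # maximize discounted reward
        s = env_step(s, a)                            # differentiable transition
    loss = loss - (gamma ** horizon) * critic(s).squeeze(-1)  # critic bootstraps tail
    return loss.mean()
```

In a full training loop this loss would be minimized with a standard optimizer while the critic is regressed onto estimated returns, alternating as in other actor-critic methods.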
Numerical Experiments:
Case study based on a continuous stirred-tank reactor (CSTR) model; an illustrative CSTR simulation is sketched below.
Training setup compares five combinations of policies and training paradigms.
Results show superior performance of the Koopman-SHAC controller, with minimal constraint violations.
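As an illustration of the kind of system involved, below is a generic exothermic CSTR simulation with coolant temperature as the manipulated input. The equations and parameter values are standard textbook placeholders (Henson/Seborg-style), not necessarily the exact model from the paper's case study.

```python
import numpy as np
from scipy.integrate import solve_ivp

def cstr_rhs(t, x, Tc):
    """Exothermic A -> B CSTR; coolant temperature Tc is the manipulated input."""
    cA, T = x                       # concentration of A [mol/L], temperature [K]
    q, V = 100.0, 100.0             # feed flow [L/min], reactor volume [L]
    cAf, Tf = 1.0, 350.0            # feed concentration [mol/L], feed temperature [K]
    k0, ER = 7.2e10, 8750.0         # Arrhenius pre-factor [1/min], E/R [K]
    dH, rho_cp = -5.0e4, 239.0      # reaction enthalpy [J/mol], rho*Cp [J/(L*K)]
    UA = 5.0e4                      # heat-transfer coefficient [J/(min*K)]
    r = k0 * np.exp(-ER / T) * cA   # Arrhenius reaction rate [mol/(L*min)]
    dcA = q / V * (cAf - cA) - r
    dT = q / V * (Tf - T) + (-dH) * r / rho_cp + UA / (V * rho_cp) * (Tc - T)
    return [dcA, dT]

# Simulate 10 minutes from an initial state with a constant coolant temperature.
sol = solve_ivp(cstr_rhs, (0.0, 10.0), [0.5, 350.0], args=(300.0,), max_step=0.1)
print(sol.y[:, -1])  # final concentration and temperature
```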
Conclusion:
Combining Koopman (e)NMPC controllers with the SHAC algorithm demonstrates stable convergence to high rewards.
The successful proof of concept warrants further investigation on larger simulation models and more challenging control problems.
Statistics
Recent articles have established end-to-end reinforcement learning (RL) as an alternative to the system identification (SI) approach [1].
Policy gradient algorithms do not leverage analytical gradients from the environment [8].
Quotes
"Policy optimization algorithms leverage derivative information from simulated environments."
"SHAC addresses challenges by shortening the learning horizon for more efficient training."