Key Concept
Control policies that robustly minimize regret over randomly sampled system parameters can be computed via semidefinite programming, yielding strong probabilistic out-of-sample regret guarantees.
Abstract
The paper studies regret-optimal control of uncertain stochastic systems via scenario optimization. It adopts a competitive framework that minimizes regret relative to a clairvoyant optimal policy. The proposed method samples instances of the uncertainty and solves a semidefinite program to compute a policy that minimizes regret robustly over the samples. The approach extends naturally to enforcing safety constraints with high probability, and numerical results show improved closed-loop performance across a variety of system dynamics.
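As a toy illustration of the scenario idea (not the paper's semidefinite program), the sketch below samples instances of an uncertain parameter for a scalar system, uses each scenario's best-in-hindsight static gain as a clairvoyant-style benchmark, and picks by grid search the gain that minimizes the worst-case regret over the samples. The system, distributions, horizon, and policy class are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 20                                      # horizon (assumption)
w = rng.normal(size=T)                      # one fixed disturbance realization
a_samples = rng.uniform(0.5, 1.5, size=30)  # sampled uncertain parameter a

def cost(kgain, a):
    """LQ-style cost of the static feedback u_t = -kgain * x_t
    on the scalar system x_{t+1} = a x_t + u_t + w_t."""
    x, J = 0.0, 0.0
    for t in range(T):
        u = -kgain * x
        J += x**2 + u**2
        x = a * x + u + w[t]
    return J + x**2

grid = np.linspace(0.0, 2.0, 201)

# Per-scenario benchmark: best gain in hindsight for that parameter draw.
best = np.array([min(cost(k, a) for k in grid) for a in a_samples])

# Scenario (min-max) regret: pick the gain minimizing worst-case regret
# over the sampled parameter instances.
worst_regret = [max(cost(k, a) - b for a, b in zip(a_samples, best))
                for k in grid]
k_star = grid[int(np.argmin(worst_regret))]
```

The same min-max structure underlies the paper's formulation, where the grid search is replaced by a convex (semidefinite) program over a causal policy parameterization.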
I. INTRODUCTION
- Regret minimization in control systems.
- Competitive framework for designing efficient control laws.
- Importance of minimizing loss relative to an optimal policy.
II. PROBLEM STATEMENT AND PRELIMINARIES
- Description of uncertain linear time-varying dynamical systems.
- Formulation of robust regret minimization problem.
- Linear disturbance feedback policy for causality enforcement.
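A linear disturbance feedback policy can be written as u = K w, where causality requires u_t to depend only on past disturbances, i.e. K strictly (block) lower triangular. A minimal numpy sketch, with a toy horizon and random gains as assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5  # horizon (toy size, assumption)

# Strictly lower-triangular gain: u_t may use only w_0, ..., w_{t-1}.
K = np.tril(rng.normal(size=(T, T)), k=-1)

w = rng.normal(size=T)
u = K @ w

# Causality check: perturbing a future disturbance leaves earlier inputs
# unchanged (w_3 can only influence u_4 onward).
w2 = w.copy()
w2[3] += 10.0
u2 = K @ w2
```

In the paper's multi-input, multi-state setting the scalar entries of K become blocks, but the triangular sparsity pattern enforcing causality is the same.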
III. MAIN RESULTS
- Solution to robust regret minimization problem based on scenario optimization.
- Semidefinite programming approach for computing the policy.
- Strong probabilistic out-of-sample regret guarantees demonstrated through numerical simulations.
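The probabilistic out-of-sample guarantee in scenario optimization hinges on drawing enough samples. A standard bound (Campi and Garatti style; the exact constants used in the paper are not restated here) picks the smallest N such that the guarantee fails with probability at most β, where ε is the allowed violation level and d the number of decision variables:

```python
from math import comb

def scenario_count(d, eps, beta):
    """Smallest N with sum_{i<d} C(N,i) * eps^i * (1-eps)^(N-i) <= beta,
    the classic scenario-optimization sample bound (d decision variables,
    violation level eps, confidence parameter beta)."""
    N = d
    while sum(comb(N, i) * eps**i * (1 - eps)**(N - i)
              for i in range(d)) > beta:
        N += 1
    return N
```

For example, with a single decision variable, ε = 0.1, and β = 0.01, the bound reduces to (1 − ε)^N ≤ β, giving N = 44; richer policy classes (larger d) require more scenarios.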
IV. NUMERICAL RESULTS
- Validation of theoretical results through numerical experiments.
- Comparison between exact and approximate solutions in terms of performance guarantees and computation times.
- Illustration of improved closed-loop performance using regret minimization approach.
V. CONCLUSION
- Novel method for convex synthesis of robust control policies with provable regret and safety guarantees.
- Potential applications in adapting to heterogeneous dynamics and disturbances.
Statistics
"Research supported by the Swiss National Science Foundation (SNSF) under the NCCR Automation (grant agreement 51NF40 80545)."
"Mass m = 1 kg, spring constant k = 1 N m⁻¹, damping constant c = 1 N s m⁻¹."
"Sampling time Ts = 1 s."
"Uniform distribution: δk ∼ U[−0.2,0.2] and δc ∼ U[−0.2,0.2]."
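The experimental setup above can be reproduced in a few lines: draw the uncertain spring and damping perturbations, build the continuous-time mass-spring-damper matrices, and discretize with the stated sampling time. Zero-order hold is an assumed discretization method, since the summary does not specify one:

```python
import numpy as np
from scipy.signal import cont2discrete

rng = np.random.default_rng(2)
m, k, c, Ts = 1.0, 1.0, 1.0, 1.0   # nominal values and sampling time from the paper

def sample_system():
    """Draw one uncertain mass-spring-damper instance and discretize it
    (zero-order hold; the discretization method is an assumption)."""
    dk = rng.uniform(-0.2, 0.2)    # delta_k ~ U[-0.2, 0.2]
    dc = rng.uniform(-0.2, 0.2)    # delta_c ~ U[-0.2, 0.2]
    A = np.array([[0.0, 1.0],
                  [-(k + dk) / m, -(c + dc) / m]])
    B = np.array([[0.0], [1.0 / m]])
    Ad, Bd, *_ = cont2discrete((A, B, np.eye(2), np.zeros((2, 1))), Ts)
    return Ad, Bd

Ad, Bd = sample_system()
```

Each call yields one scenario of the uncertain dynamics; collecting many such (Ad, Bd) pairs provides the sampled instances over which the robust regret minimization is posed.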
Quotes
"We prove that this policy optimization problem can be solved through semidefinite programming."
"Our method naturally extends to include satisfaction of safety constraints with high probability."