Robust Constrained Reinforcement Learning for Handling Model Mismatch
The goal is to learn a policy that maximizes the worst-case reward while satisfying a constraint on the worst-case utility, even under model mismatch between the training and real environments.