This paper introduces a novel approach to training verifiably safe control policies in nonlinear dynamical systems by combining deep reinforcement learning with finite-step reachability verification, achieving significantly longer safety verification horizons than existing safe RL methods.
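As a rough illustration of the finite-step idea only (not the paper's verifier, which uses formal reachability analysis to bound all trajectories rather than sampled ones), a sampling-based k-step safety check might look like this; `dynamics`, `policy`, `is_safe`, and `horizon` are hypothetical stand-ins:

```python
import numpy as np

def finite_step_safety_check(dynamics, policy, is_safe, init_states, horizon):
    """Return True if every sampled trajectory stays safe for `horizon` steps."""
    states = np.asarray(init_states)
    for _ in range(horizon):
        actions = policy(states)
        states = dynamics(states, actions)
        if not np.all(is_safe(states)):
            return False  # a reachable state left the safe set
    return True
```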
This paper introduces C-TRPO, a novel safe reinforcement learning algorithm that modifies the geometry of the policy space to ensure constraint satisfaction throughout training, achieving reward competitive with existing methods while incurring fewer constraint violations.
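A hedged sketch of one way to read "modifying the policy space geometry": augment the trust-region divergence with a log-barrier on the expected cost, so that update steps toward the constraint boundary are penalized. All names (`kl`, `cost`, `cost_k`, `cost_limit`, `beta`) are illustrative, not C-TRPO's actual formulation:

```python
import torch

def constrained_divergence(kl, cost, cost_k, cost_limit, beta=1.0):
    # Zero when cost == cost_k; grows as the candidate policy's expected
    # cost approaches the limit, bending the trust region away from the
    # boundary of the safe policy set.
    slack = torch.clamp(cost_limit - cost, min=1e-8)
    slack_k = torch.clamp(cost_limit - cost_k, min=1e-8)
    return kl + beta * (torch.log(slack_k) - torch.log(slack))
```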
C-MCTS is a novel algorithm that enhances safety in reinforcement learning by pre-training a safety critic to guide Monte Carlo Tree Search, enabling efficient planning and constraint satisfaction in complex environments.
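A minimal sketch of the guiding mechanism, under the assumption that the pre-trained safety critic scores state-action pairs by predicted cost; `safety_critic`, `step`, and `cost_threshold` are hypothetical names, not the paper's interface:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    state: object
    children: dict = field(default_factory=dict)

def expand(node, actions, step, safety_critic, cost_threshold):
    """Add only children the pre-trained safety critic deems safe."""
    for action in actions:
        # Prune branches with high predicted cost, so search effort is
        # spent only on the remaining, likely-safe subtrees.
        if safety_critic(node.state, action) > cost_threshold:
            continue
        node.children[action] = Node(state=step(node.state, action))
```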
Enforcing safety constraints in online Linear Quadratic Regulator (LQR) learning can actually lead to faster learning: this work establishes regret bounds comparable to the unconstrained setting, even against stronger baselines and under a variety of noise distributions.
This research paper introduces the Simple-to-Complex Collaborative Decision (S2CD) framework, which trains reinforcement learning agents for autonomous driving to be both safe and efficient by transferring knowledge from a teacher model trained in a simplified environment.
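One plausible reading of the teacher-student collaboration, sketched under the assumption that the teacher overrides the student whenever the student's proposed action looks risky; `teacher`, `student`, `risk`, and `risk_threshold` are hypothetical:

```python
def collaborative_action(state, teacher, student, risk, risk_threshold):
    a_student = student(state)
    # Defer to the teacher (trained in the simplified environment) in
    # risky situations; the student takes over as its actions become safe.
    if risk(state, a_student) > risk_threshold:
        return teacher(state)
    return a_student
```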
This research paper introduces DOPE+, a novel algorithm for safe reinforcement learning in constrained Markov decision processes (CMDPs), which achieves an improved regret upper bound while guaranteeing no constraint violation during the learning process.
ACTSAFE is a novel model-based reinforcement learning algorithm for continuous action spaces that explores by optimistically exploiting epistemic uncertainty while guaranteeing safety through pessimism.
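A hedged sketch of the optimism/pessimism split, assuming epistemic uncertainty is approximated by disagreement in a dynamics-model ensemble; the names and the scoring rule are illustrative, not ACTSAFE's implementation:

```python
import numpy as np

def score_action(models, reward_fn, cost_fn, state, action, kappa=1.0):
    preds = np.stack([m(state, action) for m in models])  # ensemble next states
    mean, std = preds.mean(axis=0), preds.std(axis=0)
    # Optimistic for exploration: bonus for epistemic uncertainty.
    value = reward_fn(mean) + kappa * std.sum()
    # Pessimistic for safety: charge the worst-case predicted cost.
    worst_cost = max(cost_fn(p) for p in preds)
    return value, worst_cost
```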
Decision Points RL (DPRL) improves the safety and efficiency of batch reinforcement learning by focusing policy improvements on frequently visited state-action pairs (decision points) while deferring to the behavior policy in less explored areas.
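A minimal sketch of the stated mechanism, assuming count-based visitation statistics extracted from the batch; `counts`, `q_values`, `behavior_policy`, and `min_visits` are hypothetical stand-ins:

```python
def dprl_action(state, counts, q_values, behavior_policy, min_visits=50):
    visited = {a: n for a, n in counts[state].items() if n >= min_visits}
    if not visited:
        # Off the well-supported region: defer to the behavior policy.
        return behavior_policy(state)
    # At a "decision point": improve over the behavior policy, but only
    # among actions with enough data to trust the value estimates.
    return max(visited, key=lambda a: q_values[state][a])
```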
This paper introduces a novel model-free reinforcement learning method, the Safety Modulator Actor-Critic (SMAC), which addresses safety constraints and value overestimation, demonstrating its effectiveness on a UAV hovering task.
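An illustrative reading of an action "safety modulator": a learned module applies a bounded correction to the actor's raw action, leaving the actor free to optimize reward alone. All names here are assumptions, not SMAC's API:

```python
import torch

def modulated_action(actor, modulator, state, max_delta=0.2):
    raw = actor(state)
    # The modulator outputs a bounded safety correction; clamping keeps
    # the modulated action close to the actor's original intent.
    delta = torch.clamp(modulator(state, raw), -max_delta, max_delta)
    return raw + delta
```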
This research paper introduces a novel safe reinforcement learning framework that combines disturbance observers and residual model learning to enhance the robustness and safety of control policies in environments with internal and external disturbances.
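A hedged sketch of the disturbance-observer half of the idea: a first-order discrete-time observer that attributes the mismatch between a nominal model and observed transitions to disturbances (a residual model could be trained on the same mismatch signal). `nominal_model` and the gain `L` are illustrative choices:

```python
def observe_disturbance(d_hat, nominal_model, x, u, x_next, L=0.5):
    # Residual between the observed next state and the nominal prediction
    # is attributed to internal/external disturbances.
    residual = x_next - nominal_model(x, u)
    # First-order observer update: low-pass filter the residual estimate.
    return d_hat + L * (residual - d_hat)
```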