Core Concepts
CSAC-LB is a novel constrained RL method that achieves state-of-the-art performance without pre-training by applying a linear smoothed log barrier function to an additional safety critic.
Abstract
The article introduces CSAC-LB, a new constrained RL method that addresses optimization problems involving rewards and constraints simultaneously. It proposes a linear smoothed log barrier function to enhance policy learning and mitigate numerical issues. The algorithm achieves competitive performance on various control tasks and demonstrates robustness in real-world applications.
I. Introduction
RL traditionally focuses on rewards only.
Real-world applications often require considering multiple objectives.
Deep RL faces challenges in constrained optimization due to neural network limitations.
II. Related Work
Prior works address safety in RL using various approaches.
Safe policy search, lifelong RL, and Constrained Policy Optimization (CPO) are common methods.
Lagrange multiplier methods and Conditional Value at Risk (CVaR) metrics have also been applied.
III. Background
Constrained Markov decision processes (CMDPs) extend MDPs with cost functions that encode constraints.
SAC-Lag extends Soft Actor-Critic (SAC) with a Lagrange multiplier to handle the constraint.
Dual gradient descent alternates between updating the policy and updating the Lagrange multiplier.
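The alternating update described above can be illustrated on a toy problem. The following is a minimal sketch, not the SAC-Lag implementation: it assumes a scalar decision variable x in place of a neural-network policy, a hypothetical objective f(x) = (x - 2)^2, and a hypothetical constraint g(x) = x - 1 <= 0. The function name, step sizes, and iteration count are illustrative choices only.

```python
def dual_gradient_descent(steps=2000, lr_x=0.05, lr_lam=0.05):
    """Toy dual gradient descent on L(x, lam) = f(x) + lam * g(x).

    Assumed toy problem (not from the source):
        minimize f(x) = (x - 2)^2  subject to  g(x) = x - 1 <= 0
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient descent on the Lagrangian with respect to x.
        grad_x = 2.0 * (x - 2.0) + lam
        x -= lr_x * grad_x
        # Dual step: gradient ascent on the multiplier, projected onto lam >= 0.
        lam = max(0.0, lam + lr_lam * (x - 1.0))
    return x, lam


x_opt, lam_opt = dual_gradient_descent()
print(x_opt, lam_opt)
```

For this toy problem the KKT conditions give x = 1 and lam = 2, so the iterates should approach those values. In a constrained RL setting the primal step would instead be a policy update and the constraint value would be an estimated expected cost, which is where the additional tuning difficulty arises.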
IV. Approach
The log barrier method transforms a constrained problem into an unconstrained one.
The linear smoothed log barrier function improves numerical stability.
CSAC-LB applies the linear smoothed log barrier function to SAC with a safety critic.
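The numerical issue and its remedy can be sketched in a few lines. This summary does not state the exact formula, so the code below assumes one common construction: the standard log barrier -(1/t) * log(-z) is used for z <= -1/t^2 and is continued by its tangent line beyond that point, which keeps the value and gradient finite even when the constraint z <= 0 is violated. The function names and the parameter t are assumptions for illustration, and the form used in CSAC-LB may differ in detail.

```python
import math


def log_barrier(z: float, t: float = 5.0) -> float:
    """Standard log barrier for the constraint z <= 0 (infinite when violated)."""
    if z >= 0.0:
        return math.inf
    return -math.log(-z) / t


def smoothed_log_barrier(z: float, t: float = 5.0) -> float:
    """Log barrier with a linear extension (assumed form, see lead-in).

    For z <= -1/t^2 this matches the standard log barrier. For larger z it
    follows the tangent line at z = -1/t^2, whose slope is t.
    """
    threshold = -1.0 / t**2
    if z <= threshold:
        return -math.log(-z) / t
    return t * z - math.log(1.0 / t**2) / t + 1.0 / t


for z in (-1.0, -0.04, 0.0, 0.5):
    print(z, log_barrier(z), smoothed_log_barrier(z))
```

The standard barrier returns infinity as soon as z reaches 0, whereas the smoothed version stays finite and grows linearly, so a gradient-based learner still receives a usable signal when it starts from or drifts into an infeasible region.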
V. Results
A. Baselines
Comparison with SAC, SAC-Lag, and WCSAC on simulated tasks.
B. Simulation Experiments
Evaluation of CSAC-LB on the PointGoal1-v0 environment and on locomotion tasks.
C. Real-Robot Experiments
Testing policies on a Unitree A1 robot via sim-to-real transfer.
VI. Conclusion
CSAC-LB offers a general-purpose solution for constrained RL without pre-training or extensive tuning. The algorithm shows robustness in high-dimensional tasks and real-world applications, outperforming existing baselines.
Stats
CSAC-LB is proposed as an off-policy model-free method which can handle the numerical issues commonly associated with the log barrier method.