Efficient Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence
The proposed single-loop deep actor-critic (SLDAC) algorithm efficiently solves constrained reinforcement learning problems that have non-convex stochastic constraints and a high cost of interacting with the environment, and it provably converges to a Karush-Kuhn-Tucker (KKT) point.
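
For readers unfamiliar with the term, the following is a generic sketch of what a KKT point means for a constrained problem of this kind. The notation is an assumption made for illustration and is not taken from the source: $\theta$ denotes the policy parameters, $J_0$ the expected objective cost, and $J_1, \dots, J_m$ the expected constraint costs, all assumed differentiable.

```latex
% Generic constrained problem (illustrative notation, not the source's):
\min_{\theta} \; J_0(\theta)
\quad \text{s.t.} \quad J_i(\theta) \le 0, \qquad i = 1, \dots, m.

% \theta^\ast is a KKT point if there exist multipliers
% \lambda_1, \dots, \lambda_m satisfying:
\nabla J_0(\theta^\ast) + \sum_{i=1}^{m} \lambda_i \nabla J_i(\theta^\ast) = 0
  \quad \text{(stationarity)}, \\
J_i(\theta^\ast) \le 0
  \quad \text{(primal feasibility)}, \\
\lambda_i \ge 0
  \quad \text{(dual feasibility)}, \\
\lambda_i \, J_i(\theta^\ast) = 0
  \quad \text{(complementary slackness)}, \qquad i = 1, \dots, m.
```

Because the objective and constraints are non-convex, a KKT point is a first-order stationary point and need not be a global optimum.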