This paper introduces C-PG, a policy gradient framework for constrained reinforcement learning that guarantees global last-iterate convergence under both action-based and parameter-based exploration, a guarantee prior constrained policy gradient methods do not provide.
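For context, methods in this family typically optimize a Lagrangian relaxation of the constrained objective with alternating primal (policy) and dual (multiplier) updates. The sketch below is a minimal generic illustration of that pattern, not the C-PG algorithm itself; the environment interface, tabular softmax parameterization, step sizes, and cost budget `b` are all assumptions made for illustration.

```python
import numpy as np

# Generic primal-dual policy-gradient sketch for a constrained MDP
# (illustrative Lagrangian pattern, NOT the C-PG algorithm).
# Assumed interface: env.reset() -> state, env.step(a) -> (state, reward, cost, done).

def softmax_policy(theta, s):
    logits = theta[s]                          # tabular parameterization (assumption)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def primal_dual_pg(env, n_states, n_actions, b=0.1,
                   alpha=0.05, beta=0.01, iters=1000, gamma=0.99):
    theta = np.zeros((n_states, n_actions))    # policy parameters (primal variable)
    lam = 0.0                                  # Lagrange multiplier (dual variable)
    for _ in range(iters):
        # Roll out one episode, accumulating discounted reward R and cost C.
        s, done, traj, R, C, t = env.reset(), False, [], 0.0, 0.0, 0
        while not done:
            p = softmax_policy(theta, s)
            a = np.random.choice(n_actions, p=p)
            s2, r, c, done = env.step(a)
            traj.append((s, a))
            R += gamma**t * r
            C += gamma**t * c
            s, t = s2, t + 1
        # Primal ascent on the Lagrangian L = R - lam * (C - b) (REINFORCE estimate).
        g = R - lam * (C - b)
        for s_t, a_t in traj:
            p = softmax_policy(theta, s_t)
            grad_log = -p                      # grad of log softmax: e_{a_t} - p
            grad_log[a_t] += 1.0
            theta[s_t] += alpha * g * grad_log
        # Projected dual ascent: raise lam when the cost budget b is violated.
        lam = max(0.0, lam + beta * (C - b))
    return theta, lam
```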
This paper presents an algorithm that efficiently computes near-optimal deterministic policies for constrained reinforcement learning (CRL) problems with time-space recursive cost criteria.
This paper introduces PD-ANPG, an algorithm for constrained reinforcement learning with general parameterized policies that achieves state-of-the-art sample complexity, closing the gap between existing upper bounds and the theoretical lower bound.
The paper proposes Adversarial Constrained Policy Optimization (ACPO), a method that improves constrained reinforcement learning by dynamically adjusting cost budgets during training, yielding a better balance between reward maximization and constraint satisfaction.
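The budget-adjustment idea can be pictured as a schedule that starts from a relaxed cost budget and tightens it toward the true target as training progresses, informed by the agent's recent constraint performance. The function below is a hypothetical illustration of such a schedule, not the ACPO update; the names `b_target`, `avg_cost`, and `slack` are invented for this sketch, and the result would replace the fixed budget `b` in a Lagrangian dual update like the one sketched earlier.

```python
def adjusted_budget(b_target, avg_cost, step, total_steps, slack=0.2):
    """Hypothetical cost-budget schedule (illustrative only, not the ACPO rule).

    Starts from a relaxed budget and anneals linearly toward the true target
    b_target, tightening immediately if the agent's recent average cost is
    already comfortably under budget.
    """
    progress = step / total_steps                  # 0 -> 1 over training
    relaxed = b_target * (1.0 + slack)             # loose budget early on
    scheduled = relaxed + (b_target - relaxed) * progress
    # Never relax below the true target; tighten early when costs are low.
    return min(scheduled, max(b_target, avg_cost))
```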
This paper introduces a policy gradient reinforcement learning algorithm designed specifically for finite-horizon constrained Markov decision processes (CMDPs), demonstrating superior performance over existing infinite-horizon constrained RL algorithms in time-critical scenarios.
This paper introduces a switching-based reinforcement learning algorithm that guarantees probabilistic satisfaction of temporal logic constraints throughout the learning process while still maximizing reward.
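One common way to realize such guarantees is to switch between a reward-maximizing task policy and a constraint-satisfying backup policy whenever an estimate of the violation probability exceeds a tolerance. The snippet below sketches that generic switching pattern; the interfaces `task_policy`, `safe_policy`, `violation_prob`, and the tolerance `delta` are assumptions for illustration, not the paper's specific construction.

```python
def switching_action(state, task_policy, safe_policy, violation_prob, delta=0.05):
    """Generic switching rule (illustrative sketch, not the paper's algorithm).

    Follows the reward-maximizing task policy while the estimated probability
    of violating the temporal-logic constraint stays below the tolerance
    delta, and falls back to a safe backup policy otherwise.
    """
    if violation_prob(state) <= delta:
        return task_policy(state)   # exploit: maximize reward
    return safe_policy(state)       # fall back: preserve the probabilistic guarantee
```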