Core Concepts
The core message of this article is a class of federated primal-dual policy optimization methods for solving federated reinforcement learning (FedRL) problems with constraint heterogeneity, in which different agents have access to different constraint signals and the learning goal is to find a single optimal policy that satisfies all constraints.
Abstract
The article formulates the FedRL problem with constraint heterogeneity as a seven-tuple ⟨S, A, r, {(cᵢ, dᵢ)}ᵢ₌₁ᴺ, γ, P, {Γᵢ}ᵢ₌₁ᴺ⟩, where the N agents share the same state space S, action space A, reward function r, discount factor γ, and transition dynamics P, but have access to different constraint signals, with agent i observing only its own cost function cᵢ and threshold dᵢ.
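For concreteness, a problem instance can be sketched as a plain container; this is a minimal sketch, and the field names and types below are illustrative assumptions rather than anything defined in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Placeholder types for a tabular instance; these are assumptions.
State, Action = int, int

@dataclass
class FedCMDP:
    """FedRL problem with constraint heterogeneity: shared dynamics and
    reward, per-agent constraint signals (c_i, d_i) and environments Gamma_i."""
    states: Sequence[State]                          # shared state space S
    actions: Sequence[Action]                        # shared action space A
    reward: Callable[[State, Action], float]         # shared reward r(s, a)
    constraints: Sequence[tuple[Callable[[State, Action], float], float]]
    # ^ per-agent pairs (cost function c_i, threshold d_i), i = 1..N
    gamma: float                                     # discount factor
    transition: Callable[[State, Action, State], float]  # P(s' | s, a)
    environments: Sequence[object]                   # per-agent Gamma_i
```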
To address the challenge of constraint heterogeneity, the authors propose a class of federated primal-dual policy optimization methods. The key ideas are (formalized in the sketch after this list):
1. Decompose the original Lagrange function into N local Lagrange functions, so that each agent can perform primal-dual updates based only on its local constraint function.
2. Communicate periodically among agents to aggregate their locally updated policies, so that the resulting policy satisfies all constraints.
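One way to make the decomposition concrete is shown below. This is a sketch: the convention that each constraint requires the constraint value to be at least dᵢ, and the uniform 1/N weighting of the reward term, are assumptions rather than the paper's stated form.

```latex
% Global constrained problem (sign convention assumed):
\max_{\theta}\; V_r(\theta)
  \quad \text{s.t.} \quad V_{c_i}(\theta) \ge d_i, \quad i = 1, \dots, N
% Lagrange function with multipliers \lambda_i \ge 0, and one possible
% decomposition into N local Lagrange functions L_i:
L(\theta, \lambda)
  = V_r(\theta) + \sum_{i=1}^{N} \lambda_i \bigl( V_{c_i}(\theta) - d_i \bigr)
  = \sum_{i=1}^{N} \underbrace{\Bigl( \tfrac{1}{N}\, V_r(\theta)
      + \lambda_i \bigl( V_{c_i}(\theta) - d_i \bigr) \Bigr)}_{=:\, L_i(\theta, \lambda_i)}
```

Under such a decomposition, agent i only needs cᵢ and dᵢ to evaluate its own Lᵢ, which is what makes fully local primal-dual updates possible.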
The authors instantiate two concrete algorithms, FedNPG and FedPPO, which use natural policy gradient (NPG) and proximal policy optimization (PPO), respectively, as the local policy optimization method; a schematic of the shared update-and-aggregate loop follows.
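A minimal sketch of that shared loop, assuming simple parameter averaging as the aggregation rule and projected gradient ascent for the dual variables; `local_policy_step`, `estimate_constraint`, and `threshold` are hypothetical helpers standing in for the NPG/PPO machinery, not the paper's API.

```python
import numpy as np

def federated_primal_dual(agents, theta0, num_rounds, local_steps, eta_dual):
    """Schematic federated primal-dual loop (not the paper's exact method).

    Each agent i runs `local_steps` primal-dual updates on its local
    Lagrange function L_i, then the server averages the policy parameters.
    `agents` is a list of objects exposing hypothetical helpers:
      - local_policy_step(theta, lam): primal update on L_i (e.g., an NPG
        step for FedNPG or a PPO step for FedPPO)
      - estimate_constraint(theta): estimate of the constraint value
      - threshold: the local threshold d_i
    """
    theta = theta0
    lambdas = np.zeros(len(agents))  # one dual variable per constraint

    for _ in range(num_rounds):
        local_thetas = []
        for i, agent in enumerate(agents):
            th = theta.copy()  # start from the last aggregated policy
            for _ in range(local_steps):
                # Primal step: improve the local Lagrange function L_i.
                th = agent.local_policy_step(th, lambdas[i])
                # Dual step: raise lambda_i when constraint i is violated,
                # projected back onto lambda_i >= 0.
                violation = agent.threshold - agent.estimate_constraint(th)
                lambdas[i] = max(0.0, lambdas[i] + eta_dual * violation)
            local_thetas.append(th)
        # Periodic communication: aggregate the locally updated policies.
        theta = np.mean(local_thetas, axis=0)

    return theta, lambdas
```

The communication period (here, every `local_steps` updates) trades off consensus across the heterogeneous constraints against local progress on each Lᵢ.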
For FedNPG, the authors provide a theoretical analysis showing that it achieves an Õ(1/√T) global convergence rate. For FedPPO, they empirically evaluate its performance on various non-tabular FedRL tasks, demonstrating its ability to find optimal policies that satisfy the heterogeneous constraints.