This paper addresses the challenge of training individualized policies under environmental heterogeneity in federated reinforcement learning (FedRL). The authors formalize FedRL in heterogeneous environments, where each agent learns a localized policy for its designated MDP.
The paper examines several aggregation schemes that select a subset of agents whose value functions are combined during the federated update. The authors then introduce CAESAR, an aggregation scheme that combines convergence-aware sampling with a selective screening mechanism.
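For concreteness, a minimal sketch of this setup is shown below, assuming tabular Q-learning agents; the number of MDPs, agents, and state/action sizes are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical heterogeneous FedRL setup: N agents, each assigned one of K MDPs.
K_MDPS = 2            # number of distinct MDPs in the federation (assumed)
N_AGENTS = 8          # number of participating agents (assumed)
N_STATES, N_ACTIONS = 16, 4

rng = np.random.default_rng(0)
assignments = rng.integers(K_MDPS, size=N_AGENTS)   # agent i is designated MDP assignments[i]

# Each agent keeps its own localized value function (here a tabular Q-table).
q_tables = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]

def local_q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step performed locally by an agent in its own MDP."""
    td_target = r + gamma * q[s_next].max()
    q[s, a] += alpha * (td_target - q[s, a])
```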
The convergence-aware sampling step identifies probable "peers" - agents learning in the same MDP - by assessing the convergence trends of their value functions. The selective screening step then refines the selected agents, prioritizing those with superior performance. This dual-layered design mitigates the risk of suboptimal peer selection and enhances agents' learning efficiency in their respective MDPs.
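A rough sketch of how such a dual-layered selection could be realized is given below. The convergence statistic, screening rule, and averaging step are illustrative assumptions rather than the paper's exact equations: probable peers are guessed from the similarity of recent changes in the agents' value functions, and only those whose recent evaluation scores match or exceed the requesting agent's are aggregated.

```python
import numpy as np

def select_and_aggregate(agent_id, q_tables, q_tables_prev, scores, top_k=3):
    """Illustrative CAESAR-style federated update for one agent (assumed form).

    q_tables      : list of current Q-tables, one per agent
    q_tables_prev : Q-tables from the previous federated round
    scores        : recent evaluation return of each agent (e.g., mean episode reward)
    """
    n = len(q_tables)
    # Convergence-aware sampling (assumed): agents learning the same MDP should show
    # similar update directions, so compare each agent's recent Q-table change with
    # the requesting agent's change via cosine similarity.
    my_delta = (q_tables[agent_id] - q_tables_prev[agent_id]).ravel()
    sims = []
    for j in range(n):
        delta_j = (q_tables[j] - q_tables_prev[j]).ravel()
        denom = np.linalg.norm(my_delta) * np.linalg.norm(delta_j) + 1e-8
        sims.append(float(my_delta @ delta_j) / denom)
    candidates = np.argsort(sims)[::-1][:top_k]   # most similar = probable peers

    # Selective screening (assumed): keep only candidates whose recent performance
    # is at least as good as the requesting agent's own.
    selected = [j for j in candidates if scores[j] >= scores[agent_id]]
    if not selected:
        return q_tables[agent_id]                 # fall back to local knowledge

    # Federated update: average the value functions of the selected agents.
    return np.mean([q_tables[j] for j in selected], axis=0)
```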
The authors validate the effectiveness of CAESAR through experiments in a custom-built GridWorld environment and the FrozenLake-v1 task, each presenting varying levels of environmental heterogeneity. CAESAR demonstrates robust performance across diverse scenarios, outperforming other aggregation schemes, including the hypothetical "Peers" approach that assumes prior knowledge of agent-MDP assignments.