Key Concept
Federated reinforcement learning (FRL) can expedite the process of learning near-optimal policies for agents operating in heterogeneous environments by leveraging collaborative information from other agents.
Abstract
This paper introduces FedSARSA, a novel federated on-policy reinforcement learning algorithm that integrates the classic SARSA algorithm with a federated learning framework. The key contributions are:
Heterogeneity in FRL Optimal Policies: The paper formulates an FRL planning problem where agents operate in heterogeneous environments, leading to heterogeneity in their optimal policies. It provides an explicit bound on this heterogeneity, validating the benefits of collaboration.
Finite-Time Error Analysis of FedSARSA: The paper establishes a finite-time error bound for FedSARSA, achieving state-of-the-art sample complexity. This is the first provably sample-efficient on-policy algorithm for FRL problems.
Convergence Region and Linear Speedups: The paper shows that FedSARSA converges exponentially fast to a small region containing the agents' optimal policies, whose radius shrinks as the number of agents grows. With a linearly decaying step size, the learning process enjoys linear speedups through federated collaboration.
The analysis tackles several key challenges, including time-varying behavior policies, environmental heterogeneity, multiple local updates, and continuous state-action spaces with linear function approximation. The theoretical findings are validated through numerical simulations.
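To make the structure of the algorithm concrete, here is a minimal sketch, under hypothetical assumptions, of how federated SARSA with linear function approximation might be organized: each agent performs on-policy SARSA(0) updates in its own slightly perturbed environment (modeling heterogeneity), and a server averages the agents' weight vectors after every round of local updates. The environment construction, perturbation scale, and all hyperparameters below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def fedsarsa(n_agents=5, n_states=6, n_actions=2, d=4,
             rounds=40, local_steps=20, alpha=0.05, gamma=0.9,
             eps=0.1, seed=0):
    """Sketch of federated SARSA with linear function approximation.

    Agents share features and transition dynamics but see heterogeneous
    rewards (a hypothetical stand-in for environmental heterogeneity).
    """
    rng = np.random.default_rng(seed)
    phi = rng.normal(size=(n_states, n_actions, d)) / np.sqrt(d)   # shared feature map
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition kernel
    base_r = rng.uniform(size=(n_states, n_actions))
    # Heterogeneity: each agent's rewards are a small perturbation of base_r.
    rewards = [base_r + 0.1 * rng.normal(size=base_r.shape)
               for _ in range(n_agents)]

    def eps_greedy(w, s):
        # Time-varying behavior policy: epsilon-greedy w.r.t. current weights.
        if rng.random() < eps:
            return int(rng.integers(n_actions))
        return int(np.argmax(phi[s] @ w))

    w = np.zeros(d)  # globally shared weight vector
    for _ in range(rounds):
        local_ws = []
        for k in range(n_agents):
            wk = w.copy()
            s = int(rng.integers(n_states))
            a = eps_greedy(wk, s)
            for _ in range(local_steps):
                s2 = int(rng.choice(n_states, p=P[s, a]))
                a2 = eps_greedy(wk, s2)  # on-policy next action (SARSA)
                td = rewards[k][s, a] + gamma * phi[s2, a2] @ wk - phi[s, a] @ wk
                wk = wk + alpha * td * phi[s, a]  # SARSA(0) semi-gradient step
                s, a = s2, a2
            local_ws.append(wk)
        w = np.mean(local_ws, axis=0)  # server aggregates by averaging
    return w
```

The averaging step is what produces the collaborative speedup discussed above: local TD noise partially cancels across agents, while the reward perturbations keep each agent's optimal policy (and hence its local fixed point) slightly different, mirroring the heterogeneity bound in the paper.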