Core Concepts
This paper establishes the optimal sample complexity of multi-distribution learning paradigms such as collaborative learning, group distributionally robust optimization, and agnostic federated learning. The authors achieve this optimal sample complexity with algorithms that sample from the data distributions on demand.
Abstract
The paper presents a general framework that achieves optimal, on-demand sample complexity for three multi-distribution learning settings: collaborative learning, group distributionally robust optimization (group DRO), and agnostic federated learning.
Key highlights:
The authors frame multi-distribution learning as a stochastic zero-sum game between a minimizing player (the learner) and a maximizing player (the auditor). This allows them to leverage no-regret game dynamics to efficiently find an approximate min-max equilibrium.
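The game framing above can be illustrated with a toy instance of no-regret dynamics on a finite zero-sum game: the learner best-responds to the auditor's current mixture over distributions, while the auditor runs multiplicative weights (Hedge) to upweight the distributions where the learner does worst. The loss matrix, step size, and round count below are illustrative assumptions, not the paper's actual algorithm.

```python
import math
import numpy as np

def no_regret_dynamics(L, T=2000):
    """Approximate the min-max equilibrium of the zero-sum game with
    loss matrix L (rows: hypotheses, columns: distributions)."""
    m, n = L.shape
    eta = math.sqrt(8 * math.log(n) / T)  # standard Hedge step size
    w = np.ones(n) / n                    # auditor's mixture over distributions
    avg_loss = np.zeros(n)                # learner's average per-distribution loss
    for _ in range(T):
        h = int(np.argmin(L @ w))         # learner: best response to mixture w
        avg_loss += L[h] / T
        w *= np.exp(eta * L[h])           # auditor: upweight hard distributions
        w /= w.sum()
    return avg_loss                       # worst entry approximates the game value

# Example: symmetric losses where no single hypothesis is good on both
# distributions; the min-max value (with mixing) is 0.5.
L = np.array([[1.0, 0.0], [0.0, 1.0]])
worst = no_regret_dynamics(L).max()
```

By standard no-regret arguments, the learner's time-averaged play is an approximate min-max strategy, with error decaying at the Hedge regret rate of order sqrt(log(n)/T).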
The main technical challenge is that the maximizing player's payoff is more costly to estimate than the minimizing player's, due to the need to sample from multiple data distributions. The authors overcome this by using stochastic mirror descent to optimally trade off the players' asymmetric needs for datapoints.
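Schematically, the auditor's side of this trade-off can be written as entropic mirror descent (multiplicative weights) over the simplex of distribution weights; the notation below is an illustrative sketch, not the paper's exact update rule:

$$
w^{(t+1)}_j \;\propto\; w^{(t)}_j \exp\!\big(\eta\, \hat{g}^{(t)}_j\big),
\qquad
\hat{g}^{(t)}_j \approx \widehat{\mathrm{err}}_{D_j}\big(h^{(t)}\big),
$$

where each gradient estimate $\hat{g}^{(t)}_j$ requires fresh samples drawn on demand from distribution $D_j$. The step size $\eta$ governs how many rounds, and hence how many such samples, the auditor needs, which is the asymmetry the stochastic mirror descent analysis balances against the learner's cheaper updates.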
For collaborative learning, the authors give algorithms that output randomized and deterministic hypotheses with optimal sample complexity, improving upon previous results.
For group DRO, the authors provide the first sample complexity bounds, along with high-probability guarantees on the convergence of the training error.
For agnostic federated learning, the authors show that on-demand sampling accelerates generalization by a factor of n, the number of distributions, compared to batch-sampling baselines.
The authors' results demonstrate that multi-distribution learning can be solved efficiently using on-demand sampling, with only an additive increase in sample complexity over learning a single distribution.
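For intuition on what "additive" means here, the following is a schematic sketch rather than the paper's exact theorem: agnostic learning of a single distribution over a finite class $\mathcal{H}$ costs on the order of $\epsilon^{-2}\big(\log|\mathcal{H}| + \log(1/\delta)\big)$ samples, so an additive increase corresponds to a multi-distribution bound of roughly the form

$$
\tilde{O}\!\left(\frac{\log|\mathcal{H}| + n}{\epsilon^2}\right),
$$

as opposed to the multiplicative $n \cdot \log|\mathcal{H}| / \epsilon^2$ cost of learning each of the $n$ distributions separately.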