Core Concepts
The core message of this paper is to propose a coupled distributed stochastic approximation algorithm for solving a distributed optimization problem with unknown parameters, and to provide a comprehensive convergence-rate analysis that quantifies how network properties, agent heterogeneity, and initial states influence the algorithm's performance.
Abstract
The paper considers a distributed optimization problem in which each agent has access to its local computational function and parameter learning function, but the underlying parameter is unknown. To address this, the authors propose a Coupled Distributed Stochastic Approximation (CDSA) algorithm, in which every agent updates its current belief of the unknown parameter and its decision variable using stochastic approximation, and then averages the beliefs and decision variables of its neighbors over the network.
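This description suggests a simple simulation structure: each agent first refines its parameter belief by stochastic approximation, then takes a stochastic-gradient step on its decision variable, and finally mixes both quantities with its neighbors. Below is a minimal Python sketch of one such coupled update; the quadratic local costs, ring network with Metropolis-style weights, observation model, and step-size schedules are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch of a CDSA-style iteration (assumed form, not the paper's exact update).
# n agents, decision variable x_i in R^d, unknown parameter estimated locally as theta_i.

rng = np.random.default_rng(0)
n, d = 5, 3
theta_true = rng.normal(size=d)

# Doubly stochastic mixing matrix W for a ring graph (lazy Metropolis-style weights, assumed).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

def local_param_obs(i):
    """Noisy observation used by agent i's parameter-learning function (assumed model)."""
    return theta_true + rng.normal(scale=0.1, size=d)

def local_grad(x_i, theta_i):
    """Stochastic gradient of an assumed local cost f_i(x; theta) = 0.5 * ||x - theta||^2."""
    return (x_i - theta_i) + rng.normal(scale=0.1, size=d)

x = rng.normal(size=(n, d))   # decision variables, one row per agent
theta = np.zeros((n, d))      # parameter beliefs, one row per agent

for k in range(1, 2001):
    alpha = 1.0 / k           # diminishing step sizes (assumed schedules)
    beta = 1.0 / k
    # 1) Stochastic-approximation update of each agent's parameter belief.
    theta_half = theta + beta * (np.stack([local_param_obs(i) for i in range(n)]) - theta)
    # 2) Stochastic-gradient update of each agent's decision, using its current belief.
    x_half = x - alpha * np.stack([local_grad(x[i], theta[i]) for i in range(n)])
    # 3) Average consensus: mix beliefs and decisions with neighbors via W.
    theta, x = W @ theta_half, W @ x_half

print("mean decision error:", np.linalg.norm(x.mean(axis=0) - theta_true))
```

Because W is doubly stochastic, the mixing step drives the agents toward consensus without biasing the network-wide average of either the beliefs or the decisions.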
The key highlights and insights are:
The authors prove that the mean-squared error of the decision variable is bounded by O(1/(nk)) + O(1/(√n(1-ρw)k^1.5)) + O(1/((1-ρw)^2 k^2)), where n is the number of agents, k is the iteration count, and 1-ρw is the spectral gap of the network's weighted adjacency matrix. This reveals that the network connectivity, characterized by 1-ρw, only influences the higher-order terms of the convergence rate, while the dominant rate matches that of the centralized algorithm.
The authors analyze the transient time KT needed for the proposed algorithm to reach its dominant rate: when k ≥ KT, the dominant factor comes from the stochastic-gradient (stochastic approximation) update, while for k < KT, the main contribution comes from the distributed average-consensus step. They show that the algorithm asymptotically achieves the same network-independent convergence rate as the centralized scheme; a back-of-the-envelope comparison of the bound's terms is sketched after this list.
Numerical experiments in which different CPUs act as the agents validate the theoretical results, making the setup more representative of real-world distributed scenarios.
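To connect the bound and the transient time, one can ask when the network-dependent terms fall below the centralized O(1/(nk)) term. The following is only a back-of-the-envelope sketch under the stated bound, with x_k and x* used as assumed shorthand for the decision variable and the optimum; it is not the paper's formal argument.

```latex
% MSE bound on the decision variable (as stated above):
\[
  \mathbb{E}\big[\|x_k - x^\ast\|^2\big]
  = O\!\left(\frac{1}{nk}\right)
  + O\!\left(\frac{1}{\sqrt{n}\,(1-\rho_w)\,k^{1.5}}\right)
  + O\!\left(\frac{1}{(1-\rho_w)^2\,k^{2}}\right).
\]
% The first (centralized, network-independent) term dominates the other two once
\[
  \frac{1}{\sqrt{n}\,(1-\rho_w)\,k^{1.5}} \le \frac{1}{nk}
  \iff k \ge \frac{n}{(1-\rho_w)^2},
  \qquad
  \frac{1}{(1-\rho_w)^2\,k^{2}} \le \frac{1}{nk}
  \iff k \ge \frac{n}{(1-\rho_w)^2},
\]
% which is consistent with the transient time K_T = O\big(n/(1-\rho_w)^2\big) reported in the Stats.
```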
Stats
The mean-squared error of the decision variable is bounded by O(1/(nk)) + O(1/(√n(1-ρw)k^1.5)) + O(1/((1-ρw)^2 k^2)).
The transient time KT needed for the proposed algorithm to reach its dominant rate is O(n/(1-ρw)^2).
Quotes
"The network connectivity characterized by (1-ρw) only influences the high order of convergence rate, while the domain rate still acts the same as the centralized algorithm."
"The algorithm asymptotically achieves the same network-independent convergence rate as the centralized scheme."