The paper examines the "plateau delay" phenomenon observed in gossip learning systems, where model accuracy stays flat for an extended period before its rapid rise, which sets in much later than in single-node training. Through extensive experimentation, the authors identify the root cause of this delay as the "vanishing variance" problem that arises when averaging uncorrelated neural network models.
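The effect can be illustrated with a minimal sketch (not the paper's code; layer sizes and the number of models K are assumed for illustration): averaging K independently Xavier-initialized weight matrices shrinks the weight variance by roughly a factor of K, since the variance of a mean of K uncorrelated variables is the per-variable variance divided by K.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out, K = 256, 128, 10
xavier_std = np.sqrt(2.0 / (fan_in + fan_out))  # Glorot/Xavier standard deviation

# K uncorrelated models: each layer drawn independently from the same initializer.
weights = [rng.normal(0.0, xavier_std, (fan_in, fan_out)) for _ in range(K)]
averaged = np.mean(weights, axis=0)

print(f"per-model variance: {np.var(weights[0]):.2e}")  # ~ xavier_std**2
print(f"averaged variance:  {np.var(averaged):.2e}")    # ~ xavier_std**2 / K
```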
The key insights are:
Federated learning, in which a central server aggregates models, as well as gossip learning variants that employ model compression or transfer-learning-like strategies, can effectively mitigate the plateau delay. These approaches either maintain correlation between models or postpone model averaging until the models are sufficiently trained.
The authors propose a variance-corrected model averaging algorithm that rescales the weights of the averaged model to match the average variance of the contributing models. This preserves the optimal variance established by the Xavier weight initialization, addressing the vanishing variance problem.
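A hedged sketch of this idea follows; the function name `variance_corrected_average`, the dict-of-layers model representation, and the exact rescaling around the layer mean are assumptions for illustration and may differ from the paper's algorithm. Each averaged layer is rescaled so that its empirical variance matches the mean variance of the contributing models' corresponding layers.

```python
import numpy as np

def variance_corrected_average(models):
    """models: list of dicts mapping layer name -> weight ndarray (same shapes)."""
    corrected = {}
    for name in models[0]:
        stacked = np.stack([m[name] for m in models])      # shape: (K, *layer_shape)
        avg = stacked.mean(axis=0)                         # plain model average
        # Average variance of the contributing layers (the target to restore).
        target_var = stacked.var(axis=tuple(range(1, stacked.ndim))).mean()
        # Rescale around the mean so the averaged layer's variance matches the target.
        scale = np.sqrt(target_var / (avg.var() + 1e-12))
        corrected[name] = (avg - avg.mean()) * scale + avg.mean()
    return corrected

# Example: averaging two freshly initialized layers no longer halves the variance.
rng = np.random.default_rng(1)
m1 = {"fc1": rng.normal(0.0, 0.05, (64, 32))}
m2 = {"fc1": rng.normal(0.0, 0.05, (64, 32))}
merged = variance_corrected_average([m1, m2])
print(np.var(merged["fc1"]))   # ~0.05**2, i.e. the original per-model variance
```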
Simulation results demonstrate that the variance-corrected approach enables gossip learning to achieve convergence efficiency comparable to federated learning, even in non-IID data settings. The method also exhibits better scalability, with up to 6x faster convergence compared to existing gossip learning techniques in large-scale networks.
The paper provides a fundamental understanding of the challenges in fully decentralized neural network training and introduces an effective solution to address the vanishing variance problem, paving the way for more efficient gossip learning systems.