Core Concepts
Global momentum compression (GMC) is a novel method that combines error feedback and global momentum to achieve sparse communication in distributed learning, yielding higher test accuracy and faster convergence than existing local momentum-based methods.
Abstract
The paper proposes a novel method called global momentum compression (GMC) for sparse communication in distributed learning. The key highlights are:
Existing sparse communication methods in distributed learning, such as Deep Gradient Compression (DGC), use local momentum, which accumulates only the stochastic gradients computed locally on each worker. In contrast, GMC uses global momentum, which incorporates gradient information from all workers.
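The distinction can be made concrete with a minimal sketch, assuming a Top-k compressor and the standard heavy-ball rewriting of momentum SGD, in which the momentum term is recovered from the change in the globally synchronized model; function names here are illustrative, not from the paper:

```python
import numpy as np

def topk_compress(v, topk):
    """Keep the topk largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -topk)[-topk:]
    out[idx] = v[idx]
    return out

def gmc_worker_step(w, w_prev, grad_k, e_k, lr, beta, topk):
    """One GMC-style worker step (sketch). The momentum term
    beta * (w - w_prev) is *global*: it is reconstructed from the
    synchronized model, so it reflects past updates from all workers,
    not just this worker's own gradients."""
    desired = lr * grad_k - beta * (w - w_prev)  # heavy-ball form of momentum SGD
    corrected = e_k + desired                    # error feedback: re-add residual
    sparse = topk_compress(corrected, topk)      # only this is communicated
    e_k = corrected - sparse                     # residual kept locally for next step
    # The server averages the workers' sparse updates and applies
    # w_next = w - average, then broadcasts the new model.
    return sparse, e_k
```

A local-momentum method like DGC would instead keep a per-worker buffer u_k = beta * u_k + grad_k, so the momentum never sees other workers' gradients between communications.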
To improve convergence when using more aggressive sparsification compressors such as Random Blockwise Gradient Sparsification (RBGS), the authors extend GMC to GMC+, which combines global momentum with the detached error feedback technique.
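As a rough illustration of why RBGS is aggressive, here is a minimal sketch of random blockwise sparsification, assuming contiguous equal-size blocks and no rescaling (the paper's exact construction may differ):

```python
import numpy as np

def rbgs_compress(v, num_blocks, rng):
    """Randomly keep a single contiguous block of v and zero the rest.
    Far more aggressive than Top-k: the surviving block is chosen
    blindly, so it may carry little of the vector's magnitude."""
    out = np.zeros_like(v)
    blocks = np.array_split(np.arange(v.size), num_blocks)
    kept = blocks[rng.integers(num_blocks)]
    out[kept] = v[kept]
    return out

# Example: keep 1 of 8 blocks of a 16-dim vector.
rng = np.random.default_rng(0)
print(rbgs_compress(np.arange(16.0), 8, rng))
```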
The authors provide theoretical convergence analysis for both GMC and GMC+, proving that they can achieve the same convergence rate as vanilla distributed momentum SGD (DMSGD) under certain assumptions.
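Concretely, matching DMSGD here means the standard rate for the smooth non-convex setting, with linear speedup in the number of workers K; this is a sketch of the statement, with exact constants and assumptions deferred to the paper:

```latex
% Hedged sketch: the usual stationarity measure for a smooth non-convex F.
\frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\left\|\nabla F(w_t)\right\|^2
  \;=\; O\!\left(\frac{1}{\sqrt{KT}}\right)
```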
Empirical results on image classification tasks demonstrate that GMC and GMC+ achieve higher test accuracy and faster convergence than existing local momentum-based methods, especially under non-IID data distributions.
Stats
The communication cost on the server for DMSGD is 2dKT, where d is the model parameter dimension, K is the number of workers, and T is the number of iterations.
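A quick back-of-the-envelope computation with made-up but representative numbers shows why this cost matters:

```python
# Illustrative numbers only (not from the paper):
d = 25_600_000   # ~ResNet-50-scale parameter count
K = 16           # workers
T = 100_000      # iterations
values = 2 * d * K * T                        # 2dKT values through the server
print(f"{values * 4 / 1e12:.1f} TB at fp32")  # ~327.7 TB
```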
The relative communication cost (RCC) of GMC is defined as the ratio of its communication cost on the server to that of DMSGD.
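Under that definition, RCC is just a normalization by DMSGD's 2dKT; a one-line helper (hypothetical name, illustrative numbers) makes it explicit:

```python
def rcc(server_cost, d, K, T):
    """Server-side communication of a method divided by DMSGD's 2*d*K*T."""
    return server_cost / (2 * d * K * T)

# Made-up density: a sparsifier that moves 2% of d per worker per
# direction yields RCC of roughly 0.02.
d, K, T = 25_600_000, 16, 100_000
print(rcc(2 * 0.02 * d * K * T, d, K, T))  # 0.02
```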