The paper proposes AdaGossip, a novel decentralized learning algorithm that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents. The key idea is that a higher error in the received neighbors' parameters due to compression calls for a lower consensus step-size for that parameter. Accordingly, AdaGossip computes an individual adaptive consensus step-size for each parameter from an estimate of the second raw moment of the gossip error.
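The per-parameter adaptive step-size described above can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's exact update rule: the function name, the Adam-style moving-average form, and the clipping of the step-size to at most 1 are assumptions made for the sketch.

```python
import numpy as np

def adagossip_step(x, x_nbr_hat, v, gamma=1.0, beta=0.999, eps=1e-8):
    """One AdaGossip-style consensus update for a single agent (sketch).

    x         : this agent's parameters (ndarray)
    x_nbr_hat : weighted average of neighbors' (compressed) parameter estimates
    v         : running estimate of the second raw moment of the gossip error
    gamma     : base consensus step-size (a hyperparameter in the paper)

    Assumed form: Adam-style second-moment scaling, so that a larger
    gossip error yields a smaller per-parameter consensus step-size.
    """
    gossip_error = x_nbr_hat - x                       # disagreement with neighbors
    v = beta * v + (1.0 - beta) * gossip_error ** 2    # second raw moment estimate
    # Per-parameter step-size: larger estimated error -> smaller step.
    # Clipping to 1.0 is an assumption to keep the consensus step stable.
    step = np.minimum(gamma / (np.sqrt(v) + eps), 1.0)
    x = x + step * gossip_error                        # consensus update
    return x, v
```

This is the memory/compute overhead the limitations section refers to: one extra buffer `v` of the same shape as the parameters, plus the element-wise moment update.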
The authors extend AdaGossip to decentralized machine learning, resulting in AdaG-SGD. Through extensive experiments on various datasets, model architectures, compressors, and graph topologies, the authors demonstrate that AdaG-SGD outperforms the current state-of-the-art CHOCO-SGD by 0-2% in test accuracy. The improvements are more prominent in larger graph structures and challenging datasets like ImageNet.
The paper also discusses the limitations of the proposed method, including the assumption of a doubly stochastic and symmetric mixing matrix, the need to tune the consensus step-size hyperparameter, and the additional memory and computation required to estimate the second raw moment of gossip-error.