The paper proposes AdaGossip, a novel decentralized learning algorithm that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents. The key idea is that a higher error in a received neighbor's parameters due to compression calls for a lower consensus step-size for that parameter. AdaGossip therefore computes an individual adaptive consensus step-size for each parameter from an estimate of the second moment of the gossip error.
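The following is a minimal sketch of how such a per-parameter adaptive consensus step might look, assuming an Adam-style exponential moving average of the squared gossip error; the function name, arguments, and update structure are illustrative assumptions, not the paper's exact pseudocode.

```python
import numpy as np

def adagossip_step(x_i, x_hat_i, x_hat_nbrs, w, v_i,
                   gamma=1.0, beta=0.9, eps=1e-8):
    # x_i        : agent i's local parameters (1-D array)
    # x_hat_i    : agent i's own compressed/public parameter estimate
    # x_hat_nbrs : {j: compressed estimate received from neighbor j}
    # w          : {j: mixing weight w_ij} (row of a doubly stochastic, symmetric matrix)
    # v_i        : running estimate of the second raw moment of the gossip error

    # Gossip error: weighted disagreement with neighbors, as seen through the compressor.
    e_i = sum(w[j] * (x_j - x_hat_i) for j, x_j in x_hat_nbrs.items())

    # Exponential moving average of the squared gossip error (second raw moment).
    v_i = beta * v_i + (1.0 - beta) * e_i ** 2

    # Per-parameter consensus step-size: larger compression error -> smaller step.
    x_i = x_i + gamma * e_i / (np.sqrt(v_i) + eps)
    return x_i, v_i
```

The design mirrors adaptive optimizers such as Adam: parameters whose neighbor estimates are noisier under compression receive smaller consensus updates, while well-agreed parameters move faster toward consensus.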
The authors extend AdaGossip to decentralized machine learning, resulting in AdaG-SGD. Through extensive experiments across datasets, model architectures, compressors, and graph topologies, they show that AdaG-SGD outperforms the current state-of-the-art CHOCO-SGD by 0-2% in test accuracy, with the gains most pronounced on larger graph structures and challenging datasets such as ImageNet.
The paper also discusses the limitations of the proposed method, including the assumption of a doubly stochastic and symmetric mixing matrix, the need to tune the consensus step-size hyperparameter, and the additional memory and computation required to estimate the second raw moment of the gossip error.
Source: Sai Aparna A... at arxiv.org, 04-10-2024, https://arxiv.org/pdf/2404.05919.pdf