
Improving Batch Normalization for Federated Deep Learning

Core Concepts
Batch Normalization (BN) can outperform Group Normalization (GN) in many federated learning (FL) settings, especially when communication frequency is low or non-IID degree is not severe. A simple practice named FIXBN is proposed to mitigate the issues of BN in FL while retaining its benefits.
The paper investigates the use of Batch Normalization (BN) and its common alternative, Group Normalization (GN), in federated learning (FL). Through an expanded empirical study, the authors find that BN can outperform GN in many FL settings, except under high-frequency communication and extreme non-IID regimes.

The authors then delve deeper into the issues with BN in FL, including the mismatch of BN statistics across non-IID clients and the deviation of gradients during local training. They propose a simple yet effective practice named FIXBN to address these issues. In the initial exploration stage, FIXBN follows the standard practice of FEDAVG with BN, enjoying the positive impact of BN on local training. In the later calibration stage, FIXBN freezes the BN layers and uses the globally aggregated BN statistics for normalization, mitigating the mismatch of statistics between training and testing and allowing FEDAVG to recover the centralized gradient under high-frequency settings.

FIXBN is easy to implement, requires no architectural change, and incurs no additional training or communication cost. Extensive experiments show that FIXBN consistently outperforms or matches BN and GN across a wide range of FL settings, including image classification on CIFAR-10 and Tiny-ImageNet as well as image segmentation on Cityscapes.

The authors also identify another gap between FEDAVG and centralized training: the lack of maintained local SGD momentum. They show that applying maintained local or global momentum can further improve the performance of FEDAVG with different normalizers. Overall, the study provides valuable insights and a practical solution for using BN effectively in federated deep learning, serving as a foundation for future research and applications.
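The two-stage behavior described above can be sketched in a few lines. This is a hedged illustration only, not the authors' implementation: `SimpleBN` is a bare per-feature 1-D batch-norm layer, `fixbn_round` is a hypothetical helper, and `t_star` stands in for the freezing round T⋆.

```python
import numpy as np

class SimpleBN:
    """Minimal per-feature 1-D batch-norm layer, for illustration only."""
    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps
        self.frozen = False  # FixBN switch: False = exploration, True = calibration

    def forward(self, x):
        if self.frozen:
            # Calibration stage: normalize with the (aggregated) running stats,
            # so training-time and test-time normalization match.
            mean, var = self.running_mean, self.running_var
        else:
            # Exploration stage: standard BN with local mini-batch statistics,
            # updating the running averages as usual.
            mean, var = x.mean(axis=0), x.var(axis=0)
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta

def fixbn_round(bn, round_idx, t_star, global_stats=None):
    """At round T* (t_star), load the server-aggregated statistics and freeze."""
    if round_idx >= t_star and not bn.frozen:
        if global_stats is not None:
            bn.running_mean, bn.running_var = global_stats
        bn.frozen = True
```

Before `t_star` the layer behaves like standard BN inside FEDAVG local training; from `t_star` onward it normalizes every batch with the server-aggregated running statistics, so the normalization used during local training coincides with the one used at test time.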

Key Insights Distilled From

by Jike Zhong, H... at 04-01-2024
Making Batch Normalization Great in Federated Deep Learning

Deeper Inquiries

How can the theoretical understanding of Batch Normalization be extended to the federated learning setting?

The theoretical understanding of Batch Normalization (BN) can be extended to federated learning by accounting for the unique challenges of decentralized data and communication in this setting. One key aspect is the impact of non-IID data distributions across clients on the effectiveness of BN: the analysis should characterize how the mismatch of mini-batch statistics across non-IID clients affects the convergence and generalization of the federated model. It is also important to investigate how the normalization process in BN interacts with the communication constraints and decentralized optimization framework of federated learning. Incorporating these factors into the theoretical framework would give researchers insight into optimizing BN for federated learning scenarios.

What are the potential drawbacks or limitations of the FIXBN approach, and how can they be addressed?

While FIXBN offers a promising way to mitigate the negative impacts of Batch Normalization (BN) in federated learning, there are potential drawbacks and limitations to consider. One limitation is the choice of the round T⋆ at which to freeze the BN statistics for normalization: the effectiveness of FIXBN can vary with this choice, so an optimal strategy for determining T⋆ is essential. FIXBN may also introduce additional complexity to the training process, especially in scenarios where the globally accumulated statistics must be synchronized across clients efficiently. Addressing these limitations would require further research into optimizing the implementation of FIXBN, exploring adaptive strategies for selecting T⋆, and streamlining the communication and synchronization of global statistics in federated learning environments.

Are there any other normalization techniques beyond BN and GN that could be more suitable for federated learning, and how can they be explored?

Exploring alternative normalization techniques beyond Batch Normalization (BN) and Group Normalization (GN) for federated learning could provide valuable insights into improving model performance in decentralized settings. One potential approach is Layer Normalization (LN), which normalizes activations along the feature dimension independently for each sample. LN could be more suitable for federated learning because it does not rely on mini-batch statistics, potentially sidestepping the challenges posed by non-IID data distributions. Instance Normalization (IN) and Weight Normalization (WN) are further candidates worth exploring, each offering a different normalization strategy that may be beneficial in decentralized optimization frameworks. By investigating the applicability and performance of these alternatives in federated learning scenarios, researchers can broaden the understanding of normalization techniques in decentralized settings.
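To make the contrast concrete, the NumPy sketch below (illustrative helper names, not from the paper) implements Layer Normalization alongside plain batch-statistics normalization. A sample's LN output is identical whatever batch it sits in, while its BN output shifts with the batch composition, which is exactly the sensitivity that non-IID client batches expose.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Layer normalization: each sample is normalized over its own features.
    No mini-batch statistics are used, so a sample's output does not depend
    on which (or how many) other samples share the batch."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm_train(x, eps=1e-5):
    """Plain training-mode batch normalization over the batch axis, for contrast."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

Running `layer_norm` on one sample alone and then on the same sample stacked with skewed companions yields identical results, whereas `batch_norm_train` produces different outputs for the two cases.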