Core Concepts
Batch Normalization (BN) can outperform Group Normalization (GN) in many federated learning (FL) settings, especially when communication frequency is low or the degree of non-IID data is mild. A simple practice named FIXBN is proposed to mitigate the issues of BN in FL while retaining its benefits.
Summary
The paper investigates the use of Batch Normalization (BN) and its common alternative, Group Normalization (GN), in federated learning (FL). Through an expanded empirical study, the authors find that BN can outperform GN in many FL settings, except for high-frequency communication and extreme non-IID regimes.
The authors then delve deeper into the issues with BN in FL, including the mismatch of BN statistics across non-IID clients and the deviation of local gradients from the centralized gradient during local training. They propose a simple yet effective practice named FIXBN to address these issues:
In the initial exploration stage, FIXBN follows the standard practice of FEDAVG with BN, enjoying the positive impacts of BN on local training.
In the later calibration stage, FIXBN freezes the BN layer and uses the globally aggregated BN statistics for normalization, mitigating the mismatch of statistics in training and testing, and allowing FEDAVG to recover the centralized gradient under high-frequency settings.
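The two-stage behavior described above can be sketched as a single normalization layer that tracks batch statistics during exploration and is then frozen to globally aggregated statistics for calibration. This is a minimal NumPy sketch, not the authors' implementation; the class and method names (`FixBN1d`, `freeze`) are illustrative.

```python
import numpy as np

class FixBN1d:
    """Minimal sketch of the FixBN idea for a 1-D feature layer.

    Exploration stage: normalize with batch statistics and update
    running estimates, as standard BN does under FEDAVG.
    Calibration stage: freeze the layer and normalize with globally
    aggregated statistics, removing the train/test mismatch.
    """

    def __init__(self, dim, momentum=0.1, eps=1e-5):
        self.running_mean = np.zeros(dim)
        self.running_var = np.ones(dim)
        self.momentum = momentum
        self.eps = eps
        self.frozen = False  # flips when the calibration stage begins

    def freeze(self, global_mean, global_var):
        # Switch to the calibration stage: fix the layer to the
        # server-aggregated BN statistics.
        self.running_mean = np.asarray(global_mean, dtype=float)
        self.running_var = np.asarray(global_var, dtype=float)
        self.frozen = True

    def __call__(self, x):
        if not self.frozen:
            # Exploration stage: behave like standard BN.
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Calibration stage: use the frozen global statistics for
            # both training and testing.
            mean, var = self.running_mean, self.running_var
        return (x - mean) / np.sqrt(var + self.eps)
```

Because freezing changes no parameters and sends no extra messages, it preserves FIXBN's property of adding no architecture changes or communication cost.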
FIXBN is easy to implement, requires no architecture change, and incurs no additional training or communication costs. Extensive experiments show that FIXBN consistently outperforms or matches the performance of BN and GN across a wide range of FL settings, including image classification on CIFAR-10 and Tiny-ImageNet, as well as image segmentation on Cityscapes.
The authors also identify another gap between FEDAVG and centralized training: the absence of maintained local SGD momentum. They show that applying maintained local or global momentum can further improve the performance of FEDAVG with different normalizers.
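One common way to add global momentum, consistent with the observation above, is to treat the server-side model delta as a pseudo-gradient and apply a heavy-ball momentum buffer to it. The sketch below is illustrative, not the paper's exact procedure; all names and the hyperparameter values are assumptions.

```python
import numpy as np

def fedavg_with_server_momentum(global_w, client_ws, velocity,
                                beta=0.9, lr=1.0):
    """One server round of FedAvg with global (server-side) momentum.

    global_w  : current global model parameters (flat array)
    client_ws : list of client models after local training
    velocity  : momentum buffer carried across rounds
    """
    # Standard FedAvg aggregation of client models.
    avg_w = np.mean(client_ws, axis=0)
    # The model delta plays the role of a pseudo-gradient.
    delta = global_w - avg_w
    # Heavy-ball momentum on the pseudo-gradient.
    velocity = beta * velocity + delta
    new_global = global_w - lr * velocity
    return new_global, velocity
```

With `beta=0` and `lr=1` this reduces to plain FedAvg, so the momentum buffer is the only new server-side state.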
Overall, the study provides valuable insights and a practical solution for using BN effectively in federated deep learning, serving as a foundation for future research and applications.
Key Statistics
The paper summary does not highlight any key metrics or figures in support of the authors' main arguments.
Quotes
The paper summary does not highlight any notable quotations in support of the authors' main arguments.