Key Concept
By locally tuning and globally sharing only a small portion of the model weights, FedPEFT can significantly reduce the total communication overhead while maintaining competitive or even better performance in a wide range of federated learning scenarios.
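A minimal sketch of this idea, assuming PyTorch and hypothetical helper names rather than the authors' reference implementation: each client freezes the pre-trained backbone, trains only a small designated parameter subset, and only that subset is communicated and averaged on the server.

```python
# Sketch of a FedPEFT-style round (hypothetical helpers, not the paper's code).
import torch

def client_update(model, trainable_names, loader, lr=1e-3, local_epochs=1):
    """Locally tune only the designated small parameter subset; return just those weights."""
    # Freeze everything except the parameter-efficient subset (e.g., bias terms).
    for name, p in model.named_parameters():
        p.requires_grad = name in trainable_names
    params = [p for n, p in model.named_parameters() if n in trainable_names]
    opt = torch.optim.SGD(params, lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(local_epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    # Only the tuned subset is sent back to the server.
    return {n: p.detach().clone() for n, p in model.named_parameters() if n in trainable_names}

def server_aggregate(global_model, client_states):
    """FedAvg over the communicated subset only; the frozen backbone never moves."""
    with torch.no_grad():
        avg = {n: torch.stack([s[n] for s in client_states]).mean(dim=0)
               for n in client_states[0]}
        global_model.load_state_dict(avg, strict=False)  # update only the shared subset
    return global_model
```

In a full round, the server would first broadcast the current shared subset to each client (each holding its own model copy) before `client_update` runs.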
Abstract
The paper introduces FedPEFT, a new federated learning framework that simultaneously addresses data heterogeneity and communication challenges. FedPEFT makes it possible to leverage strong pre-trained models in FL while keeping communication costs extremely low.
The key highlights and insights are:
Communication Analysis:
FedPEFT methods achieve better results compared to full fine-tuning and other baselines, even with significantly fewer communicated parameters.
Full fine-tuning requires orders of magnitude more communication to achieve results comparable to FedPEFT.
FedPEFT-Bias stands out as the most efficient prototype (a minimal sketch of this bias-only setup follows this list).
Capability Analysis:
As the domain gap between the pre-training dataset and downstream task increases, full fine-tuning falls further behind FedPEFT, unable to keep up despite a massive communication budget.
FedPEFT approaches can suitably adapt the upstream representations without excessively damaging them, with FedPEFT-Prompt showing the strongest robustness.
FedPEFT achieves comparable results to full fine-tuning with less than 0.3% of the trainable parameters across various federated learning settings.
Robustness Analysis:
Under differential privacy constraints, full fine-tuning suffers the sharpest performance drop, falling below all FedPEFT prototypes.
In low-data regimes, FedPEFT outperforms full fine-tuning and head-tuning, revealing its capability to appropriately adapt pre-trained representations.
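As referenced above, here is a minimal sketch of what a FedPEFT-Bias-style configuration could look like, assuming a timm ViT-Base backbone and a 100-class downstream head (illustrative choices, not the paper's exact setup): only bias terms and the task head remain trainable and are shared.

```python
# Sketch of a FedPEFT-Bias-style setup: freeze the pre-trained ViT except for
# bias terms and the task head, which are the only parameters tuned and shared.
# The backbone and class count are assumptions for illustration.
import timm

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=100)

trainable_names = []
for name, p in model.named_parameters():
    if name.endswith(".bias") or name.startswith("head."):
        p.requires_grad = True
        trainable_names.append(name)
    else:
        p.requires_grad = False

total = sum(p.numel() for p in model.parameters())
shared = sum(p.numel() for n, p in model.named_parameters() if n in trainable_names)
print(f"shared {shared/1e6:.2f}M of {total/1e6:.2f}M parameters "
      f"({100 * shared / total:.2f}%)")  # on the order of the <0.3% reported above
```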
Overall, FedPEFT provides a new paradigm for practical and effective federated learning systems by enabling the use of large pre-trained models while significantly reducing communication costs.
Statistics
The per-round communication cost of full fine-tuning exceeds the total communication cost required for FedPEFT to converge to a similar final server accuracy.
FedPEFT reduces the per-round communication size from 328 MB (85.88M parameters) per client to 0.68 MB (0.17M parameters) per client when using a pre-trained ViT-Base as the backbone.
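As a rough sanity check on these figures, assuming 32-bit (4-byte) parameters and MiB-style rounding (the paper's exact serialization and rounding may differ):

```python
# Back-of-the-envelope per-round message size, assuming 4 bytes per parameter.
def size_mb(num_params, bytes_per_param=4):
    return num_params * bytes_per_param / 2**20  # MiB

print(f"full fine-tuning: {size_mb(85.88e6):.0f} MB per client per round")  # ~328 MB
print(f"FedPEFT subset:   {size_mb(0.17e6):.2f} MB per client per round")
# ~0.65 MB here; the reported 0.68 MB likely reflects an unrounded parameter count.
```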
Quotes
"By only locally tuning and globally sharing a small portion of the model weights, significant reductions in the total communication overhead can be achieved while maintaining competitive or even better performance in a wide range of federated learning scenarios, providing insight into a new paradigm for practical and effective federated systems."