Core Concepts
Adaptive gradient methods in over-the-air federated learning (OTA FL) enhance robustness by adapting the step size to the observed gradients, which in turn shapes the achievable convergence rates.
Summary
The article incorporates the adaptive gradient methods AdaGrad and Adam into OTA FL and analyzes their convergence rates under channel fading and interference. Experiments validate the theoretical findings, showing that the resulting ADOTA-FL schemes outperform FedAvgM. The system model, the adaptive gradient update rule, and the convergence analysis are presented in detail.
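The digest gives no implementation details; the sketch below illustrates in NumPy the general shape of one adaptive over-the-air round: clients transmit local gradients simultaneously, the server receives a fading-weighted superposition corrupted by heavy-tailed interference, and an Adam-style rule consumes that noisy aggregate. The channel model, the Student-t interference, the least-squares task, and all constants are illustrative assumptions, not the paper's exact ADOTA-FL scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_gradient(theta, X, y):
    """Least-squares gradient on one client's local data (placeholder task)."""
    return X.T @ (X @ theta - y) / len(y)

def ota_adam_round(theta, m, v, clients, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One adaptive over-the-air round (illustrative, not the paper's exact scheme)."""
    K = len(clients)
    grads = np.stack([local_gradient(theta, X, y) for X, y in clients])  # shape (K, d)
    fading = np.abs(rng.normal(size=(K, 1)))                 # assumed per-client channel gains
    interference = rng.standard_t(df=1.5, size=len(theta))   # assumed heavy-tailed noise
    g_hat = (fading * grads).sum(axis=0) / K + 0.05 * interference  # noisy OTA aggregate

    # Adam-style moment estimates driven by the noisy over-the-air gradient.
    m = beta1 * m + (1 - beta1) * g_hat
    v = beta2 * v + (1 - beta2) * g_hat ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Tiny synthetic run: 10 clients, linear regression in 5 dimensions.
d, K = 5, 10
theta_true = rng.normal(size=d)
clients = []
for _ in range(K):
    X = rng.normal(size=(50, d))
    clients.append((X, X @ theta_true + 0.1 * rng.normal(size=50)))

theta, m, v = np.zeros(d), np.zeros(d), np.zeros(d)
for t in range(1, 201):
    theta, m, v = ota_adam_round(theta, m, v, clients, t)
print("parameter error:", np.linalg.norm(theta - theta_true))
```

An AdaGrad-style variant would replace the moving averages with a cumulative sum of squared aggregated gradients, as in the update rules listed under Main Contributions below.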
- Introduction
- Federated learning preserves privacy in model training.
- FL system stages: parameter uploading, aggregation, model update, and local training.
- FL benefits: data privacy, global model access, and training efficiency.
- Main Contributions
- Adaptive gradient methods update the step size based on historical gradients (the standard update rules are sketched after this list).
- In OTA FL, AdaGrad faces additional challenges from channel-induced disturbances.
- The article provides a systematic approach to integrating adaptive gradient methods into A-OTA FL.
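For orientation, the standard centralized forms of the two update rules are written out below; in the over-the-air setting, the gradient g_t is replaced by the noisy, fading-weighted aggregate received at the server (the exact OTA formulation should be taken from the paper itself).

```latex
% AdaGrad: per-coordinate step sizes from the accumulated squared gradients
v_t = v_{t-1} + g_t^{2}, \qquad
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon}\, g_t

% Adam: exponential moving averages of the first and second gradient moments
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2}, \qquad
\theta_{t+1} = \theta_t - \eta\, \frac{m_t / (1 - \beta_1^{t})}{\sqrt{v_t / (1 - \beta_2^{t})} + \epsilon}
```

Here β2 controls how quickly the second-moment estimate forgets old gradients, which relates to its effect on Adam-OTA convergence noted under Effects of Hyper-parameters.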
- Convergence Analysis
- AdaGrad-OTA converges at O(ln(T)/T^(1-1/α)).
- Adam-OTA converges faster at O(1/T).
- Heavy-tailed interference significantly affects the convergence rates (see the rate comparison below).
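Reading the two rates side by side, with α presumably denoting the tail index of the heavy-tailed interference (the paper should be consulted for its precise definition):

```latex
\text{AdaGrad-OTA:}\quad \mathcal{O}\!\left(\frac{\ln T}{T^{\,1 - 1/\alpha}}\right)
\qquad\qquad
\text{Adam-OTA:}\quad \mathcal{O}\!\left(\frac{1}{T}\right)
```

The exponent 1 - 1/α is positive only for α > 1 and approaches 1 as α grows, so heavier-tailed interference (smaller α) slows the AdaGrad-OTA bound, while the stated Adam-OTA rate does not depend on α.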
- Simulation Results
- Performance comparison with baseline FedAvgM.
- Experiments use ResNet-18/34 and logistic regression on the CIFAR-10, CIFAR-100, and EMNIST datasets (a minimal setup sketch follows this list).
- ADOTA-FL outperforms baseline in convergence and generalization.
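The digest names the architectures and datasets but no training details; below is a minimal PyTorch/torchvision sketch of how such a setup could be instantiated (the preprocessing, the flattened-input logistic-regression form, and all other choices are assumptions, not the paper's configuration).

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.ToTensor()  # assumed preprocessing; the paper's pipeline may differ

# CIFAR-10 with ResNet-18 (ResNet-34 and CIFAR-100/EMNIST are set up analogously).
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
resnet18 = models.resnet18(num_classes=10)

# Logistic regression as a single linear layer over flattened 32x32x3 images.
logreg = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))

loss_fn = nn.CrossEntropyLoss()
x, y = train_set[0]
print(loss_fn(resnet18(x.unsqueeze(0)), torch.tensor([y])).item())
```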
- Effects of Hyper-parameters
- The choice of β2 impacts the convergence rate of Adam-OTA.
- A well-chosen β2 accelerates convergence.
- Effects of System Parameters
- Performance improves as the number of participating clients increases.
- Training slows as data heterogeneity across clients increases (a common way to generate heterogeneous splits is sketched below).
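The article does not state how data heterogeneity is produced; one common approach in FL experiments, assumed here purely for illustration, is Dirichlet label partitioning, where a smaller concentration parameter yields more skewed per-client label distributions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label proportions.

    Smaller alpha -> each class concentrates on fewer clients -> higher heterogeneity.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of this class assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = np.split(idx, (np.cumsum(proportions)[:-1] * len(idx)).astype(int))
        for k, part in enumerate(splits):
            client_indices[k].extend(part.tolist())
    return client_indices

# Example: 1000 samples, 10 classes, 10 clients; alpha=0.1 is highly non-IID.
labels = np.random.default_rng(1).integers(0, 10, size=1000)
parts = dirichlet_partition(labels, num_clients=10, alpha=0.1)
print([len(p) for p in parts])
```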
Statistics
AdaGrad-OTA converges at O(ln(T)/T^(1-1/α)).
Adam-OTA converges at O(1/T).
Quotes
"AdaGrad-OTA konvergiert mit O(ln(T)/T^(1-1/α))."
"Adam-OTA konvergiert mit O(1/T)."