Key Concepts
This paper presents a compute-optimized implementation of the FedNL algorithm family for federated learning, showing substantial speedups for logistic regression over both the original implementation and existing solvers.
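A single FedNL round can be sketched in a few lines. This is a toy Python sketch, not the paper's C++ implementation, and it rests on several assumptions. Each client holds an L2-regularized logistic loss. Each client's Hessian estimate H_i starts as its local Hessian at the starting point. The Hessian correction is compressed with a TopK rule applied to the upper triangle and mirrored to keep the estimate symmetric. The server step is x ← x − (H + l·I)⁻¹∇f(x), where H is the average estimate and l is the average Frobenius error of the estimates. This reflects our reading of the original FedNL method. All function names, the data, and the parameters `k`, `alpha`, and `lam` are hypothetical.

```python
import math

# Toy sketch of a FedNL-style round. Assumptions: logistic loss per client, TopK on the
# upper triangle, step (H + l*I)^{-1} grad. All names and data here are hypothetical.

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def local_grad_hess(x, data, lam):
    """Gradient and Hessian of f_i(x) = mean(log(1 + exp(-b*<a, x>))) + lam/2*||x||^2, b in {-1, +1}."""
    d, m = len(x), len(data)
    g = [lam * xi for xi in x]
    H = [[lam if r == c else 0.0 for c in range(d)] for r in range(d)]
    for a, b in data:
        z = b * sum(ai * xi for ai, xi in zip(a, x))
        s = sigmoid(-z)        # -d/dz log(1 + exp(-z))
        w = s * (1.0 - s)      # d^2/dz^2 log(1 + exp(-z))
        for r in range(d):
            g[r] -= b * a[r] * s / m
            for c in range(d):
                H[r][c] += w * (a[r] * a[c]) / m
    return g, H

def top_k_sym(M, k):
    """Keep the k largest-magnitude upper-triangular entries of M, mirrored to stay symmetric."""
    d = len(M)
    ranked = sorted(((abs(M[r][c]), r, c) for r in range(d) for c in range(r, d)), reverse=True)
    out = [[0.0] * d for _ in range(d)]
    for _, r, c in ranked[:k]:
        out[r][c] = out[c][r] = M[r][c]
    return out

def frob(M):
    return math.sqrt(sum(v * v for row in M for v in row))

def solve(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z

def fednl_round(x, clients, H_est, k, alpha, lam):
    """One FedNL-style round; updates the per-client Hessian estimates H_est in place."""
    d, n = len(x), len(clients)
    grad = [0.0] * d
    H_avg = [[0.0] * d for _ in range(d)]
    l_avg = 0.0
    updates = []
    for i, data in enumerate(clients):               # client side
        g_i, hess_i = local_grad_hess(x, data, lam)
        diff = [[hess_i[r][c] - H_est[i][r][c] for c in range(d)] for r in range(d)]
        updates.append(top_k_sym(diff, k))           # compressed correction S_i
        l_avg += frob(diff) / n                      # l_i = ||H_i - hess_i||_F
        for r in range(d):
            grad[r] += g_i[r] / n
            for c in range(d):
                H_avg[r][c] += H_est[i][r][c] / n
    # Server step uses the estimates from before this round's update.
    M = [[H_avg[r][c] + (l_avg if r == c else 0.0) for c in range(d)] for r in range(d)]
    step = solve(M, grad)
    for i, S in enumerate(updates):                  # H_i <- H_i + alpha * S_i
        for r in range(d):
            for c in range(d):
                H_est[i][r][c] += alpha * S[r][c]
    grad_norm = math.sqrt(sum(v * v for v in grad))
    return [xi - si for xi, si in zip(x, step)], grad_norm

# Hypothetical toy data: two clients, 2 features in [-1, 1], labels in {-1, +1}.
clients = [
    [([1.0, 0.5], 1), ([-0.5, 1.0], -1), ([0.3, -0.8], 1)],
    [([0.9, -0.2], -1), ([-1.0, -0.4], 1), ([0.2, 0.7], 1)],
]
lam = 1.0
x = [0.0, 0.0]
H_est = [local_grad_hess(x, data, lam)[1] for data in clients]  # H_i^0 = local Hessian at x^0
for _ in range(30):
    x, grad_norm = fednl_round(x, clients, H_est, k=2, alpha=1.0, lam=lam)
```

In this sketch the only Hessian information a client contributes per round is its compressed correction and the scalar l_i. When k is much smaller than the d² entries of a full Hessian, that is where the communication savings would come from.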
Statistics
With the original prototype, a single FedNL optimization run took 4.8 hours.
The optimized C++ implementation achieved a ×1000 speedup in single-node simulations compared to the original Python/NumPy implementation.
In a single-node setup, the optimized FedNL/RandK[K=8d] implementation achieves a total speedup of ×929.4 compared to the baseline.
For FedNL/TopK[K=8d], the total speedup from the optimized implementation is ×1053.9 in a single-node setup.
The optimized implementation solves logistic regression ×20 faster than the solvers available through CVXPY, including CLARABEL, MOSEK, SCS, ECOS, and ECOS-BB.
CVXPY's initialization time alone is ×7 longer than the combined initialization and solve time of FedNL-LS with any compressor.
In a multi-node setting, FedNL surpasses Apache Spark and Ray/Scikit-Learn in both initialization time and solve time for logistic regression.
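The RandK and TopK compressors named in the statistics above can be illustrated on a flattened vector. This is a sketch under assumptions. In FedNL, the compressors act on the Hessian-difference matrices that clients send, and K = 8d would keep 8d of those entries. Here a short hypothetical vector stands in for such a matrix. RandK uses the common unbiased scaling by n/k (n = vector length), which we assume rather than take from the paper. The function names are hypothetical.

```python
import random

def top_k(v, k):
    """TopK: keep the k largest-magnitude entries of v, zero the rest (biased, contractive)."""
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    out = [0.0] * len(v)
    for i in keep:
        out[i] = v[i]
    return out

def rand_k(v, k, rng):
    """RandK: keep k uniformly chosen entries, scaled by n/k so that E[rand_k(v)] = v (assumed variant)."""
    n = len(v)
    out = [0.0] * n
    for i in rng.sample(range(n), k):
        out[i] = v[i] * n / k
    return out

# Hypothetical stand-in for a flattened Hessian-difference matrix.
v = [0.5, -3.0, 0.1, 2.0, -0.2, 1.5]
print(top_k(v, 2))                  # [0.0, -3.0, 0.0, 2.0, 0.0, 0.0]
print(rand_k(v, 2, random.Random(0)))
```

TopK is deterministic and keeps the most informative entries but is biased. RandK is cheaper to select and unbiased under the scaling assumed here, at the cost of higher variance.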
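FedNL-LS, mentioned in the CVXPY comparison, is the line-search variant of FedNL. Assuming the "LS" denotes a standard Armijo backtracking search, the idea can be sketched on a logistic-regression objective. For brevity, the search direction here is a scaled negative gradient rather than a FedNL Newton-type direction. The loss, data, `lam`, and the constants `c` and `gamma` are hypothetical choices, not values from the paper.

```python
import math

# Assumption: FedNL-LS uses Armijo backtracking; this sketch applies it to a plain gradient direction.

def logistic_loss(x, data, lam):
    """Mean logistic loss plus L2 penalty; labels b in {-1, +1}."""
    total = 0.0
    for a, b in data:
        z = b * sum(ai * xi for ai, xi in zip(a, x))
        total += math.log1p(math.exp(-z))
    return total / len(data) + 0.5 * lam * sum(xi * xi for xi in x)

def logistic_grad(x, data, lam):
    g = [lam * xi for xi in x]
    for a, b in data:
        z = b * sum(ai * xi for ai, xi in zip(a, x))
        s = 1.0 / (1.0 + math.exp(z))   # = sigmoid(-z)
        for r in range(len(x)):
            g[r] -= b * a[r] * s / len(data)
    return g

def backtracking(f, x, direction, grad, c=1e-4, gamma=0.5, max_steps=60):
    """Shrink t until f(x + t*d) <= f(x) + c*t*<grad, d> (Armijo condition)."""
    fx = f(x)
    slope = sum(g * d for g, d in zip(grad, direction))  # negative for a descent direction
    t = 1.0
    for _ in range(max_steps):
        trial = [xi + t * di for xi, di in zip(x, direction)]
        if f(trial) <= fx + c * t * slope:
            return t, trial
        t *= gamma
    return 0.0, list(x)  # no acceptable step found

data = [([1.0, 0.5], 1), ([-0.5, 1.0], -1), ([0.3, -0.8], 1), ([0.9, -0.2], -1)]
lam = 0.1

def f(x):
    return logistic_loss(x, data, lam)

x0 = [0.0, 0.0]
g0 = logistic_grad(x0, data, lam)
d0 = [-10.0 * gi for gi in g0]   # over-long direction: the full step t = 1 is rejected here
t, x1 = backtracking(f, x0, d0, g0)
```

A line search like this makes each accepted step decrease the objective, which is the usual motivation for adding one to a Newton-type method whose pure local steps can overshoot far from the solution.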
Quotes
"With this level of theory development, the gain from further theoretical improvements might not be as substantial as those derived from a highly optimized implementation."
"Ready-to-use implementations play a crucial role in simplifying this challenge."
"The FedNL has a super-linear local convergence rate."
"Major ML frameworks, burdened by extensive auxiliary management code, are suboptimal for complex system and algorithm creation with high-performance requirements."
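The third quote refers to a super-linear local convergence rate. For reference, the textbook definition for iterates x^k converging to a minimizer x^* is given below. "Local" means the guarantee holds only once the starting point is sufficiently close to x^*. The specific norms and constants in FedNL's own theorem are not reproduced here.

```latex
\lim_{k \to \infty} \frac{\lVert x^{k+1} - x^{*} \rVert}{\lVert x^{k} - x^{*} \rVert} = 0
```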