Core Concepts
Federated optimization can provably reduce communication by using regularized drift correction to exploit similarity between client functions.
Abstract
The paper presents DANE+, a framework that generalizes the DANE algorithm for distributed optimization with regularized drift correction. DANE+ allows inexact local solvers and arbitrary control variates, and it offers flexibility in how the local updates are aggregated. The authors show that DANE+ achieves deterministic communication reduction across the strongly convex, convex, and non-convex settings by exploiting similarity between client functions, as measured by the Averaged Hessian Dissimilarity (δA) and the Bounded Hessian Dissimilarity (δB).
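To make the round structure concrete, here is a minimal sketch of a DANE+-style update on synthetic quadratic clients. It is an illustration rather than the authors' reference implementation: the penalty `lam`, the choice of the global gradient as the control variate, the inexact inner solver, and plain averaging as the aggregation rule are all assumptions for the sketch.

```python
import numpy as np

# Synthetic quadratic clients (an assumption for illustration):
#   f_i(x) = 0.5 * x^T A_i x - b_i^T x,  with  f = (1/n) * sum_i f_i
rng = np.random.default_rng(0)
d, n_clients = 5, 4
A = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(n_clients)]
A = [0.5 * (Ai + Ai.T) + d * np.eye(d) for Ai in A]  # symmetric, positive definite
b = [rng.standard_normal(d) for _ in range(n_clients)]

def grad(i, x):
    """Gradient of client i's loss at x."""
    return A[i] @ x - b[i]

def dane_plus_round(x_bar, lam=1.0, inner_steps=20, lr=0.05):
    """One DANE+-style round (sketch): regularized, drift-corrected local solves."""
    g_bar = np.mean([grad(i, x_bar) for i in range(n_clients)], axis=0)
    local_solutions = []
    for i in range(n_clients):
        c_i = g_bar  # DANE's control variate; DANE+ allows other choices
        y = x_bar.copy()
        for _ in range(inner_steps):  # inexact local solver: a few gradient steps
            # Gradient of the local model
            #   f_i(y) - <grad f_i(x_bar) - c_i, y> + (lam/2) * ||y - x_bar||^2
            gy = grad(i, y) - (grad(i, x_bar) - c_i) + lam * (y - x_bar)
            y -= lr * gy
        local_solutions.append(y)
    return np.mean(local_solutions, axis=0)  # one admissible aggregation: averaging

x = np.zeros(d)
for _ in range(30):
    x = dane_plus_round(x)
g = np.mean([grad(i, x) for i in range(n_clients)], axis=0)
print("global gradient norm after 30 rounds:", np.linalg.norm(g))
```

Note that at y = x_bar the local model's gradient equals c_i = ∇f(x_bar), so every client begins its local solve pointed along the global descent direction; this is the drift-correction effect in miniature.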
The authors also propose FedRed, a novel framework that employs doubly regularized drift correction. FedRed enjoys the same communication reduction as DANE+ but has improved local computational complexity: when gradient descent is used as the local solver, FedRed can require fewer communication rounds than vanilla gradient descent without incurring additional computational overhead.
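The doubly regularized idea can be sketched in the same toy setting. Here the local loss is linearized and a second proximal term anchors each inner iterate to its predecessor, so every inner step has a closed form that costs one gradient evaluation; this mirrors the point that the local solver stays as cheap as gradient descent. The parameters `lam` and `gamma` and the overall loop structure are illustrative assumptions, not FedRed's exact pseudocode.

```python
import numpy as np

# Same synthetic quadratic clients as in the DANE+ sketch above
rng = np.random.default_rng(0)
d, n_clients = 5, 4
A = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(n_clients)]
A = [0.5 * (Ai + Ai.T) + d * np.eye(d) for Ai in A]  # symmetric, positive definite
b = [rng.standard_normal(d) for _ in range(n_clients)]

def grad(i, x):
    return A[i] @ x - b[i]

def doubly_regularized_round(x_bar, lam=1.0, gamma=0.1, inner_steps=20):
    """Sketch of a FedRed-style round: each inner step minimizes a linearized,
    doubly regularized model and therefore has a closed-form solution."""
    g_bar = np.mean([grad(i, x_bar) for i in range(n_clients)], axis=0)
    local_solutions = []
    for i in range(n_clients):
        y = x_bar.copy()
        for _ in range(inner_steps):
            # Minimize over y:
            #   <grad f_i(y_k) - grad f_i(x_bar) + g_bar, y>   (drift-corrected linearization)
            #   + (lam/2) * ||y - x_bar||^2                    (prox to the communication point)
            #   + (1/(2*gamma)) * ||y - y_k||^2                (second prox to the previous iterate)
            h = grad(i, y) - grad(i, x_bar) + g_bar
            y = (lam * x_bar + y / gamma - h) / (lam + 1.0 / gamma)
        local_solutions.append(y)
    return np.mean(local_solutions, axis=0)

x = np.zeros(d)
for _ in range(30):
    x = doubly_regularized_round(x)
g = np.mean([grad(i, x) for i in range(n_clients)], axis=0)
print("global gradient norm after 30 rounds:", np.linalg.norm(g))
```

On quadratics the inner loop converges to the same point as the exact DANE+ subproblem above, but each inner step touches the local data only through a single gradient evaluation.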
The key insights are:
Regularized drift correction is the key mechanism that enables communication reduction when the client functions are similar to one another.
DANE, an established method in distributed optimization, already implicitly uses this technique and achieves communication reduction in terms of δA (both dissimilarity measures are formalized below).
By adding a second regularizer, hence "doubly regularized", FedRed further improves local computational efficiency while retaining the same communication reduction.
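For reference, the two measures are typically formalized along the following lines for clients f_1, ..., f_n with average f = (1/n) Σ_i f_i; the paper's exact definitions may differ in constants or norms, so treat this as a common formalization rather than a verbatim restatement.

```latex
% Bounded Hessian Dissimilarity: a worst-case, per-client bound
\|\nabla^2 f_i(x) - \nabla^2 f(x)\| \le \delta_B
  \quad \text{for all } i \in \{1, \dots, n\} \text{ and all } x;
% Averaged Hessian Dissimilarity: a mean-square bound over the clients
\frac{1}{n} \sum_{i=1}^{n} \|\nabla^2 f_i(x) - \nabla^2 f(x)\|^2 \le \delta_A^2
  \quad \text{for all } x.
```

Since each summand in the averaged bound is at most δB², we always have δA ≤ δB, so guarantees stated in terms of δA are at least as strong.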
The authors provide comprehensive theoretical analysis for DANE+, FedRed, and their variants, establishing improved communication and computational complexities compared to prior work. Experiments on both synthetic and real datasets demonstrate the effectiveness of the proposed methods.
Statistics
The smoothness parameter L of the client functions (the Lipschitz constant of their gradients) always bounds the dissimilarity measures up to a constant factor.
In practice, the Averaged Hessian Dissimilarity δA and the Bounded Hessian Dissimilarity δB can be much smaller than L, and they can never be much larger, as the bound below shows.
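Assuming each f_i is L-smooth (its Hessian has operator norm at most L, and hence so does the averaged Hessian), the triangle inequality gives:

```latex
\|\nabla^2 f_i(x) - \nabla^2 f(x)\|
  \le \|\nabla^2 f_i(x)\| + \|\nabla^2 f(x)\|
  \le 2L
\qquad \Longrightarrow \qquad
\delta_A \le \delta_B \le 2L .
```

In the extreme case of identical client functions both measures vanish, which is the regime where regularized drift correction saves the most communication.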
Quotes
"Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized."
"FedAvg suffers from client drift which can hamper performance and increase communication costs over centralized methods."
"DANE can achieve the desired communication reduction under Hessian similarity constraints."
"FedRed enjoys the same communication reduction as DANE+ but has improved local computational complexity."