Core Concepts
FastLloyd is an efficient and privacy-preserving protocol for federated 𝑘-means clustering that combines secure computation and differential privacy. It improves the utility over state-of-the-art approaches while significantly reducing the runtime overhead.
Abstract
The paper presents FastLloyd, a protocol for privacy-preserving 𝑘-means clustering in the federated setting. Existing federated approaches using secure computation suffer from substantial overheads and do not offer output privacy. Differentially private (DP) 𝑘-means algorithms, on the other hand, assume a trusted central curator and do not extend to federated settings.
FastLloyd provides enhancements to both the DP and secure computation components, resulting in a design that is faster, more private, and more accurate than previous work:
Utility Improvement: By utilizing constrained clustering techniques, FastLloyd improves the utility of DP 𝑘-means over the state-of-the-art. It introduces a centroid-level aggregation method that tightens the sensitivity bound and reduces the amount of noise required.
Efficiency: FastLloyd uses a lightweight, secure aggregation-based approach that achieves four orders of magnitude speed-up over the state-of-the-art secure federated 𝑘-means approaches. This is achieved by leaking the differentially-private centroids to clients, allowing them to perform the computationally expensive assignment and update steps locally.
Privacy: FastLloyd provides end-to-end privacy guarantees in the computational differential privacy (CDP) model, protecting both the input and output of the 𝑘-means algorithm.
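The local-assignment design described under Efficiency can be illustrated with a minimal sketch of one round, written under several assumptions. The function names (`local_stats`, `laplace`, `federated_round`) and the `noise_scale` parameter are hypothetical. Plain summation stands in for the secure aggregation step, and the Laplace noise scale is a free placeholder rather than FastLloyd's actual sensitivity calibration, which this summary does not specify.

```python
import random


def local_stats(points, centroids):
    """Client side: assign each local point to its nearest centroid and
    return per-cluster coordinate sums and point counts."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the nearest centroid by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[j] += 1
        for i in range(d):
            sums[j][i] += p[i]
    return sums, counts


def laplace(scale, rng):
    """Sample Laplace(0, scale) as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def federated_round(client_data, centroids, noise_scale, rng):
    """One round: local assignment, aggregation, noisy centroid release.

    ASSUMPTION: plain summation replaces secure aggregation, and
    noise_scale is a placeholder, not the paper's calibrated value.
    """
    k, d = len(centroids), len(centroids[0])
    total_sums = [[0.0] * d for _ in range(k)]
    total_counts = [0] * k
    for points in client_data:
        sums, counts = local_stats(points, centroids)
        for j in range(k):
            total_counts[j] += counts[j]
            for i in range(d):
                total_sums[j][i] += sums[j][i]
    new_centroids = []
    for j in range(k):
        # Clamp the noisy count to at least 1 to avoid division by zero.
        noisy_count = max(1.0, total_counts[j] + laplace(noise_scale, rng))
        new_centroids.append(
            [(total_sums[j][i] + laplace(noise_scale, rng)) / noisy_count
             for i in range(d)]
        )
    return new_centroids


# Two clients, two clusters, noise disabled to make the arithmetic visible.
clients = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 10.0], [10.0, 12.0]]]
start = [[1.0, 1.0], [9.0, 9.0]]
updated = federated_round(clients, start, noise_scale=0.0, rng=random.Random(0))
```

Only the aggregated, noised centroids are sent back to clients in this sketch; the nearest-centroid assignment never leaves the client. That split is what the text credits for the speed-up.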
The paper provides a detailed error analysis, proving that the distortion in the centroids grows with the number of dimensions and iterations and shrinks as the privacy budget and the minimum cluster size increase. Extensive experiments on real-world datasets demonstrate that FastLloyd outperforms the state-of-the-art in both utility and efficiency.
Stats
The distortion in the centroids is proportional to $d^3 T^2$ and inversely proportional to $\epsilon^2 \eta_{\min}^2$.
The mean-squared error across all clusters is $2 k d^3 T^2 B^2 / (\epsilon^2 \eta_{\min}^2)$.
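To see how the mean-squared error expression behaves, it can be evaluated directly. The sketch below is a plain transcription of the formula. It reads k as the number of clusters, d as the number of dimensions, T as the number of iterations, ε as the privacy budget, and η_min as the minimum cluster size, following the abstract. B is not defined in this summary, so it is treated here as an unspecified constant (an assumption), and the function name `centroid_mse` is hypothetical.

```python
def centroid_mse(k, d, T, B, epsilon, eta_min):
    """Evaluate 2 k d^3 T^2 B^2 / (epsilon^2 eta_min^2).

    ASSUMPTION: B is an unspecified constant taken from the formula;
    its meaning is not given in the summary.
    """
    return 2 * k * d**3 * T**2 * B**2 / (epsilon**2 * eta_min**2)


# Illustrative inputs only, not values from the paper.
base = centroid_mse(k=2, d=2, T=5, B=1.0, epsilon=1.0, eta_min=100)
doubled_budget = centroid_mse(k=2, d=2, T=5, B=1.0, epsilon=2.0, eta_min=100)
doubled_dims = centroid_mse(k=2, d=4, T=5, B=1.0, epsilon=1.0, eta_min=100)
```

With these inputs, doubling ε cuts the error to a quarter, while doubling d multiplies it by eight. This reflects the quadratic dependence on the privacy budget and the cubic dependence on dimension.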
Quotes
"FastLloyd is an efficient and privacy-preserving protocol for federated 𝑘-means clustering that combines secure computation and differential privacy."
"By utilizing constrained clustering techniques, FastLloyd improves the utility of DP 𝑘-means over the state-of-the-art."
"FastLloyd uses a lightweight, secure aggregation-based approach that achieves four orders of magnitude speed-up over the state-of-the-art secure federated 𝑘-means approaches."