Core Concepts
This paper provides a comprehensive review of optimization techniques for high-dimensional differentially private linear models, including linear and logistic regression. The authors implement and empirically evaluate all the reviewed methods, providing insights on their strengths, weaknesses, and performance across various datasets.
Abstract
The paper begins by providing an overview of differential privacy (DP) and nonprivate optimization methods for high-dimensional linear models. It then reviews various optimization techniques that have been proposed for high-dimensional DP linear models, organized by the optimization approach:
Model Selection: The authors discuss methods that first privately select a subset of features and then use traditional DP optimization to find the weight vector. These methods make assumptions about the algorithmic stability of feature selection.
Frank-Wolfe: The authors review DP variants of the Frank-Wolfe algorithm, which at each iteration privately selects a vertex of a polytope constraint and moves the weight vector toward it. These methods assume the loss function is Lipschitz and smooth, and that good solutions can be found in few iterations.
Compressed Learning: This approach reduces the dimensionality of the input space by multiplying the design matrix by a random matrix, and then optimizing in the lower-dimensional space. The methods assume the loss is Lipschitz and that the random matrix does not destroy important information in the dataset.
ADMM: The authors discuss privatizing the ADMM algorithm using objective perturbation. These methods assume that searching a large hyperparameter space is feasible and that ADMM converges.
Thresholding: These methods use iterative gradient hard thresholding to produce a sparse weight vector, and then privatize the process with gradient perturbation or output perturbation. They assume the thresholding can efficiently identify important coefficients, and that truncated gradients provide effective signal for heavy-tailed data.
Coordinate Descent: These methods use greedy coordinate descent to privately update a single component of the weight vector at a time. They assume that greedy coordinate descent can be implemented efficiently and that the Lipschitz constant for each feature is known.
Mirror Descent: The authors review a method that uses iteratively stronger regularization to solve a constrained optimization problem in a private manner. It assumes composing multiple private optimizations is numerically stable.
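To make one of the approaches above concrete, here is a minimal sketch of the thresholding idea for squared loss: clip each per-sample gradient, add Gaussian noise to the sum (gradient perturbation), take a gradient step, and keep only the s largest-magnitude weights. This is an illustration under stated assumptions, not the implementation evaluated in the paper. The function names (`clip_l2`, `hard_threshold`, `noisy_iht`) and all parameters are hypothetical, and `sigma` is left as a free parameter: calibrating it to a privacy budget (epsilon, delta) across all steps is not shown.

```python
import math
import random


def clip_l2(vec, max_norm):
    """Scale vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    return [v * max_norm / norm for v in vec]


def hard_threshold(w, s):
    """Keep the s largest-magnitude entries of w and zero out the rest."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = set(order[:s])
    return [w[j] if j in keep else 0.0 for j in range(len(w))]


def noisy_iht(X, y, s, steps, lr, clip, sigma, rng):
    """Noisy iterative hard thresholding for squared loss (illustrative only).

    sigma is an assumed noise multiplier; it is NOT calibrated to any
    (epsilon, delta) budget here.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad_sum = [0.0] * d
        for xi, yi in zip(X, y):
            residual = sum(a * b for a, b in zip(xi, w)) - yi
            # Per-sample clipping bounds each sample's contribution.
            g = clip_l2([residual * a for a in xi], clip)
            grad_sum = [gs + gj for gs, gj in zip(grad_sum, g)]
        # Gaussian noise on the summed clipped gradients, then average.
        noisy = [(gs + rng.gauss(0.0, sigma * clip)) / n for gs in grad_sum]
        w = [wj - lr * gj for wj, gj in zip(w, noisy)]
        # Thresholding only post-processes the already-noised iterate.
        w = hard_threshold(w, s)
    return w


rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(20)] for _ in range(200)]
y = [2.0 * row[0] - 1.0 * row[3] for row in X]
w = noisy_iht(X, y, s=2, steps=30, lr=0.5, clip=5.0, sigma=1.0, rng=rng)
```

Because the noise is added before thresholding, the returned vector always has at most `s` nonzero entries; which coordinates survive depends on the noise level and the data.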
The paper then describes the implementation details and challenges faced when implementing these methods. Finally, it presents an extensive empirical evaluation of the methods on several linear regression and logistic regression datasets, providing insights on the performance trends observed.
Stats
Every sample in the datasets has an L1-norm of at most 1.
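A bound like this is typically what lets per-sample contributions be bounded in DP optimization. The summary does not say how the bound was obtained, so the sketch below shows only one possible way to obtain such a bound: dividing every sample by the largest L1-norm in the dataset. The helper name `scale_to_unit_l1` is hypothetical.

```python
def scale_to_unit_l1(X):
    """Rescale all rows by the largest row L1-norm so every row has L1-norm <= 1.

    Assumed preprocessing for illustration; not taken from the paper.
    """
    max_norm = max(sum(abs(v) for v in row) for row in X)
    if max_norm == 0.0:
        return [list(row) for row in X]
    return [[v / max_norm for v in row] for row in X]


X_demo = [[1.0, -3.0], [0.5, 0.5]]
X_scaled = scale_to_unit_l1(X_demo)
```

Note that computing the maximum norm from the data is itself data-dependent; a fixed, publicly known bound with per-row clipping is another option.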