Core Concepts
This work presents a sample- and time-efficient differentially private algorithm for ordinary least squares regression, with error that depends linearly on the dimension and is independent of the condition number of X⊤X, where X is the design matrix.
Abstract
The authors present a new algorithm, ISSP (Insufficient Statistics Perturbation), for differentially private linear regression. ISSP works in two main phases:
It searches for a reweighting of the dataset such that running OLS on the reweighted data is roughly stable, i.e., the solution changes little when any single observation is modified.
It computes the OLS solution on this weighted version of the data and adds appropriately shaped Gaussian noise to the solution.
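Below is a minimal sketch of the two phases, assuming a simple weight-halving search in phase one and a placeholder noise scale in phase two; the function issp_sketch, its loop, and its calibration are illustrative stand-ins, not the paper's actual reweighting search or privacy analysis.

```python
import numpy as np

def issp_sketch(X, y, eps, delta, L, R, rng=None):
    """Illustrative two-phase sketch (not the paper's exact algorithm):
    (1) reweight until leverage scores and residuals are within the
    bounds L and R, (2) weighted OLS plus shaped Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = np.ones(n)

    # Phase 1 (hypothetical search): halve the weight of any observation
    # whose leverage score or residual exceeds its bound, until the
    # reweighted dataset looks "good".
    for _ in range(50):
        Xw = X * np.sqrt(w)[:, None]            # sqrt(W) X
        G_inv = np.linalg.pinv(Xw.T @ Xw)       # (X^T W X)^{-1}
        leverage = np.einsum('ij,jk,ik->i', Xw, G_inv, Xw)
        beta = G_inv @ Xw.T @ (np.sqrt(w) * y)  # weighted OLS solution
        residual = np.abs(y - X @ beta)
        bad = (leverage > L) | (residual > R)
        if not bad.any():
            break
        w[bad] *= 0.5

    # Phase 2: weighted OLS plus Gaussian noise whose covariance is
    # shaped by (X^T W X)^{-1}; sigma below is a placeholder for the
    # paper's calibrated noise scale, not the actual value.
    Gw_inv = np.linalg.pinv((X * w[:, None]).T @ X)
    beta_hat = Gw_inv @ (X * w[:, None]).T @ y
    sigma = R * np.sqrt(L) * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return beta_hat + rng.multivariate_normal(np.zeros(d), sigma**2 * Gw_inv)
```

Shaping the noise by the inverse weighted Gram matrix is one intuition for why the error can be independent of the condition number of X⊤X: directions the data determines well receive proportionally less noise.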
The key technical advances are:
ISSP does not require any norm bounds on the data, only that the dataset is "good", i.e., has bounded leverage scores and bounded residuals (quantified under Stats below, where a simple check is sketched). This captures natural, well-studied settings where OLS is a sensible procedure.
The authors prove that ISSP is differentially private and establish utility guarantees. On good datasets, the private estimator is just a slightly noisier version of the empirical OLS solution.
For random-design regression with subgaussian covariates and label noise, the error of ISSP is shown to be nearly optimal, matching known lower bounds up to logarithmic factors.
The algorithm can be implemented efficiently, requiring only basic linear algebraic operations.
Stats
The maximum leverage score of any observation is bounded by L.
The magnitude of the residual for any observation is bounded by R.
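A small sketch of how one might verify these two conditions on a concrete dataset; the name is_good_dataset and its interface are illustrative, not from the paper.

```python
import numpy as np

def is_good_dataset(X, y, L, R):
    """Check the two goodness conditions above: every leverage score is
    at most L and every OLS residual has magnitude at most R.
    (Illustrative helper; name and interface are not from the paper.)"""
    # Leverage scores are the diagonal of the hat matrix X (X^T X)^{-1} X^T.
    G_inv = np.linalg.pinv(X.T @ X)
    leverage = np.einsum('ij,jk,ik->i', X, G_inv, X)
    # Residuals of the ordinary (non-private) OLS fit.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    residual = np.abs(y - X @ beta)
    return bool(leverage.max() <= L and residual.max() <= R)
```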
Quotes
"We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of X⊤X, where X is the design matrix."
"All prior private algorithms for this task require either d^(3/2) examples, error growing polynomially with the condition number, or exponential time."