Core Concepts
This paper introduces a novel approach to valid inference on principal component analysis under a spiked covariance model with missing data and heteroskedastic noise, providing distributional guarantees for the estimators used.
Abstract
This paper explores statistical inference methods for principal component analysis (PCA) in high dimensions, focusing on missing data and heteroskedastic noise. The proposed approach, built on the HeteroPCA estimation algorithm, offers non-asymptotic distributional guarantees for PCA estimators, enabling the computation of confidence regions and entrywise confidence intervals. The study extends prior work by accommodating both missing data and heteroskedastic noise while providing fully data-driven inference procedures.
The paper covers problem formulation, background on the HeteroPCA estimation algorithm, distributional theory, numerical experiments, related work, a detour on subspace estimation, and a discussion of factor models in econometrics and financial modeling. It concludes with extensions and additional discussion.
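For readers unfamiliar with HeteroPCA, the following is a minimal sketch of the diagonal re-imputation iteration as it is commonly described in the literature; it is not the paper's own code and covers only the estimation step, not the inference procedures. The function names `gram_from_partial` and `hetero_pca`, the fixed iteration count, and the inverse-probability rescaling by a known observation rate `p` are all assumptions made for illustration.

```python
import numpy as np


def gram_from_partial(X, mask, p):
    """Sample Gram matrix from partially observed data (hypothetical helper).

    X    : (d, n) data matrix, one sample per column
    mask : (d, n) boolean array, True where an entry is observed
    p    : observation probability, assumed known here for simplicity
    """
    # Zero-fill missing entries and rescale by 1/p so that the
    # off-diagonal entries of the Gram matrix are unbiased.
    Y = np.where(mask, X, 0.0) / p
    return Y @ Y.T / X.shape[1]


def _top_eig(sym, rank):
    """Eigenpairs of a symmetric matrix with the largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(sym)
    top = np.argsort(np.abs(vals))[::-1][:rank]
    return vals[top], vecs[:, top]


def hetero_pca(gram, rank, n_iter=50):
    """Estimate a rank-`rank` principal subspace (illustrative sketch).

    The diagonal of `gram` is contaminated by noise variances and by
    missingness, so it is discarded and re-imputed iteratively from a
    low-rank fit while the off-diagonal entries are kept fixed.
    """
    gram = np.asarray(gram, dtype=float)
    off_diag = gram - np.diag(np.diag(gram))  # drop the unreliable diagonal
    current = off_diag.copy()
    for _ in range(n_iter):
        vals, vecs = _top_eig(current, rank)
        low_rank = (vecs * vals) @ vecs.T      # best rank-r approximation
        # Keep off-diagonal entries; take the diagonal from the low-rank fit.
        current = off_diag + np.diag(np.diag(low_rank))
    _, vecs = _top_eig(current, rank)
    return vecs                                # (d, rank), orthonormal columns


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, r, p = 50, 2000, 2, 0.7
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))   # true subspace
    noise_sd = rng.uniform(0.2, 2.0, size=d)           # heteroskedastic noise
    X = U @ (3.0 * rng.standard_normal((r, n)))
    X = X + noise_sd[:, None] * rng.standard_normal((d, n))
    mask = rng.random((d, n)) < p                      # entries observed w.p. p
    U_hat = hetero_pca(gram_from_partial(X, mask, p), r)
    print("subspace distance:", np.linalg.norm(U_hat @ U_hat.T - U @ U.T, 2))
```

The design choice worth noting is that only the diagonal is treated as unreliable: under row-wise heteroskedastic noise and random missingness, the rescaled off-diagonal entries remain unbiased for the low-rank signal, whereas the diagonal absorbs the unequal noise variances.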
Stats
p < 1 - δ for some arbitrary constant 0 < δ < 1 or p = 1
κ ≍ 1; μ ≍ 1; r ≍ 1; κ_ω ≍ 1
Quotes
"The challenge is further compounded by the prevalent presence of missing data and heteroskedastic noise."
"We propose a novel approach to performing valid inference on the principal subspace under a spiked covariance model with missing data."
"Our inference procedures are fully data-driven and adaptive to heteroskedastic random noise."