De, A., Li, H., Nadimpalli, S., & Servedio, R. A. (2024). Detecting Low-Degree Truncation. arXiv preprint arXiv:2402.08133v2.
This research paper investigates the problem of detecting whether a known high-dimensional distribution has been truncated by an unknown low-degree polynomial threshold function (PTF). The authors aim to design computationally efficient algorithms for this task and establish corresponding lower bounds on sample complexity.
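One way to state the testing problem formally is sketched below. The notation is assumed here rather than taken from the summary: $\mathcal{D}$ is the known distribution over $\mathbb{R}^n$, $p$ is the unknown polynomial of degree at most $d$, $S$ is the truncation set it defines (assumed to have positive probability under $\mathcal{D}$), and $m$ is the number of samples.

```latex
\[
  S = \{\, x \in \mathbb{R}^n : p(x) \ge 0 \,\}, \qquad \deg(p) \le d,
\]
\[
  \mathcal{D}|_S(A) \;=\; \frac{\mathcal{D}(A \cap S)}{\mathcal{D}(S)}
  \quad \text{for measurable } A \subseteq \mathbb{R}^n ,
\]
\[
  H_0 :\; x_1, \dots, x_m \sim \mathcal{D}
  \qquad \text{versus} \qquad
  H_1 :\; x_1, \dots, x_m \sim \mathcal{D}|_S \ \text{ for some unknown such } S .
\]
```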
The authors develop an algorithm, PTF-Distinguisher, that builds on properties of hypercontractive product distributions and on Fourier analysis of Boolean functions. It expands each sample into polynomial-kernel features and applies a U-statistic-based estimator to distinguish truncated from untruncated data. The analysis relies on anti-concentration of low-degree polynomials and on level-k inequalities for Boolean functions. For the lower bound, the authors construct a distribution over degree-d PTFs and show that the truncated and untruncated distributions are indistinguishable from too few samples, using properties of Gaussian random polynomials and bounds on the total variation distance between multivariate normal distributions.
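The idea of pairing a low-degree feature expansion with a U-statistic can be illustrated with a toy sketch. It is not the paper's PTF-Distinguisher: it assumes the uniform distribution on {-1,+1}^n, uses all multilinear monomials of degree 1 to d as a stand-in for the polynomial-kernel features, and every function name (`monomial_features`, `u_statistic`, `sample_truncated`, `demo`) is hypothetical. Under the untruncated uniform distribution each monomial has mean zero, so the average of the inner products <phi(x_i), phi(x_j)> over pairs i != j is an unbiased estimate of a quantity that is zero without truncation and positive when the truncating function has low-degree Fourier weight.

```python
import itertools
import numpy as np


def monomial_features(X, d):
    """Map each row of X in {-1,+1}^n to all multilinear monomials of degree 1..d."""
    n = X.shape[1]
    cols = [
        np.prod(X[:, list(S)], axis=1)
        for k in range(1, d + 1)
        for S in itertools.combinations(range(n), k)
    ]
    return np.stack(cols, axis=1)


def u_statistic(X, d):
    """Average of <phi(x_i), phi(x_j)> over ordered pairs i != j.

    This is an unbiased estimate of ||E[phi(x)]||^2.
    """
    Phi = monomial_features(X, d).astype(float)
    m = Phi.shape[0]
    total = Phi.sum(axis=0)
    # Sum over all pairs (i, j), minus the diagonal terms i == j.
    off_diagonal = total @ total - np.sum(Phi * Phi)
    return off_diagonal / (m * (m - 1))


def sample_uniform(rng, m, n):
    """Draw m points uniformly from {-1,+1}^n."""
    return rng.choice([-1, 1], size=(m, n))


def sample_truncated(rng, m, n, accept):
    """Rejection-sample m uniform points conditioned on accept(x) being True."""
    kept, count = [], 0
    while count < m:
        X = sample_uniform(rng, 2 * m, n)
        X = X[accept(X)]
        kept.append(X)
        count += len(X)
    return np.concatenate(kept)[:m]


def toy_ptf(X):
    """Illustrative degree-2 PTF (an assumption): 1[x0*x1 + x2*x3 >= 0]."""
    return X[:, 0] * X[:, 1] + X[:, 2] * X[:, 3] >= 0


def demo(seed=0, m=2000, n=8, d=2):
    """Return the statistic on untruncated and on truncated samples."""
    rng = np.random.default_rng(seed)
    u_plain = u_statistic(sample_uniform(rng, m, n), d)
    u_trunc = u_statistic(sample_truncated(rng, m, n, toy_ptf), d)
    return u_plain, u_trunc


if __name__ == "__main__":
    u_plain, u_trunc = demo()
    print(f"untruncated: {u_plain:.4f}  truncated: {u_trunc:.4f}")
```

For this toy PTF the statistic concentrates around 0 on untruncated data and around 2/9 on truncated data, so a threshold between the two separates the cases. In general the threshold and the sample size would have to be chosen from the analysis, which this sketch does not attempt.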
The study shows that truncation by a low-degree PTF can be detected efficiently for a broad class of distributions. For any constant degree d, the algorithm uses O(n^{d/2}) samples (under mild technical conditions on the distribution and the PTF), and the lower bound shows that Ω(n^{d/2}) samples are necessary, so the two bounds match.
This work advances the understanding of truncated statistics in high dimensions, with relevance to machine learning, statistics, and theoretical computer science. It adds to the growing body of work on learning and testing with truncated data.
The results are limited to hypercontractive product distributions and low-degree PTFs. Truncation detection for other classes of distributions and truncation sets remains an open problem. Further research could also investigate how robust the proposed algorithm is to noise and model misspecification.