Cross-prediction is introduced as a method for valid inference powered by machine learning. Given a small labeled dataset and a large unlabeled dataset, it imputes the missing labels with a machine learning model and debiases the predictions, yielding more powerful inferences than using the labeled data alone. The approach is shown to be consistently more powerful than an adaptation of prediction-powered inference in which part of the labeled data is split off to train the model, especially when the predictions are useful. Cross-prediction also gives more stable conclusions: its confidence intervals vary less than those of classical inference methods.
The paper stresses the importance of reliable data-driven decision-making and the difficulty of acquiring high-quality labeled data. Machine learning is proposed as a way to produce large numbers of predicted labels quickly and cheaply. Cross-prediction is presented as a method for semi-supervised inference that takes full advantage of machine learning predictions while ensuring validity. The paper also reviews related work on semi-supervised inference, prediction-powered inference, and other relevant topics.
Key figures mentioned include the number of folds used in cross-prediction (K = 10), the size of the unlabeled dataset (N = 10,000), and the size of the labeled dataset, which varies from n = 100 to n = 1,000. The experiments use synthetic data to compare cross-prediction with classical inference and prediction-powered inference.
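To make the procedure summarized above concrete, here is a minimal sketch of cross-prediction for estimating a mean, using the settings quoted above (K = 10, N = 10,000) and n = 500. Several pieces are assumptions for illustration rather than the paper's implementation: a least-squares linear fit stands in for the machine learning model, the variance is a simple plug-in estimate, and the function names and the synthetic data-generating process are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def fit_linear(X, y):
    """Least-squares fit with intercept; a stand-in for any ML model (assumption)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def cross_prediction_mean(X_lab, y_lab, X_unlab, K=10, alpha=0.1, seed=0):
    """Cross-prediction estimate of E[Y] with a (1 - alpha) confidence interval."""
    rng = np.random.default_rng(seed)
    n, N = len(y_lab), len(X_unlab)
    folds = np.array_split(rng.permutation(n), K)

    preds_unlab = np.zeros(N)  # predictions on unlabeled data, averaged over folds
    resid = np.zeros(n)        # out-of-fold residuals on labeled data

    for idx in folds:
        train = np.setdiff1d(np.arange(n), idx)
        f = fit_linear(X_lab[train], y_lab[train])  # model never sees fold idx
        preds_unlab += f(X_unlab) / K
        resid[idx] = y_lab[idx] - f(X_lab[idx])

    # Imputed mean plus a debiasing term from held-out labeled points.
    theta = preds_unlab.mean() + resid.mean()

    # Simplified plug-in standard error (assumption, not the paper's estimator).
    se = np.sqrt(preds_unlab.var(ddof=1) / N + resid.var(ddof=1) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta, (theta - z * se, theta + z * se)


# Hypothetical synthetic setup: the true mean of Y is 3.
rng = np.random.default_rng(1)
beta = np.array([1.0, -2.0])
n, N = 500, 10_000
X_lab = rng.normal(size=(n, 2))
y_lab = 3.0 + X_lab @ beta + rng.normal(size=n)
X_unlab = rng.normal(size=(N, 2))

theta, (lo, hi) = cross_prediction_mean(X_lab, y_lab, X_unlab, K=10)

# Classical interval from the labeled data only, for comparison.
z = NormalDist().inv_cdf(0.95)
classical_half_width = z * y_lab.std(ddof=1) / np.sqrt(n)
print(f"cross-prediction: {theta:.3f} [{lo:.3f}, {hi:.3f}]")
print(f"classical half-width: {classical_half_width:.3f}")
```

Each labeled point is scored only by a model that did not train on its fold, so the residual term can correct the bias of the imputed labels without setting labeled data aside purely for training. When the predictions are informative, as in this synthetic setup, the resulting interval should be narrower than the classical one.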
Source: arxiv.org