Core Concept
The paper proposes a novel approach to conformal prediction for regression: it converts the regression problem into a classification problem, so that flexible classification-based conformal prediction techniques can be applied to output distributions that are heteroscedastic or bimodal.
Summary
The paper presents a new method called Regression-to-Classification Conformal Prediction (R2CCP) that addresses the challenges of conformal prediction for regression, especially when the output distribution is heteroscedastic, multimodal, or skewed.
Key highlights:
- Regression problems are converted into classification problems by discretizing the output space into bins, treating each bin as a distinct class.
- A new loss function is designed to preserve the ordering of the continuous output space: it penalizes probability mass placed on bins far from the true output value, while an entropy regularization term encourages variability in the predicted distribution.
- The resulting method can adapt to heteroscedasticity, bimodality, or both in the label distribution, as demonstrated on synthetic and real datasets.
- Empirical results show that R2CCP achieves the shortest prediction intervals among the conformal prediction baselines it is compared against.
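The binning step and the distance-plus-entropy loss described in the highlights above can be illustrated with a minimal sketch. This is one plausible form consistent with the summary, not the paper's exact formulation: the function names, the evenly spaced bins, the distance exponent `p`, and the entropy weight `tau` are all assumptions made for illustration.

```python
import numpy as np

def make_bins(y_min, y_max, n_bins):
    """Return edges and midpoints of an evenly spaced discretization (assumed layout)."""
    edges = np.linspace(y_min, y_max, n_bins + 1)
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    return edges, midpoints

def to_class(y, edges):
    """Map continuous labels to bin indices in [0, n_bins - 1]."""
    # Only interior edges are passed, so values at or beyond the outer
    # edges fall into the first or last bin.
    return np.digitize(y, edges[1:-1])

def distance_entropy_loss(probs, y, midpoints, p=2.0, tau=0.1):
    """Hypothetical loss: mass far from y is penalized, entropy is rewarded.

    probs:     (n_samples, n_bins) predicted bin probabilities
    y:         (n_samples,) continuous targets
    midpoints: (n_bins,) bin centers
    p, tau:    illustrative hyperparameters (assumptions, not from the source)
    """
    probs = np.clip(probs, 1e-12, 1.0)          # avoid log(0)
    dist = np.abs(y[:, None] - midpoints[None, :]) ** p
    penalty = np.sum(probs * dist, axis=1)       # expected distance to the target
    entropy = -np.sum(probs * np.log(probs), axis=1)
    return float(np.mean(penalty - tau * entropy))
```

Because the penalty grows with the distance between a bin and the true value, a prediction that is off by one bin costs less than one that is off by many, which is how the ordering of the output space is retained after discretization.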
The paper first provides background on conformal prediction and its challenges for regression. It then introduces the R2CCP approach, detailing the classification-based framework and the custom loss function. Extensive experiments are conducted to showcase the method's ability to handle complex output distributions and its superior performance over existing conformal prediction techniques.
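To make the classification-based conformal step concrete, here is a minimal split-conformal sketch over bin probabilities. It uses the common `1 - p(true class)` nonconformity score, which is an assumption and not necessarily the score used in the paper. The function names and the boolean-mask output format are likewise hypothetical.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on held-out data (split conformal).

    cal_probs:  (n, n_bins) predicted bin probabilities on the calibration set
    cal_labels: (n,) integer bin index of each true label
    alpha:      target miscoverage rate
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: keep every bin
    return np.sort(scores)[k - 1]

def prediction_bins(test_probs, threshold):
    """Boolean mask of bins kept in each prediction set."""
    return (1.0 - test_probs) <= threshold
```

The kept bins need not be contiguous, so the resulting prediction set can be a union of disjoint intervals. That is what lets a bin-based method follow a bimodal label distribution instead of covering the gap between the modes.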
Statistics
The label distribution can exhibit heteroscedasticity, where the variance changes with the input. (Figure 1a)
The label distribution can be bimodal, with two distinct peaks. (Figure 1b)
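The two label-distribution shapes mentioned above can be reproduced with small synthetic generators. The sketch below is purely illustrative: the specific distributions, constants, and function names are assumptions and do not come from the paper's Figure 1.

```python
import numpy as np

def heteroscedastic_sample(n, rng):
    """Labels whose noise standard deviation grows with the input x."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = rng.normal(loc=0.0, scale=0.1 + x)   # std ranges from 0.1 to 1.1
    return x, y

def bimodal_sample(n, rng):
    """Labels drawn from two well-separated modes near -2 and +2."""
    x = rng.uniform(0.0, 1.0, size=n)
    sign = rng.choice([-1.0, 1.0], size=n)   # pick a mode at random
    y = 2.0 * sign + rng.normal(0.0, 0.3, size=n)
    return x, y

rng = np.random.default_rng(0)
x_het, y_het = heteroscedastic_sample(5000, rng)
x_bi, y_bi = bimodal_sample(5000, rng)
```

In the first case a fixed-width interval is too wide for small `x` and too narrow for large `x`. In the second, a single interval must span the empty region between the modes, which inflates its length.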
Quotes
"Conformal Prediction (CP) (Vovk et al., 2005) has recently gained popularity and has been used successfully in applications such as breast cancer detection (Lambrou et al., 2009), stroke risk prediction (Lambrou et al., 2010), and drug discovery (Cortés-Ciriano & Bender, 2020)."
"Despite its popularity, CP for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed (Lei & Wasserman, 2014)."