Core Concepts
Balancing methods used to address imbalanced datasets can inflate the predictive multiplicity in the Rashomon set, leading to increased risks in model selection.
Abstract
The study investigates the impact of four common balancing methods (random oversampling, SMOTE, random undersampling, and NearMiss) on the Rashomon effect in imbalanced classification tasks. The key findings are:
Balancing methods increase the ambiguity and discrepancy of the Rashomon set, indicating higher predictive multiplicity compared to the original imbalanced dataset.
The variable importance order discrepancy, a measure of model behavior change, does not show statistically significant differences between the Rashomon sets of original and balanced datasets.
Partial resampling, proposed as a solution to mitigate the bias of balancing methods, does not effectively address the increased predictive multiplicity.
The extended performance-gain plot, which monitors the trade-off between performance gain and Rashomon metrics, is proposed as a tool to responsibly conduct the model selection process when using balancing methods.
The results highlight the importance of considering the Rashomon effect and predictive multiplicity when applying balancing methods in imbalanced classification problems. Blindly selecting a model from the Rashomon set is risky, because models with near-identical performance may yield conflicting predictions for the same samples. The proposed performance-gain plot for Rashomon metrics can help researchers and practitioners make informed decisions during the model selection process.
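The ambiguity and discrepancy metrics named in the findings can be illustrated with a minimal sketch. It assumes the commonly used definitions of predictive multiplicity: ambiguity is the share of samples on which at least one competing model in the Rashomon set disagrees with a baseline model, and discrepancy is the largest share of samples on which any single competing model disagrees with it. The function names and the toy predictions are hypothetical and are not taken from the study.

```python
import numpy as np

def ambiguity(baseline_pred, competing_preds):
    """Share of samples where at least one competing model disagrees with the baseline."""
    baseline_pred = np.asarray(baseline_pred)
    competing_preds = np.asarray(competing_preds)  # shape: (n_models, n_samples)
    disagrees = competing_preds != baseline_pred   # broadcasts the baseline over models
    return float(disagrees.any(axis=0).mean())

def discrepancy(baseline_pred, competing_preds):
    """Largest share of samples on which a single competing model disagrees with the baseline."""
    baseline_pred = np.asarray(baseline_pred)
    competing_preds = np.asarray(competing_preds)
    disagrees = competing_preds != baseline_pred
    return float(disagrees.mean(axis=1).max())

# Toy predictions (made up for illustration): one baseline and two competing models.
baseline = [1, 0, 0, 1, 0, 1, 0, 0]
competitors = [
    [1, 0, 1, 1, 0, 1, 0, 0],  # disagrees with the baseline on one sample
    [1, 0, 0, 0, 0, 1, 1, 0],  # disagrees with the baseline on two samples
]

amb = ambiguity(baseline, competitors)     # 3 of 8 samples are contested
disc = discrepancy(baseline, competitors)  # worst single model flips 2 of 8
print(amb, disc)
```

Higher values of either metric mean that picking one model from the Rashomon set over another changes more individual predictions, even though overall performance is nearly the same.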
Stats
The imbalance ratio (number of majority-class samples divided by number of minority-class samples) of the datasets ranges from 1.54 to 129.53.
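As a small worked example of this ratio, the sketch below computes it from a list of class labels. The helper name `imbalance_ratio` and the toy label counts are assumptions chosen for illustration; they do not describe any dataset from the study.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute an imbalance ratio")
    return max(counts.values()) / min(counts.values())

# Toy labels (made up): 77 majority samples and 50 minority samples.
labels = [0] * 77 + [1] * 50
ir = imbalance_ratio(labels)
print(round(ir, 2))
```

A ratio of 1 means the classes are perfectly balanced; larger values mean stronger imbalance.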
The Rashomon parameter ε is set to 0.05.
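To make the role of ε concrete, here is a minimal sketch of how a Rashomon set might be selected. It assumes an additive tolerance on a higher-is-better performance score, so a model is kept if its score is within ε of the best score; the study may define the tolerance differently. The function name, model names, and scores are hypothetical.

```python
def rashomon_set(scores, epsilon=0.05):
    """Keep the models whose score is within epsilon of the best score.

    Assumes higher scores are better and an additive tolerance.
    """
    best = max(scores.values())
    return {name: s for name, s in scores.items() if s >= best - epsilon}

# Made-up validation scores for four candidate models.
scores = {"model_a": 0.91, "model_b": 0.88, "model_c": 0.85, "model_d": 0.80}
selected = rashomon_set(scores, epsilon=0.05)
print(sorted(selected))
```

A larger ε admits more models into the set, which generally leaves more room for conflicting predictions among them.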
The resampling ratios (imbalance ratio after balancing) considered are {1, 1.05, 1.10, 1.15, 1.20, 1.25}.
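Resampling to a ratio above 1 means the minority class is enlarged (or the majority class reduced) only until the chosen ratio is reached, rather than until the classes are equal. The sketch below illustrates this with plain random oversampling for a binary problem. It is an illustration under stated assumptions, not the study's implementation: the function `oversample_to_ratio`, its parameters, and the toy data are hypothetical.

```python
import numpy as np

def oversample_to_ratio(X, y, target_ratio=1.0, minority_label=1, seed=0):
    """Randomly duplicate minority samples until majority/minority is about target_ratio.

    Assumes a binary problem; target_ratio=1.0 corresponds to full balancing.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)

    minority_idx = np.flatnonzero(y == minority_label)
    n_majority = len(y) - len(minority_idx)

    # Minority size needed to reach the requested ratio.
    n_target = int(round(n_majority / target_ratio))
    n_extra = max(n_target - len(minority_idx), 0)

    # Sample minority rows with replacement and append them to the original data.
    extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra_idx])
    return X[keep], y[keep]

# Toy data (made up): 100 majority and 20 minority samples, imbalance ratio 5.
X = np.arange(120).reshape(-1, 1)
y = np.array([0] * 100 + [1] * 20)

X_res, y_res = oversample_to_ratio(X, y, target_ratio=1.25)
ratio_after = (y_res == 0).sum() / (y_res == 1).sum()
print(len(y_res), ratio_after)
```

Looping this helper over the ratios listed above would produce one resampled training set per ratio, each of which could then be used to build and compare a Rashomon set.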
Quotes
"Balancing methods inflate the predictive multiplicity, and they yield varying results."
"The extended performance-gain plot for the Rashomon effect can be a solution to monitor the trade-off between performance gain and multiplicity."