The content presents a sharp asymptotic analysis of the estimators obtained by randomly reweighted loss functions for learning from imbalanced data. The key findings are:
UB can improve the F-measure performance of classifiers even with large class imbalance, by increasing the size of the majority class while keeping the minority class size fixed.
The performance of US does not depend on the size of the excess majority class examples, as its behavior is determined only by the minority class size.
The performance of SW degrades as the size of the excess majority class examples increases, especially when the minority class size is small and the imbalance is large.
UB seems to be robust to the interpolation phase transition, unlike the standard interpolator obtained from a single realization of the training data.
The analysis is based on deriving a sharp characterization of the statistical behavior of the linear classifiers obtained by minimizing the reweighted empirical risk function, in the asymptotic limit where the input dimension and data size diverge proportionally. This is done using the replica method from statistical mechanics.
To Another Language
from source content
arxiv.org
Key Insights Distilled From
by Takashi Taka... at arxiv.org 04-16-2024
https://arxiv.org/pdf/2404.09779.pdfDeeper Inquiries