Core Concepts
The author proposes a shifted normal distribution sampling function to enhance cost efficiency in active learning, particularly in cases of imbalanced labeling costs for positive and negative instances.
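A minimal sketch of the idea: instead of weighting candidates by a density centered at the decision boundary (0.5), weight them by a normal density whose mean is shifted away from it and whose spread covers a wider band. The function names, `mu`, and `sigma` values below are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def shifted_normal_weights(probs, mu=0.35, sigma=0.15):
    """Hypothetical sampling weights: a normal density whose mean (mu) is
    shifted away from the usual uncertainty-sampling center of 0.5, with a
    sigma wide enough to cover a broader probability range."""
    w = np.exp(-0.5 * ((probs - mu) / sigma) ** 2)
    return w / w.sum()

def sample_queries(probs, k, seed=None):
    """Draw k unlabeled indices with probability proportional to the
    shifted normal weights (a sketch, not the paper's exact procedure)."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(probs), size=k, replace=False,
                      p=shifted_normal_weights(probs))

# Toy usage: predicted positive-class probabilities for an unlabeled pool
probs = np.random.default_rng(0).uniform(size=1000)
idx = sample_queries(probs, k=10, seed=0)
```

Shifting the mean below 0.5 biases queries away from the high-probability region where positive (and, per the paper, noisy) labels concentrate, which is how the method limits costly positive-label selection.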
Abstract
Active learning is a valuable strategy for optimizing machine learning by intelligently selecting which data points to label. The paper introduces a new sampling function to address the challenges of noisy datasets and imbalanced labeling costs. The proposed method shows significant improvements in cost efficiency compared to traditional approaches such as uncertainty sampling and random sampling. By focusing on the uncertainty and noise regions, the algorithm aims to refine models effectively while minimizing costly labeling errors.
Stats
Our simulation underscores that our proposed sampling function limits both noisy and positive label selection, delivering between 20% and 32% improved cost efficiency across different test datasets.
For example, in the financial industry, such as in money-lending businesses, a defaulted loan constitutes a positive event leading to substantial financial loss.
We adopt C = 1 as our standard value for the cost associated with a single positive instance relative to a negative instance.
The proposed shifted normal sampling achieves a cost efficiency of η(normal) = 1.88 at the final query.
Shifted normal sampling has AUC performance similar to random sampling but a lower positive-event ratio, resulting in better cost efficiency.
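The stats above can be connected with a toy cost calculation. The paper's exact definition of η is not given here, so the formula below is an assumption: total labeling cost as a weighted count of positive and negative queries, with cost efficiency taken as the ratio of a baseline's cost to the proposed method's cost. All query counts and the C > 1 value are illustrative, not results from the paper.

```python
def labeling_cost(n_pos, n_neg, C=1.0):
    """Total labeling cost when one positive instance costs C times a
    negative instance (C = 1, the document's standard value, makes all
    labels equally costly; C > 1 models the imbalanced-cost setting)."""
    return C * n_pos + n_neg

# Hypothetical query counts at the final round (illustrative numbers only):
# random sampling queries more positives than shifted normal sampling.
cost_random = labeling_cost(n_pos=400, n_neg=600, C=5.0)
cost_normal = labeling_cost(n_pos=150, n_neg=850, C=5.0)
eta = cost_random / cost_normal  # > 1 means the proposed method is cheaper
```

Under this reading, a lower positive-event ratio at comparable AUC directly translates into a larger η, which matches the qualitative claim in the stats.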
Quotes
"We propose a shifted normal distribution sampling function that samples from a wider range than typical uncertainty sampling."
"Our simulation underscores that our proposed sampling function limits both noisy and positive label selection."
"The proposed method shows significant improvements in cost efficiency compared to traditional approaches like uncertainty sampling and random sampling."