Core Concepts
Existing data pruning algorithms can produce highly biased classifiers, sacrificing accuracy on difficult classes to preserve strong average performance. A fairness-aware approach that sets class-wise pruning ratios according to class-wise error rates and prunes via random subsampling within each class can substantially improve worst-class accuracy while maintaining high average performance.
Abstract
The content discusses the issue of classification bias in deep learning models, which can be exacerbated by existing data pruning techniques. It presents a comprehensive evaluation of various pruning algorithms through the lens of fairness, revealing that current methods often fail to improve, and in some cases even worsen, the performance disparity across classes.
The paper proposes a "fairness-aware" pruning approach called MetriQ, which selects class-wise pruning ratios based on the corresponding class-wise error rates computed on a hold-out validation set. When combined with random subsampling within classes, MetriQ is shown to consistently outperform other pruning algorithms in terms of both average and worst-class accuracy across standard computer vision benchmarks.
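The mechanism described above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the paper's exact method: the function name `fairness_aware_prune` and the specific allocation rule (keeping a share of the sample budget proportional to each class's validation error rate) are assumptions for the sketch.

```python
import numpy as np

def fairness_aware_prune(labels, val_error_rates, keep_fraction, rng=None):
    """Sketch of error-rate-aware pruning: classes with higher hold-out
    validation error keep a larger share of their training samples, and
    samples are then drawn uniformly at random within each class.
    NOTE: the proportional allocation below is an assumed rule for
    illustration; the paper's exact class-ratio formula may differ."""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    classes = np.arange(len(val_error_rates))
    counts = np.array([(labels == c).sum() for c in classes])
    budget = int(keep_fraction * len(labels))

    # Allocate the sample budget across classes in proportion to their
    # validation error rates, capped by each class's available samples.
    weights = np.asarray(val_error_rates, dtype=float)
    weights = weights / weights.sum()
    keep = np.minimum((weights * budget).astype(int), counts)

    # Random subsampling within each class (no difficulty scoring needed).
    kept_idx = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        kept_idx.append(rng.choice(idx, size=keep[c], replace=False))
    return np.concatenate(kept_idx)
```

On a balanced two-class set where class 1 has a much higher validation error rate, the pruned set retains more samples from class 1, which is the intended fairness effect: the harder class loses less training data.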
The authors provide theoretical analysis in a toy Gaussian mixture model setting, which sheds light on the fundamental principles behind the success of MetriQ. The analysis suggests that random pruning with appropriate class ratios has the potential to improve the worst-class performance, in contrast to existing pruning methods that often sacrifice difficult classes to retain strong average accuracy.
Stats
The content does not provide any specific numerical data or statistics. It focuses on the conceptual and empirical evaluation of data pruning algorithms with respect to fairness.