Core Concepts
A novel Two-Stream Sample Distillation (TSSD) framework is designed to train a robust network under the supervision of noisy labels by jointly considering the sample structure in feature space and the human prior in loss space.
Abstract
The paper proposes a Two-Stream Sample Distillation (TSSD) framework for robust noisy label learning. It consists of two main modules:
- Parallel Sample Division (PSD) module (see the first sketch after this list):
- Divides the training samples into a certain set and an uncertain set by jointly considering the sample structure in feature space and the human prior in loss space.
- The certain set includes positive and negative samples that are accepted as clean and rejected as noisy with high confidence, respectively.
- The uncertain set includes semi-hard samples that cannot be confidently judged as clean or noisy.
- Meta Sample Purification (MSP) module (see the second sketch after this list):
- Learns a meta classifier on extra golden data (the positive and negative samples from the certain set) to further judge the semi-hard samples in the uncertain set.
- Gradually mines more high-quality samples with clean labels to train the network robustly.
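
The summary describes the PSD division only at a high level, so the following is a minimal, hypothetical sketch of how such a two-cue split could look. It assumes the loss-space prior is modeled with a two-component Gaussian mixture over per-sample losses (a common choice in noisy-label work) and that feature-space structure is approximated by k-nearest-neighbor label agreement; the function name, the thresholds tau_hi/tau_lo, and both cues are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors

def parallel_sample_division(features, labels, losses, k=10,
                             tau_hi=0.9, tau_lo=0.1):
    """Illustrative split into certain-positive, certain-negative, and
    uncertain sets by combining a loss-space cue and a feature-space cue.

    Sketch only (not the paper's code):
    - loss space: a 2-component GMM gives each sample a clean
      probability (posterior of the low-loss component);
    - feature space: the fraction of a sample's k nearest neighbors
      sharing its label measures local structure consistency.
    """
    # Loss-space prior: fit a GMM on per-sample losses.
    gmm = GaussianMixture(n_components=2, random_state=0)
    gmm.fit(losses.reshape(-1, 1))
    clean_comp = np.argmin(gmm.means_.ravel())  # low-loss component = clean
    p_clean = gmm.predict_proba(losses.reshape(-1, 1))[:, clean_comp]

    # Feature-space structure: label agreement among k nearest neighbors
    # (the first neighbor is the sample itself, so it is skipped).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)
    agree = (labels[idx[:, 1:]] == labels[:, None]).mean(axis=1)

    # Certain positives: both cues strongly indicate a clean label.
    positive = (p_clean > tau_hi) & (agree > tau_hi)
    # Certain negatives: both cues strongly indicate a noisy label.
    negative = (p_clean < tau_lo) & (agree < tau_lo)
    # Everything else is semi-hard / uncertain.
    uncertain = ~(positive | negative)
    return positive, negative, uncertain
```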
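Likewise, a minimal sketch of the MSP purification step, assuming the meta classifier is a logistic regression over simple per-sample descriptors (for example, the loss value and the two cues above) trained on the certain set as golden data; the paper's actual meta classifier, its input features, and the decision threshold may well differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def meta_sample_purification(meta_feats, positive, negative, uncertain,
                             threshold=0.5):
    """Train a meta classifier on 'golden' certain samples and use it to
    mine additional clean samples from the uncertain set (sketch only)."""
    # Golden data: certain positives labeled 1 (clean), negatives 0 (noisy).
    X_gold = np.concatenate([meta_feats[positive], meta_feats[negative]])
    y_gold = np.concatenate([np.ones(positive.sum()),
                             np.zeros(negative.sum())])
    meta = LogisticRegression().fit(X_gold, y_gold)

    # Score the uncertain (semi-hard) samples and keep confident ones.
    mined = np.zeros(len(meta_feats), dtype=bool)
    p = meta.predict_proba(meta_feats[uncertain])[:, 1]
    mined[np.where(uncertain)[0][p > threshold]] = True  # newly trusted
    return mined
```

Alternating these two steps with network training would let the certain set grow over epochs, matching the "gradually mines more high-quality samples" behavior described above.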
The authors conduct extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet, and Clothing-1M datasets, demonstrating state-of-the-art performance under different noise types and noise rates.
Statistics
The CIFAR-10 and CIFAR-100 datasets each consist of 50,000 training images and 10,000 test images.
The Tiny-ImageNet dataset contains 200 classes, with 500 training images per class, and a test set of 10,000 images.
The Clothing-1M dataset has 1M noisily labeled clothing images in 14 classes, plus manually verified clean sets of 50k training, 14k validation, and 10k test images.
Quotes
"Noisy label learning aims to learn robust networks under the supervision of noisy labels, which plays a critical role in deep learning."
"The critical issue of sample selection lies in how to judge the reliability of noisy labels in the training process."
"Our TSSD method has improved significantly compared to methods based solely on cross-entropy, JS-divergence, or Ls
n."