Core Concepts
This study provides a systematic, realistic comparison of semi-supervised and self-supervised learning methods for medical image classification, demonstrating that hyperparameter tuning is effective even with limited validation data and that the semi-supervised method MixMatch delivers the most reliable performance gains across multiple datasets.
Abstract
This study presents a comprehensive evaluation of semi-supervised and self-supervised learning (SSL) methods for medical image classification tasks. The authors address two key questions: 1) Can hyperparameter tuning be effective with realistic-sized validation sets? 2) When all methods are tuned well, which self- or semi-supervised methods achieve the best accuracy?
The authors use four open-access medical image datasets (TissueMNIST, PathMNIST, TMED-2, and AIROGS) with varying image resolutions and class imbalance. They compare 6 semi-supervised and 7 self-supervised methods, as well as 3 supervised baselines, under a unified experimental protocol that respects realistic constraints on labeled data and compute resources.
The key findings are:
Hyperparameter tuning is effective even with realistic-sized validation sets (no larger than the training set), contrary to previous concerns.
The semi-supervised method MixMatch delivers the most reliable performance gains across the four datasets, outperforming both self-supervised and other semi-supervised approaches.
Transferring hyperparameters from other datasets is less reliable than tuning on the target dataset, highlighting the importance of the proposed tuning procedure.
The authors provide best practices for resource-constrained practitioners, emphasizing the effectiveness of hyperparameter tuning and the strong performance of MixMatch.
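The first finding can be illustrated with a minimal random-search loop that selects hyperparameters by accuracy on a small held-out validation set (per the study's constraint, one assumed to be no larger than the labeled training set). This is a generic sketch, not the authors' protocol; the `tune`, `train_fn`, and `eval_fn` names are hypothetical.

```python
import random

def tune(train_fn, eval_fn, search_space, n_trials=20, seed=0):
    """Random search over a hyperparameter grid.

    train_fn(cfg) -> model: trains a model with the given config.
    eval_fn(model) -> float: accuracy on a small held-out validation set
    (assumed no larger than the labeled training set).
    Returns the config with the best validation accuracy.
    """
    rng = random.Random(seed)
    best_cfg, best_acc = None, -1.0
    for _ in range(n_trials):
        # Sample one value per hyperparameter from the search space.
        cfg = {k: rng.choice(v) for k, v in search_space.items()}
        model = train_fn(cfg)
        acc = eval_fn(model)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```

The point of the sketch is that model selection needs only relative rankings on the validation set, which is why even a few hundred validation images can suffice.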
Overall, this study offers valuable insights to guide the practical deployment of semi- and self-supervised methods for medical image classification with limited labeled data.
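For context on why MixMatch is singled out, its two core ingredients, temperature sharpening of pseudo-labels on unlabeled images and MixUp interpolation of examples, can be sketched as below. This is a simplified NumPy illustration, not the authors' or the original MixMatch implementation; the defaults T=0.5 and alpha=0.75 follow the original MixMatch paper (Berthelot et al., 2019).

```python
import numpy as np

def sharpen(p, T=0.5):
    """Sharpen a predicted class distribution toward a confident pseudo-label.

    Lower temperature T makes the distribution more peaked; T=0.5 is the
    MixMatch default.
    """
    p = p ** (1.0 / T)
    return p / p.sum(axis=-1, keepdims=True)

def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """MixUp: convex combination of two examples and their (soft) labels.

    MixMatch uses lam' = max(lam, 1 - lam) so the mixed example stays
    closer to the first input than to the second.
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

In the full algorithm, sharpened averages of predictions over several augmentations of each unlabeled image serve as pseudo-labels, and labeled and unlabeled batches are then mixed together with `mixup` before computing the loss.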
Stats
The labeled training set sizes range from 400 to 1660 images across the four datasets.
The unlabeled training set sizes range from 89,546 to 353,500 images.
Quotes
"Hyperparameter tuning is effective, and the semi-supervised method known as MixMatch delivers the most reliable gains across 4 datasets."
"Transferring hyperparameters from other datasets is less reliable than tuning on the target dataset, highlighting the importance of the proposed tuning procedure."