Core Concepts
This work conducts a thorough comparative evaluation of self-supervised visual learning methods in the low-data regime, identifying what is learnt via low-data SSL pretraining and how different SSL categories behave in such training scenarios.
Abstract
The paper introduces a taxonomy of modern visual self-supervised learning (SSL) methods and provides detailed explanations of, and insights into, its main categories of approaches (a representative contrastive objective is sketched after the list below). It then presents a comprehensive comparative experimental evaluation in the low-data regime, aiming to identify:
- What is learnt via low-data SSL pretraining?
- How do different SSL categories of methods behave in such training scenarios?
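As a concrete illustration of how such methods learn without ground-truth labels, below is a minimal sketch of a contrastive objective (the NT-Xent loss popularized by SimCLR), assuming contrastive learning is among the taxonomy's categories; this is an illustrative PyTorch example, not the paper's own code.

```python
# Minimal sketch of a contrastive SSL objective (NT-Xent, SimCLR-style).
# Illustrative only: the paper compares several SSL categories, and this
# is just one representative example.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Contrastive loss over two augmented views of the same batch.

    z1, z2: (N, D) embeddings of two random augmentations of N images.
    Each pair (z1[i], z2[i]) is a positive; the other 2N - 2 embeddings
    act as negatives, so no ground-truth labels are required.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # a sample never pairs with itself
    n = z1.size(0)
    # Row i's positive is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Usage: z1, z2 would come from an encoder + projection head applied to
# two random augmentations of the same unlabeled batch.
loss = nt_xent_loss(torch.randn(32, 128), torch.randn(32, 128))
```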
The authors note that the literature has not explored how SSL methods behave when abundant relevant data is unavailable, since the ability to pretrain on at least ImageNet-scale datasets is almost always assumed. A study in the low-data regime is nevertheless important for practitioners working with specific image domains where it is difficult to obtain massive amounts of even unlabeled data.
The key findings from the experimental evaluation include:
- For domain-specific downstream tasks, in-domain low-data SSL pretraining outperforms the common approach of large-scale pretraining on general datasets (the pretrain-then-probe protocol is sketched after this list).
- The observed behavior of each category of SSL methods in the low-data regime provides valuable insights and suggests future research directions in the field.
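A minimal sketch of the evaluation protocol this finding refers to, assuming an SSL-pretrained ResNet-18 backbone that is frozen and evaluated with a linear probe; the checkpoint path and class count are hypothetical placeholders, not the paper's exact setup.

```python
# Sketch: in-domain low-data SSL pretraining followed by a linear probe
# on the labeled downstream task. Backbone, checkpoint, and class count
# are hypothetical placeholders.
import torch
import torch.nn as nn
from torchvision.models import resnet18

encoder = resnet18()
encoder.fc = nn.Identity()  # expose 512-d features instead of class logits
# encoder.load_state_dict(torch.load("ssl_in_domain.pt"))  # hypothetical SSL checkpoint

for p in encoder.parameters():  # linear-probe protocol: keep the backbone frozen
    p.requires_grad = False
encoder.eval()

num_classes = 10  # placeholder for the downstream task's label set
probe = nn.Linear(512, num_classes)
optimizer = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.9)

def probe_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised step of the probe on the labeled downstream set."""
    with torch.no_grad():
        feats = encoder(images)  # frozen in-domain SSL features
    loss = nn.functional.cross_entropy(probe(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```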
Statistics
Self-supervised learning leverages massive amounts of unlabeled data to learn useful image representations without relying on ground-truth labels.
Typical SSL pretraining datasets are on the order of millions of images, while this work focuses on the low-data regime of 50k-300k images.
The authors note that it is not always feasible to assemble and/or utilize very large pretraining datasets in real-world scenarios, motivating the investigation of SSL effectiveness in the low-data regime.
Quotes
"Although the SSL methodology has proven beneficial in the case of abundance of relevant unlabelled data, it is not always feasible or practical to assemble and/or to utilize very large pretraining datasets in real-world scenarios."
"Yet, a study in the low-data regime would be important, but currently missing, for practitioners who necessarily work with specific image domains (e.g., X-rays), where it is difficult to obtain massive amounts of even unlabeled data."