The article introduces a method for identifying the examples that contribute most to contrastive self-supervised learning (SSL). By selecting examples with high expected similarity between their augmented views, the method enables substantial data reduction without hurting downstream task performance. The approach addresses the challenge of quantifying the value of individual examples for SSL and comes with rigorous generalization guarantees. Experiments on several datasets show that subsets selected by this method outperform random subsets by over 3%. The study also reveals a striking asymmetry: the examples that contribute most to contrastive learning are those that contribute least to supervised learning.
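The selection criterion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random-projection "encoder", the noise augmentation, and the subset size `k` are all stand-in assumptions; the idea shown is simply scoring each example by its estimated expected similarity between embedded augmented views and keeping the top scorers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 examples with 16 features (stands in for a real dataset).
X = rng.normal(size=(100, 16))

# Hypothetical frozen "encoder": a random projection plus L2 normalization.
W = rng.normal(size=(16, 8))

def embed(x):
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def augment(x):
    # Stand-in augmentation: small additive Gaussian noise.
    return x + 0.1 * rng.normal(size=x.shape)

def expected_similarity(x, n_pairs=10):
    """Monte Carlo estimate of E[cos_sim] between two augmented views."""
    sims = []
    for _ in range(n_pairs):
        z1, z2 = embed(augment(x)), embed(augment(x))
        sims.append(np.sum(z1 * z2, axis=-1))  # cosine sim of unit vectors
    return np.mean(sims, axis=0)

# Score every example, then keep the k with the highest expected similarity.
scores = expected_similarity(X)
k = 20
subset_idx = np.argsort(scores)[-k:]
```

In practice one would use the trained SSL encoder and the actual augmentation pipeline in place of the toy `embed` and `augment` above; the ranking-and-truncation step is the part that carries over.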