The article introduces a method for identifying the examples that contribute most to contrastive self-supervised learning (SSL). By selecting examples with high expected similarity between their augmented views, the method enables substantial data reduction without degrading downstream task performance. The approach addresses the challenge of quantifying the value of individual examples for SSL and comes with rigorous guarantees on generalization performance. Experiments on several datasets show that subsets selected by this method outperform random subsets of the same size by over 3%. The study also reveals a notable inversion: the examples that contribute most to contrastive learning are those that contribute least to supervised learning.
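The selection criterion described above can be sketched as follows. This is a minimal illustration, not the paper's exact method: the `augment` and `encode` functions here are toy stand-ins (Gaussian noise and L2 normalization) for real data augmentations and a trained encoder, and the score is a simple Monte Carlo estimate of the expected cosine similarity between two augmented views of each example.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x, rng):
    # Toy augmentation: additive Gaussian noise (stand-in for real image augmentations).
    return x + 0.1 * rng.normal(size=x.shape)

def encode(x):
    # Toy "encoder": L2-normalize raw features (stand-in for a trained network).
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def expected_view_similarity(X, n_views=8, rng=None):
    # Per example, estimate the expected cosine similarity between
    # embeddings of two independently augmented views.
    rng = rng or np.random.default_rng()
    sims = np.zeros(len(X))
    for _ in range(n_views):
        z1 = encode(augment(X, rng))
        z2 = encode(augment(X, rng))
        sims += np.sum(z1 * z2, axis=1)  # row-wise cosine similarity
    return sims / n_views

def select_subset(X, frac=0.5, **kw):
    # Keep the fraction of examples with the highest expected view similarity.
    scores = expected_view_similarity(X, **kw)
    k = int(len(X) * frac)
    return np.argsort(scores)[::-1][:k]

X = rng.normal(size=(100, 16))   # 100 hypothetical examples, 16 features each
idx = select_subset(X, frac=0.2, rng=rng)
print(len(idx))  # 20 selected examples
```

In practice the encoder would be a network trained (or partially trained) with a contrastive objective, and the retained subset would then be used for the full SSL training run.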
Source: arxiv.org