Core Concepts
The authors propose an Efficient Markov Chain Monte Carlo (EMC2) negative sampling method for contrastive learning that exhibits global convergence to a stationary point, regardless of the choice of batch size.
Abstract
The paper presents the EMC2 algorithm for optimizing the global contrastive loss in contrastive learning. The key highlights are:
EMC2 utilizes an adaptive Metropolis-Hastings (M-H) subroutine to generate hardness-aware negative samples in an online fashion during optimization (a sketch of the idea appears after these highlights). This avoids the need to compute the partition function of the softmax distribution, which is computationally expensive.
The authors prove that EMC2 finds an O(1/√T)-stationary point of the global contrastive loss in T iterations, i.e., the expected squared gradient norm decays at an O(1/√T) rate. This global convergence guarantee holds regardless of the choice of batch size, in contrast to prior work.
Numerical experiments on pre-training image encoders on the STL-10 and ImageNet-100 datasets show that EMC2 is effective with small-batch training and achieves performance comparable to or better than baseline algorithms.
The analysis involves a non-trivial adaptation of generic convergence results for biased stochastic approximation schemes. The authors show that the state-dependent Markov transition kernel induced by EMC2 is ergodic and Lipschitz continuous with respect to the model parameter θ.
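To make the sampling idea concrete, the following is a minimal sketch of a Metropolis-Hastings negative sampler for contrastive learning, written against an assumed PyTorch setup. The function name, the uniform proposal distribution, and the fixed number of chain steps are illustrative choices, not the paper's exact adaptive subroutine; the key point is that the acceptance test uses only a ratio of unnormalized softmax scores, so the partition function is never computed.

```python
import torch

def mh_negative_sample(anchor_emb, candidate_embs, cur_idx, num_steps=5, temperature=0.1):
    """Sketch of a Metropolis-Hastings chain targeting the softmax distribution
    p(j) ∝ exp(sim(anchor, x_j) / temperature) over negative candidates.
    Only unnormalized scores are evaluated; the partition function cancels.
    """
    def log_score(j):
        # Unnormalized log-probability of candidate j as a negative for this anchor.
        return torch.dot(anchor_emb, candidate_embs[j]) / temperature

    n = candidate_embs.shape[0]
    cur_log_score = log_score(cur_idx)
    for _ in range(num_steps):
        # Symmetric (uniform) proposal over the candidate pool -- an illustrative choice.
        prop_idx = int(torch.randint(n, (1,)).item())
        prop_log_score = log_score(prop_idx)
        # Accept with probability min(1, p(prop) / p(cur)); the normalizer cancels,
        # so only the two unnormalized scores are needed.
        accept_prob = torch.exp(prop_log_score - cur_log_score).clamp(max=1.0)
        if torch.rand(1).item() < accept_prob.item():
            cur_idx, cur_log_score = prop_idx, prop_log_score
    # The chain state is returned so it can be reused at the next training iteration.
    return cur_idx
```

Carrying the chain state across iterations while the encoder parameters slowly change is what makes the induced transition kernel state-dependent, which is the property the convergence analysis has to handle.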
Stats
The authors report the following key metrics:
Linear probe (LP) test accuracy on the STL-10 and ImageNet-100 datasets
1-nearest-neighbor (1-NN) test accuracy on the STL-10 and ImageNet-100 datasets
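As a hedged illustration of how these two metrics are typically computed on embeddings from a frozen, pre-trained encoder, here is a minimal sketch using scikit-learn; the function name and the use of LogisticRegression as the linear probe are assumptions for illustration, not the paper's exact evaluation pipeline.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def evaluate_frozen_embeddings(train_emb, train_y, test_emb, test_y):
    """Sketch: linear-probe (LP) and 1-NN test accuracy on frozen encoder embeddings."""
    # Linear probe: train a linear classifier on frozen train-set embeddings.
    probe = LogisticRegression(max_iter=1000).fit(train_emb, train_y)
    lp_acc = probe.score(test_emb, test_y)

    # 1-nearest-neighbor classification directly in the embedding space.
    knn = KNeighborsClassifier(n_neighbors=1).fit(train_emb, train_y)
    nn_acc = knn.score(test_emb, test_y)
    return lp_acc, nn_acc
```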