Core Concepts
An analysis of how contrastive models relate to Neural Tangent Kernels (NTK) and Principal Component Analysis (PCA).
Abstract
The paper explores the connection between contrastive models, Neural Tangent Kernels (NTK), and Principal Component Analysis (PCA). It analyzes the training dynamics of contrastive learning, orthogonality constraints on the output layer, and the link between contrastive losses and PCA. Theoretical analysis, empirical validation on the MNIST dataset, and open problems are discussed.
- Introduction to the self-supervised learning (SSL) paradigm.
- Theoretical analysis of SSL, still in its early stages.
- Generalization error bounds for downstream tasks.
- Spectral properties of data augmentation.
- Training dynamics of contrastive learning in linear neural networks.
- Convergence results for contrastive losses in wide networks.
- Impact of orthogonality constraints on output-layer representations.
- Connection between wide networks and PCA via a trace-maximization problem (see the statement after this list).
- Empirical validation of the theoretical results on the MNIST dataset.
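The trace-maximization problem referenced in the list is the classical variational characterization of PCA. A minimal statement, with Σ standing for a generic positive semidefinite matrix (notation assumed here, not taken from the summary):

    % PCA as trace maximization: the top-k principal directions of a
    % positive semidefinite matrix \Sigma \in \mathbb{R}^{d \times d} solve
    \max_{U \in \mathbb{R}^{d \times k},\; U^\top U = I_k} \operatorname{tr}\!\left( U^\top \Sigma U \right)
    % The optimum equals the sum of the k largest eigenvalues of \Sigma and is
    % attained by the matrix whose columns are the corresponding eigenvectors.

In the summary's framing, training a wide contrastive model under orthogonality constraints on the output layer amounts to solving a problem of this form, which is what ties the learned representations to principal components.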
Stats
"NTK after O(M1/6) steps is close to NTK at initialization."
"Frobenius norm deviation ∥C(t)−C(0)∥F = O(t/√M)."
"For some cosine-similarity based contrastive losses, the change in weights are bounded as |∆V ij(t)| ≤ β1/√M ∥W(t)∥max."
"For dot product similarity based losses, there exists cases where the change in weights become arbitrarily large even for arbitrarily wide neural networks."
"Under cosine similarity based losses, CV(0) is close to CV(t) in Frobenius norm."
Quotes
"NTK after O(M1/6) steps is close to NTK at initialization."
"Empirical validation of our theoretical results possibly hold beyond two-layer networks."
"Our deviation bounds suggest that representations learned by contrastive models are close to the principal components of a certain matrix computed from random features."