Large models such as GPT-4 and LLaMA owe much of their remarkable performance to vast training datasets scraped from the internet. This study investigates whether CLIP's accuracy on out-of-distribution (OOD) benchmarks stems primarily from highly similar images in its training set. By pruning LAION splits so that their train-test similarity matches ImageNet's, the authors find that although performance drops on some benchmarks, CLIP's overall accuracy remains high. The results indicate that factors beyond high train-test similarity drive CLIP's ability to learn good representations. The authors also identify a 100M-example subset of LAION on which CLIP retains its original performance.
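The pruning step described above can be sketched as a nearest-neighbor filter: embed training and test images, then drop any training sample whose maximum cosine similarity to a test image exceeds a threshold. The function name, the threshold value, and the use of raw NumPy arrays below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def prune_by_similarity(train_emb, test_emb, threshold=0.9):
    """Return a boolean mask keeping training samples whose max cosine
    similarity to ANY test embedding is at most `threshold`.
    (Hypothetical helper; the threshold 0.9 is an arbitrary example.)"""
    # L2-normalize rows so dot products become cosine similarities
    train_n = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    test_n = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    # (n_train, n_test) similarity matrix; take the max over test images
    max_sim = (train_n @ test_n.T).max(axis=1)
    return max_sim <= threshold

# Toy usage with random embeddings standing in for CLIP image features
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 16))
test = rng.normal(size=(10, 16))
keep = prune_by_similarity(train, test, threshold=0.5)
```

At LAION scale an exact similarity matrix is infeasible, so an approximate nearest-neighbor index (e.g. FAISS) would replace the dense matrix product, but the filtering logic is the same.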