CatLIP: Accelerating Image-Text Pre-training by Reframing as a Classification Task
CatLIP is a weakly supervised pre-training method that reframes image-text pre-training as a classification task, training 2.7x faster than contrastive learning while matching CLIP-level accuracy on downstream tasks.
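The core idea can be illustrated with a toy sketch. The method turns each image-caption pair into a multi-label classification target (the paper extracts nouns from captions and maps them to a label vocabulary) and trains with a per-class binary objective instead of a pairwise contrastive one, which removes the need for large negative batches. The vocabulary, caption parsing, and loss below are simplified stand-ins, not the authors' implementation:

```python
import math

# Toy label vocabulary; CatLIP builds its vocabulary from nouns found
# in the captions of the pre-training corpus (details simplified here).
VOCAB = ["dog", "ball", "grass", "cat", "sofa"]

def caption_to_multihot(caption, vocab=VOCAB):
    """Map a caption to a multi-hot target over the label vocabulary,
    turning an image-text pair into a multi-label classification example."""
    words = set(caption.lower().split())
    return [1.0 if label in words else 0.0 for label in vocab]

def bce_loss(logits, targets):
    """Binary cross-entropy with a sigmoid per class. Each label is scored
    independently, unlike contrastive (InfoNCE) losses that compare every
    image against every text in the batch."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(logits)

target = caption_to_multihot("a dog chasing a ball on the grass")
# target marks "dog", "ball", and "grass" as positives: [1.0, 1.0, 1.0, 0.0, 0.0]
```

Because the loss decomposes over independent labels, the cost per step no longer scales with batch-wide image-text similarity computation, which is the source of the reported speedup.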