Scaling Down CLIP: Exploring Data, Architecture, and Training Strategies for Efficient Performance
This paper investigates how the Contrastive Language-Image Pre-training (CLIP) model performs when scaled down to limited computation budgets, examining the impact of training data, model architecture, and training strategies on performance.