Efficient Zero-Shot Distillation of CLIP Image Encoders Using Synthetic Data
Small CLIP image encoder students can be distilled efficiently from a larger teacher model using synthetic data, reaching zero-shot performance on par with the teacher while using up to 92% fewer parameters.
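To make the idea of distilling an image encoder concrete, the following is a minimal sketch of one common distillation objective: a mean squared error between L2-normalized student and teacher embeddings. The summary above does not state which loss is used, so the choice of objective, the function names `l2_normalize` and `feature_distillation_loss`, and the requirement that both encoders emit embeddings of the same dimension are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def feature_distillation_loss(student_batch, teacher_batch):
    """Mean squared error between normalized student and teacher embeddings.

    Assumptions (not taken from the source): embeddings are plain lists of
    floats, both encoders share the same embedding dimension, and the teacher
    embeddings are fixed targets that receive no updates.
    """
    if not student_batch or len(student_batch) != len(teacher_batch):
        raise ValueError("batches must be non-empty and of equal size")

    total, count = 0.0, 0
    for student_vec, teacher_vec in zip(student_batch, teacher_batch):
        if len(student_vec) != len(teacher_vec):
            raise ValueError("embedding dimensions must match")
        s_hat = l2_normalize(student_vec)
        t_hat = l2_normalize(teacher_vec)
        # Accumulate squared differences over every embedding dimension.
        total += sum((s - t) ** 2 for s, t in zip(s_hat, t_hat))
        count += len(s_hat)
    return total / count


if __name__ == "__main__":
    # Toy embeddings standing in for encoder outputs on synthetic images.
    teacher = [[2.0, 0.0], [0.0, 3.0]]
    student = [[1.0, 0.0], [1.0, 0.0]]
    print(feature_distillation_loss(student, teacher))
```

In a real training loop the student's parameters would be updated to minimize this quantity over batches of synthetic images, while the teacher stays frozen. Normalizing first means the loss only penalizes differences in embedding direction, which is what matters for cosine-similarity-based zero-shot classification.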