Preserving Out-of-Distribution Generalization in Vision-Language Model Finetuning
Anchor-based Robust Finetuning (ARF) regularizes the finetuning of vision-language models such as CLIP so that they retain their out-of-distribution generalization under both domain shift and zero-shot learning.
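The summary above does not spell out the form of the regularizer, so the following is only a minimal sketch of one plausible reading: the downstream task loss is augmented with a CLIP-style symmetric contrastive term computed on a small batch of "anchor" image-text pairs. The function names (`contrastive_loss`, `regularized_loss`), the `temperature` and `weight` parameters, and the choice of a contrastive term are all assumptions made for illustration, not details taken from the source.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    lse = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric CLIP-style contrastive loss; row k of each input is a matched pair."""
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image-to-text direction: each image should pick out its own text.
    i2t = -sum(_log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: same computation on the transposed logits.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(_log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)


def regularized_loss(task_loss, anchor_image_embs, anchor_text_embs, weight=1.0):
    """Hypothetical objective: task loss plus a weighted anchor-alignment term."""
    return task_loss + weight * contrastive_loss(anchor_image_embs, anchor_text_embs)
```

In this sketch, `weight` trades off fitting the downstream task against keeping image and text embeddings aligned on the anchors; setting it to zero recovers plain finetuning.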