Improving Out-of-Distribution Generalization of Vision-Language Models through Class-Conditional Feature Synthesis and Adaptive Self-Distillation
Existing vision-language models generalize well to in-distribution data but struggle with out-of-distribution visual concepts. This paper proposes OGEN, a novel approach that leverages class-conditional feature synthesis and adaptive self-distillation to improve the out-of-distribution generalization of fine-tuned vision-language models.
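The two components named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`synthesize_features`, `distill_loss`), the additive-noise synthesis around a class text embedding, and the KL-based distillation term are all illustrative assumptions chosen to make the two ideas concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_features(class_text_emb, n_samples=4, noise_scale=0.1, rng=rng):
    """Hypothetical class-conditional feature synthesis: sample image-like
    features for an unseen class by perturbing its text embedding, then
    unit-normalize (as CLIP-style models do with their features)."""
    noise = rng.normal(scale=noise_scale,
                       size=(n_samples, class_text_emb.shape[0]))
    feats = class_text_emb[None, :] + noise
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Self-distillation sketch: KL divergence between the softened teacher
    and student predictions, which regularizes the fine-tuned (student)
    model toward an earlier/teacher checkpoint."""
    def softmax(x, t):
        z = x / t
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return float(np.mean(np.sum(
        p_t * (np.log(p_t + 1e-9) - np.log(p_s + 1e-9)), axis=1)))

# Usage: synthesize features for one unseen class, then compute the
# distillation term between matching student/teacher logits (loss ~ 0).
text_emb = rng.normal(size=512)
text_emb /= np.linalg.norm(text_emb)
feats = synthesize_features(text_emb)
print(feats.shape)

logits = rng.normal(size=(4, 10))
print(distill_loss(logits, logits))
```

In the sketch, the synthesized features stand in for image features of classes unseen during fine-tuning, and the distillation term penalizes the student for drifting from the teacher's predictions on them.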