Enhancing Zero-Shot Generalization of Vision-Language Models through Robust Test-Time Augmentation
MTA is a robust MeanShift-based test-time augmentation method that improves the zero-shot generalization of vision-language models without prompt learning or other intensive training procedures.
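To make the MeanShift idea concrete, the following minimal sketch shows plain mean-shift mode seeking over the embeddings of several augmented views, followed by a cosine-similarity zero-shot prediction. It is an illustration under stated assumptions and not the MTA procedure itself: the Gaussian kernel, the fixed `bandwidth`, the toy 2-D embeddings, and the function names `mean_shift_mode` and `zero_shot_predict` are all hypothetical choices made for this example.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def mean_shift_mode(points, bandwidth, max_iters=100, tol=1e-9):
    """Seek a density mode of `points` with a Gaussian kernel (assumed choice).

    Starts from the plain average and repeatedly moves to the
    kernel-weighted average until the update is smaller than `tol`.
    """
    dim = len(points[0])
    mode = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(max_iters):
        # Views far from the current estimate receive small weights.
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(p, mode))
                     / (2 * bandwidth ** 2))
            for p in points
        ]
        total = sum(weights)
        new_mode = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_mode, mode)))
        mode = new_mode
        if shift < tol:
            break
    return mode


def zero_shot_predict(view_embeddings, class_embeddings, bandwidth=0.5):
    """Return (predicted class index, unit-norm mode embedding)."""
    views = [normalize(v) for v in view_embeddings]
    mode = normalize(mean_shift_mode(views, bandwidth))
    # Cosine similarity between the mode and each class text embedding.
    scores = [
        sum(a * b for a, b in zip(mode, normalize(c)))
        for c in class_embeddings
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, mode


# Toy data: four consistent views and one outlier view (e.g. a bad crop).
views = [[1.0, 0.05], [0.98, -0.03], [0.99, 0.1], [0.97, 0.0], [0.0, 1.0]]
classes = [[1.0, 0.0], [0.0, 1.0]]
pred, mode = zero_shot_predict(views, classes)
print(pred, mode)
```

In this toy setting the outlier view pulls a plain average toward the wrong class direction, while the kernel-weighted update largely ignores it. No training or prompt tuning is involved; only forward-pass embeddings are combined.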