Maximizing Inherent Representation Capabilities of Vision-Language Models through Training-Free Unsupervised Prompt Learning
The proposed Training-Free Unsupervised Prompt (TFUP) method preserves the inherent representation capabilities of pre-trained vision-language models as far as possible and enhances them through a residual connection to similarity-based prediction probabilities, requiring neither training nor labeled data.
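To make the idea of a residual connection to similarity-based prediction probabilities concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes that image and text features have already been extracted by a pre-trained vision-language model, and that a small cache of unlabeled features with pseudo-label probabilities is available. The function name `residual_similarity_predict`, the weights `alpha` and `beta`, the `temperature` value, and the exponential affinity (a form borrowed from cache-based adapters) are all illustrative assumptions that do not come from the source.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def residual_similarity_predict(image_feats, text_feats, cache_feats,
                                cache_probs, alpha=1.0, beta=5.0,
                                temperature=100.0):
    """Hypothetical sketch: zero-shot logits plus a similarity-based residual.

    image_feats: (N, D) test image features
    text_feats:  (C, D) class text (prompt) features
    cache_feats: (M, D) features of unlabeled cached samples (assumed)
    cache_probs: (M, C) pseudo-label probabilities of cached samples (assumed)
    """
    x = l2_normalize(image_feats)
    t = l2_normalize(text_feats)
    c = l2_normalize(cache_feats)

    # Zero-shot branch: the unchanged prediction of the pre-trained model.
    zero_shot_logits = temperature * x @ t.T          # (N, C)

    # Similarity branch: affinity to cached samples, weighted by their
    # pseudo-label probabilities. The exponential form is an assumption.
    affinity = np.exp(-beta * (1.0 - x @ c.T))        # (N, M)
    similarity_logits = affinity @ cache_probs        # (N, C)

    # Residual connection: the similarity term is added to the zero-shot
    # logits, so alpha = 0 recovers the original model exactly.
    return softmax(zero_shot_logits + alpha * similarity_logits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(4, 8))
    texts = rng.normal(size=(3, 8))
    cache = rng.normal(size=(5, 8))
    pseudo = softmax(rng.normal(size=(5, 3)))
    probs = residual_similarity_predict(images, texts, cache, pseudo)
    print(probs.shape)
```

In this sketch no parameters are learned and no ground-truth labels are used. Setting `alpha` to zero leaves the zero-shot prediction untouched, which is one way to read the claim that the pre-trained representation is preserved.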