Enhancing Vision-Language Models through Multi-Knowledge Representation and Prompt Learning
Incorporating diverse semantic knowledge representations, including visual, non-visual, and panoramic knowledge, can significantly enhance the performance of vision-language models in downstream tasks.