Leveraging Large Language Models and Fine-Grained Datasets to Improve Zero-Shot Classification in Vision-Language Models
This work leverages complementary sources of information - category descriptions generated by large language models (LLMs) and abundant, fine-grained image classification datasets - to improve the zero-shot classification performance of vision-language models (VLMs) on fine-grained domains.
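
The summary above describes combining LLM-generated class descriptions with a CLIP-style VLM for zero-shot classification. Below is a minimal, hedged sketch of that core idea, not the paper's actual method: it assumes a standard CLIP checkpoint from Hugging Face `transformers`, and the class names and descriptions shown are illustrative placeholders rather than real LLM outputs from this work.

```python
# Sketch: zero-shot classification with LLM-generated class descriptions.
# Assumptions: openai/clip-vit-base-patch32 as the VLM; the descriptions
# below stand in for LLM-generated text and are purely hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical LLM-generated descriptions for two fine-grained bird classes.
class_descriptions = {
    "painted bunting": [
        "a small songbird with a blue head, green back, and red underparts",
        "a brightly multicolored finch-like bird perched on a branch",
    ],
    "indigo bunting": [
        "a small songbird that is uniformly deep blue all over",
        "a sparrow-sized bird with vivid blue plumage and a short conical bill",
    ],
}

@torch.no_grad()
def class_text_embeddings(descriptions):
    """Encode each class's descriptions and average them into one prototype."""
    prototypes = []
    for name, descs in descriptions.items():
        inputs = processor(text=descs, return_tensors="pt", padding=True)
        feats = model.get_text_features(**inputs)
        feats = feats / feats.norm(dim=-1, keepdim=True)
        prototypes.append(feats.mean(dim=0))
    prototypes = torch.stack(prototypes)
    return prototypes / prototypes.norm(dim=-1, keepdim=True)

@torch.no_grad()
def zero_shot_classify(image: Image.Image, descriptions):
    """Score an image against description-based class prototypes."""
    text_embeds = class_text_embeddings(descriptions)
    inputs = processor(images=image, return_tensors="pt")
    image_embed = model.get_image_features(**inputs)
    image_embed = image_embed / image_embed.norm(dim=-1, keepdim=True)
    logits = model.logit_scale.exp() * (image_embed @ text_embeds.T)
    probs = logits.softmax(dim=-1).squeeze(0)
    names = list(descriptions.keys())
    return names[probs.argmax().item()], probs

# Example usage:
# predicted, scores = zero_shot_classify(Image.open("bird.jpg"), class_descriptions)
```

The sketch averages several description embeddings per class into a single text prototype before scoring, which is one common way to fold richer LLM-generated text into the standard zero-shot protocol; how the fine-grained datasets are actually used in this work is not shown here.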