Evaluating the Conceptual Understanding of Large Visual-Language Models
Large visual-language models often excel at downstream tasks, but it is unclear whether their performance reflects genuine conceptual understanding or mere memorization. This work proposes novel benchmarks to probe three key aspects of conceptual understanding in these models: relations, composition, and context.