Evaluating Large Vision-Language Models: Identifying Limitations and Proposing a Robust Benchmark
Large vision-language models (LVLMs) have recently made rapid progress, but current evaluation methods suffer from two primary issues: 1) Many evaluation samples do not require visual understanding, because their answers can be inferred directly from the questions and options or from the world knowledge embedded in large language models (LLMs). 2) Unintentional data leakage in the training of LLMs and LVLMs allows models to answer some vision-dependent questions without accessing the images.
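Both issues can be surfaced with the same diagnostic: ask a model the question without its image and check whether it still answers correctly. The sketch below is a minimal illustration of that text-only probe, not the paper's actual pipeline; the `Sample` layout, the `ask_llm` callable, the prompt format, and the flagging rule are all assumptions supplied for the example.

```python
# Hypothetical text-only probe: feed each multiple-choice sample to a model
# WITHOUT its image and flag samples the model still answers correctly.
# Such samples either do not need visual understanding or may reflect leakage.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Sample:
    question: str
    options: List[str]   # e.g. ["A. Blue", "B. Green"]
    answer: str          # correct option letter, e.g. "A"
    image_path: str      # present in the benchmark, unused by this probe

def text_only_prompt(sample: Sample) -> str:
    """Build a prompt containing only the question and options, no image."""
    lines = [sample.question] + sample.options
    lines.append("Answer with the option letter only.")
    return "\n".join(lines)

def flag_visual_unnecessary(samples: List[Sample],
                            ask_llm: Callable[[str], str]) -> List[Sample]:
    """Return samples that an image-free model still answers correctly."""
    flagged = []
    for s in samples:
        reply = ask_llm(text_only_prompt(s)).strip().upper()
        if reply.startswith(s.answer.upper()):
            flagged.append(s)
    return flagged

if __name__ == "__main__":
    demo = [Sample("What color is the sky in the photo?",
                   ["A. Blue", "B. Green"], "A", "sky.jpg")]
    # Dummy model that always answers "A"; swap in a real LLM/LVLM wrapper.
    print(len(flag_visual_unnecessary(demo, lambda prompt: "A")))  # -> 1
```

In practice, `ask_llm` would wrap one or more strong LLMs or LVLMs run in text-only mode, and samples answered correctly across most of them would be candidates for removal when constructing a benchmark that genuinely requires visual understanding.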