Uncovering Latent Factors and Biases in Large Vision-Language Model Evaluations
Empirical analysis reveals that a small number of latent factors, including output length bias, text reading vs. reasoning, and spatial reasoning, underlie the performance of large vision-language models across diverse test tasks.
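To make the idea of "a small number of latent factors" concrete, the following is a minimal sketch, not the authors' procedure. It assumes a hypothetical models-by-tasks score matrix, filled here with synthetic data generated from three hidden factors, and uses principal component analysis via SVD as a stand-in for whatever factor-analysis method the study applies. The names `explained_variance_ratio`, `abilities`, and `loadings` are illustrative assumptions.

```python
import numpy as np


def explained_variance_ratio(scores):
    """Return the fraction of variance captured by each principal component.

    `scores` is a (models x tasks) matrix; columns are centered per task
    before the singular values are computed.
    """
    centered = scores - scores.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()


# Synthetic stand-in data (an assumption, not results from the study):
# each model has 3 hidden abilities, each task loads on those abilities.
rng = np.random.default_rng(0)
n_models, n_tasks, n_factors = 40, 20, 3
abilities = rng.normal(size=(n_models, n_factors))
loadings = rng.normal(size=(n_factors, n_tasks))
noise = 0.1 * rng.normal(size=(n_models, n_tasks))
scores = abilities @ loadings + noise

ratios = explained_variance_ratio(scores)
top3 = ratios[:3].sum()
print(f"Variance explained by the first 3 components: {top3:.3f}")
```

If a few components account for most of the variance across many tasks, as they do by construction in this synthetic matrix, that pattern is consistent with a small set of underlying factors. Interpreting a component as, say, output length bias or spatial reasoning would still require inspecting which tasks load on it.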