LIME-M: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models
LIME-M evaluates Multimodal Large Language Models (MLLMs) by filtering out low-quality and overly easy samples, keeping only challenging tasks that require deeper image understanding and reasoning.
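The filtering idea described above can be sketched as follows. This is a minimal illustration only, not the actual LIME-M pipeline: the function name and the exact easy/low-quality criteria (all models correct vs. no model correct) are assumptions for the sake of the example.

```python
# Hypothetical sketch of benchmark sample filtering: drop samples that
# every model answers correctly (too easy) and samples that no model
# answers correctly (possibly low quality), keeping the rest as the
# challenging subset. Criteria are illustrative assumptions.

def filter_samples(results):
    """results: dict mapping sample_id -> list of bool (per-model correctness)."""
    kept = {}
    for sample_id, per_model in results.items():
        correct = sum(per_model)
        if correct == len(per_model):   # all models correct: too easy, drop
            continue
        if correct == 0:                # no model correct: possibly low quality, drop
            continue
        kept[sample_id] = per_model     # mixed outcomes: challenging, keep
    return kept

demo = {
    "s1": [True, True, True],     # too easy -> dropped
    "s2": [False, False, False],  # possibly low quality -> dropped
    "s3": [True, False, True],    # challenging -> kept
}
print(sorted(filter_samples(demo)))  # -> ['s3']
```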