Advancing Geometric Problem Solving: A Comprehensive Benchmark for Evaluating Multimodal Model Performance
We introduce MM-MATH, a novel benchmark designed to rigorously evaluate the performance of advanced large language and multimodal models on geometric computation, uncovering critical gaps in their textual and visual comprehension abilities.