MARIO Eval: A Comprehensive Mathematical Dataset Evaluation Toolkit
MARIO Eval is a mathematical evaluation toolkit that uses a Python computer algebra system (CAS), and optionally a large language model (LLM), to evaluate mathematical reasoning capabilities robustly and consistently across different datasets.
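To illustrate why a CAS helps here, the sketch below checks whether a predicted answer and a reference answer are mathematically equivalent rather than textually identical. It assumes SymPy as the CAS, and the helper name `is_equiv` is hypothetical: this is not the toolkit's actual API, only a minimal example of the general idea. The optional LLM step is left out.

```python
from sympy import simplify, sympify


def is_equiv(pred: str, ref: str) -> bool:
    """Hypothetical helper (not part of MARIO Eval): compare two answer strings."""
    # Cheap path: identical strings need no symbolic work.
    if pred.strip() == ref.strip():
        return True
    try:
        # Parse both answers and simplify their difference.
        diff = simplify(sympify(pred) - sympify(ref))
    except Exception:
        # Anything that cannot be parsed or subtracted counts as "not matched".
        return False
    # The answers are equivalent when the difference reduces to zero.
    return diff == 0


if __name__ == "__main__":
    print(is_equiv("2/4", "1/2"))
    print(is_equiv("(x + 1)**2", "x**2 + 2*x + 1"))
    print(is_equiv("x + 1", "x + 2"))
```

A plain string comparison would reject `2/4` against `1/2`, so the symbolic check is what keeps scoring consistent across datasets that format answers differently. Simplifying to zero is a sufficient test, not a complete one: an expression the CAS cannot reduce is reported as a mismatch even when the two answers are in fact equal.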