Comprehensive Multimodal Benchmark for Evaluating STEM Skills of Neural Models
The STEM dataset is a comprehensive multimodal benchmark for evaluating the STEM (science, technology, engineering, and mathematics) problem-solving abilities of neural models. It reveals a significant gap between the performance of current models and that of humans.