EXAMS-V: A Comprehensive Multilingual Exam Benchmark for Vision Language Models
Key Concepts
EXAMS-V introduces a challenging multi-discipline exam benchmark for evaluating vision language models, emphasizing diverse languages and complex reasoning.
Summary
Abstract:
Introduces EXAMS-V, a new, challenging exam benchmark that is multi-discipline, multimodal, and multilingual.
Consists of 20,932 multiple-choice questions across 20 school disciplines in 11 languages.
Uniquely curated by gathering school exam questions from various countries with diverse education systems.
Introduction:
Large Language Models (LLMs) have advanced considerably in understanding natural language.
Notable developments like GPT-4V and Gemini represent a new era in image understanding.
Datasets:
Comparison with existing benchmarks like M3Exam and other school exam datasets.
EXAMS-V includes subjects like Physics, Biology, History, Chemistry, Geography, etc.
Related Work:
LLM advancements in generating human-like text and performing NLP tasks.
Focus on extending monolingual language models to multilingual and multimodal capabilities.
Data Extraction:
"EXAMS-V is uniquely curated by gathering school exam questions from various countries."
"The dataset contains 20,932 samples spanning 20 subjects from grade 4-12."
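The extracted facts above describe multiple-choice samples tagged by language, subject, and grade. The sketch below shows one way such records could be represented and scored. It rests on several assumptions. The `ExamSample` class and its field names are hypothetical and are not the official EXAMS-V schema. `accuracy_by_language` is a hypothetical helper that uses plain per-language accuracy, since this summary does not state the paper's exact metric. The records in the usage example are toy data, not drawn from the dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExamSample:
    # Hypothetical record layout; NOT the official EXAMS-V schema.
    sample_id: str
    language: str
    subject: str
    grade: int
    image_path: str  # each question is assumed to be rendered as an image
    answer: str      # gold option label, e.g. "A".."D"

    def __post_init__(self) -> None:
        # The summary states the samples span grades 4 to 12.
        if not 4 <= self.grade <= 12:
            raise ValueError(f"grade {self.grade} is outside the 4-12 range")


def accuracy_by_language(samples, predictions):
    """Fraction of correctly answered samples per language.

    `predictions` maps sample_id -> predicted option label; a missing
    prediction counts as wrong.
    """
    correct = Counter()
    total = Counter()
    for s in samples:
        total[s.language] += 1
        if predictions.get(s.sample_id) == s.answer:
            correct[s.language] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Usage with toy records (illustrative only).
samples = [
    ExamSample("q1", "Bulgarian", "Physics", 10, "q1.png", "A"),
    ExamSample("q2", "Bulgarian", "Biology", 9, "q2.png", "C"),
    ExamSample("q3", "Arabic", "Chemistry", 12, "q3.png", "B"),
]
predictions = {"q1": "A", "q2": "D", "q3": "B"}
result = accuracy_by_language(samples, predictions)
print(result)  # {'Bulgarian': 0.5, 'Arabic': 1.0}
```

Reporting accuracy per language rather than as a single pooled number keeps languages with many samples from masking weak performance on languages with few. The same grouping could be applied to `subject` or `grade`.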