Core Concepts
Introducing a large-scale MultiModal Chart Instruction (MMC-Instruction) dataset and a comprehensive MultiModal Chart Benchmark (MMC-Benchmark) to advance the multimodal understanding of charts by large language models.
Abstract
The paper introduces two key contributions to advance multimodal chart understanding:
MMC-Instruction Dataset:
A large-scale dataset of 600k instances for chart understanding, including chart-text alignment data and chart instruction-tuning data.
The dataset covers diverse topics, language styles, chart types, and open-ended answers, aiming to enable large multimodal models (LMMs) to better comprehend and reason about chart contents.
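The two kinds of data described above can be pictured with a minimal sketch; the field names and file paths below are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch of the two record types in MMC-Instruction:
# chart-text alignment pairs and chart instruction-tuning triples.
# All field names and paths are assumptions for illustration only.

alignment_sample = {
    "image": "charts/solar_wind_survey.png",  # hypothetical chart image path
    "text": "A bar chart showing the share of Americans who favor "
            "expanding solar power or wind power.",
}

instruction_sample = {
    "image": "charts/solar_wind_survey.png",
    "instruction": "What share of Americans favor expanding solar power?",
    "answer": "92% of Americans favor expanding solar or wind power.",
}

def is_valid_sample(sample: dict) -> bool:
    """Check that a record carries an image reference plus either
    alignment text or an instruction/answer pair."""
    has_text = "text" in sample
    has_qa = "instruction" in sample and "answer" in sample
    return "image" in sample and (has_text or has_qa)
```

A loader for such data would route alignment records to chart-text contrastive or captioning objectives and instruction records to supervised instruction tuning.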
MMC-Benchmark:
A comprehensive human-annotated benchmark for evaluating LMMs' chart understanding capabilities across nine distinct tasks, including chart information extraction, chart reasoning, contextual chart understanding, multiple-chart understanding, chart type classification, chart topic classification, chart-to-datatable conversion, and chart-to-JSON conversion.
The benchmark provides two evaluation protocols: a free-format Generation Ability Evaluation scored by GPT-4, and a multiple-choice QA-format Chart Understanding Ability Evaluation.
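The multiple-choice protocol amounts to plain accuracy over option letters, as in the sketch below; the record fields, helper names, and toy questions are assumptions for illustration, not the benchmark's actual format.

```python
# Minimal sketch of a multiple-choice "Chart Understanding Ability" style
# evaluation: a model picks one option letter per question and we report
# accuracy. Record structure and names are hypothetical.

def score_multiple_choice(records, predict):
    """records: iterable of {"question", "options", "answer"} dicts;
    predict: callable mapping a record to a predicted option letter."""
    correct = 0
    total = 0
    for rec in records:
        total += 1
        if predict(rec) == rec["answer"]:
            correct += 1
    return correct / total if total else 0.0

# Toy records and a trivial "always answer A" baseline.
toy_benchmark = [
    {"question": "Which chart type is shown?",
     "options": {"A": "bar", "B": "pie"}, "answer": "A"},
    {"question": "What is the highest value?",
     "options": {"A": "40", "B": "92"}, "answer": "B"},
]
accuracy = score_multiple_choice(toy_benchmark, lambda rec: "A")  # -> 0.5
```

The free-format protocol instead has GPT-4 judge open-ended generations against references, which is harder to reduce to a few lines but targets the same underlying tasks.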
The authors also propose MMCA, a multimodal chart understanding model instruction-tuned on the MMC-Instruction dataset, which achieves state-of-the-art performance on existing chart understanding benchmarks. Extensive experiments on MMC-Benchmark reveal that existing LMMs, including the recent GPT-4V, often fail to interpret charts correctly, underscoring the value of MMC-Instruction and MMC-Benchmark in advancing multimodal chart understanding.
Stats
92% of Americans favor expanding solar power or wind power.
China, Hong Kong SAR is the leading importing country of gold, silverware, and jewelry with the highest import value in 2018.
Russia, Canada, and the USA are the top 3 largest countries by land area.
Quotes
"Current open-source LMMs are limited in their ability to accurately interpret complex chart contents, as they often lack domain-specific training essential for tasks such as differentiating between various types of graphs, interpreting axis labels and data points, and extracting meaningful patterns and trends."
"To accurately assess the capabilities of current Large Multimodal Models (LMMs) for chart understanding, we introduce a novel comprehensive evaluation tool: the MultiModal Chart Benchmark (MMC-Benchmark)."
"Our experiments indicate that MMC-Benchmark also poses significant challenges to GPT-4V, especially in Chart to Datatable and Chart to Json tasks."