Multimodal ArXiv introduces two datasets, ArXivCap (a diverse collection of scientific figure-caption pairs) and ArXivQA (question-answering pairs built from scientific figures), to improve LVLMs' comprehension of scientific figures. Fine-tuning on these datasets significantly enhances LVLMs' mathematical reasoning capabilities.