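This paper proposes SMMQG, a synthetic data generation framework that produces multimodal question-answer pairs grounded in multimodal documents and conforming to specified style and modality requirements. Experiments show that the generated data is of high quality and is effective for evaluating multimodal question-answering systems.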


coremsg

A Synthetic-Data-Based Multimodal Question-Answer Generation Framework - Precise Control of Style and Modality


A Synthetic-Data-Based Multimodal Question-Answer Generation Framework: Precise Control of Style and Modality


title_rewrite


Current state-of-the-art large foundation models exhibit varying strengths and weaknesses in multimodal reasoning, with no single model outperforming the others across all tasks. Detailed evaluation reveals room for improvement in areas such as geometric reasoning, making use of multimodal input, and grounding information retrieval.


evaluating-and-benchmarking-the-multimodal-reasoning-capabilities-of-large-foundation-models


Evaluating and Benchmarking the Multimodal Reasoning Capabilities of Large Foundation Models



Introducing the Chain-of-Action framework to enhance question answering by addressing unfaithful hallucination and weak reasoning in complex tasks.


enhancing-question-answering-with-chain-of-action-framework


Enhancing Question Answering with Chain-of-Action Framework