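This paper proposes SMMQG, a synthetic data generation framework that produces multimodal question-answer pairs grounded in multimodal documents and conforming to specified style and modality requirements. Experiments show that the generated data is of high quality and is effective for evaluating multimodal question answering systems.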
Current state-of-the-art large foundation models show varying strengths and weaknesses in multimodal reasoning, and no single model outperforms the others across all tasks. Detailed evaluation reveals room for improvement in areas such as geometric reasoning, benefiting from multimodal input, and grounding information retrieval.
This paper introduces the Chain-of-Action framework, which enhances question answering by addressing unfaithful hallucination and weak reasoning in complex tasks.