Multimodal World Knowledge in Videos Enables Long-Chain Reasoning for Complex Question Answering
WorldQA is a video understanding dataset that challenges models to combine multimodal information with broad world knowledge in order to answer complex questions through long reasoning chains.