Cantor: Enhancing Multimodal Chain-of-Thought Reasoning with Multimodal Large Language Models
Cantor is a multimodal chain-of-thought framework that integrates visual context with logical reasoning to solve complex visual reasoning tasks, drawing on the capabilities of multimodal large language models.