Leveraging Scene Graphs to Enhance Compositional Visual Reasoning in Large Multimodal Models
Compositional Chain-of-Thought (CCoT) is a novel zero-shot prompting method that uses scene graph representations to elicit more compositional knowledge from Large Multimodal Models (LMMs), without requiring fine-tuning or annotated scene graph data.
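
To make the idea concrete, the following is a minimal sketch of how scene-graph prompting could be wired up in two stages: the model is first asked to describe the image as a scene graph, and that generated graph is then fed back as context for the actual question. The two-stage split, the prompt wording, the `ccot_answer` function, and the `lmm` callable interface are all assumptions made for illustration; none of them is specified in the summary above, and the sketch does not target any particular model API.

```python
from typing import Any, Callable, Tuple

# Hypothetical interface: any callable that takes (image, prompt) and returns text.
# Swap in a wrapper around whichever multimodal model is actually being used.
LMM = Callable[[Any, str], str]

# Illustrative prompt wording (an assumption, not the authors' exact prompt).
SCENE_GRAPH_PROMPT = (
    "For the provided image and its associated question, generate a scene graph "
    "in JSON format that includes: (1) objects relevant to the question, "
    "(2) attributes of those objects, and (3) relationships between them.\n"
    "Question: {question}\n"
    "Scene graph:"
)

ANSWER_PROMPT = (
    "Scene graph: {scene_graph}\n"
    "Use the image and the scene graph as context to answer the question.\n"
    "Question: {question}\n"
    "Answer:"
)


def ccot_answer(lmm: LMM, image: Any, question: str) -> Tuple[str, str]:
    """Run a two-stage, zero-shot scene-graph prompt and return (scene_graph, answer)."""
    # Stage 1: the model itself produces the scene graph, so no annotated
    # scene graph data is needed.
    scene_graph = lmm(image, SCENE_GRAPH_PROMPT.format(question=question))

    # Stage 2: the generated scene graph is placed in the prompt as extra context.
    answer = lmm(
        image,
        ANSWER_PROMPT.format(scene_graph=scene_graph, question=question),
    )
    return scene_graph, answer
```

Because both stages are plain prompts to the same frozen model, this sketch involves no fine-tuning; the scene graph is treated as free-form text and is not parsed or validated, which a real pipeline might want to add.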