Evaluating the Counterfactual Reasoning Abilities of Multi-modal Language Models
Current multi-modal language models struggle with counterfactual reasoning, exhibiting significant performance drops on questions that require imagining alternative scenarios.