Addressing Decision Shortcuts in Vision-Language Models
The author proposes a test-time prompt tuning paradigm that steers vision-language models toward genuinely causal, invariant features and away from decision shortcuts during inference.
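
To make the idea of test-time prompt tuning concrete, here is a minimal sketch in plain Python. It is not the author's method: the summary above does not state the objective or the architecture, so this example assumes entropy minimization on a single test input, a common stand-in objective for test-time prompt tuning. All names (`predict`, `tune_prompt`, the toy feature vectors, the `scale` value) are hypothetical. The point it illustrates is only that the encoder outputs stay frozen while a small prompt vector is adjusted at inference time.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def predict(image_feat, class_embeds, prompt, scale=10.0):
    # Assumption: each class text feature is a frozen class embedding
    # shifted by one shared, learnable prompt vector.
    logits = [
        scale * cosine(image_feat, [c + p for c, p in zip(emb, prompt)])
        for emb in class_embeds
    ]
    return softmax(logits)

def loss(image_feat, class_embeds, prompt):
    # Assumed objective: entropy of the prediction on the test input.
    return entropy(predict(image_feat, class_embeds, prompt))

def numeric_grad(f, x, eps=1e-5):
    # Central finite differences keep the sketch dependency-free.
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def tune_prompt(image_feat, class_embeds, prompt, steps=20, lr=0.5):
    # Only the prompt changes; image and class features stay frozen.
    f = lambda p: loss(image_feat, class_embeds, p)
    current = f(prompt)
    for _ in range(steps):
        grad = numeric_grad(f, prompt)
        step = lr
        for _ in range(10):
            # Backtracking: accept a step only if it lowers the loss.
            candidate = [p - step * g for p, g in zip(prompt, grad)]
            value = f(candidate)
            if value < current:
                prompt, current = candidate, value
                break
            step *= 0.5
    return prompt

# Toy, made-up features standing in for frozen encoder outputs.
image_feat = [0.9, 0.1, 0.3]
class_embeds = [[1.0, 0.0, 0.2], [0.2, 1.0, 0.1], [0.1, 0.3, 1.0]]
prompt0 = [0.0, 0.0, 0.0]

tuned = tune_prompt(image_feat, class_embeds, prompt0)
before = loss(image_feat, class_embeds, prompt0)
after = loss(image_feat, class_embeds, tuned)
```

Because each update is accepted only when it lowers the loss, `after` is never larger than `before`. Entropy minimization alone does not separate causal features from shortcuts; whatever criterion the author uses for that would replace the `loss` function in this sketch.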