Evaluating the Trustworthiness of Vision Language Models in Detecting Unsolvable Problems
Vision language models often fail to recognize when a visual question is unsolvable and answer anyway, which makes their responses unreliable. This paper introduces the Unsolvable Problem Detection (UPD) challenge to assess a model's ability to withhold answers when faced with incompatible or irrelevant image-question pairs.