This paper introduces the Unsolvable Problem Detection (UPD) challenge, which evaluates the ability of vision language models (VLMs) to recognize and refrain from answering unsolvable problems in the context of visual question answering (VQA) tasks.
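The UPD challenge encompasses three distinct settings:
- Absent Answer Detection (AAD): the correct answer is not among the provided options.
- Incompatible Answer Set Detection (IASD): the answer options are entirely irrelevant to the question and image.
- Incompatible Visual Question Detection (IVQD): the question does not match the image.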
The authors create three benchmarks, MM-AAD Bench, MM-IASD Bench, and MM-IVQD Bench, based on the MMBench dataset, to systematically evaluate these UPD settings.
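To make the AAD setting concrete, here is a minimal sketch of how an AAD-style item could be derived from an ordinary multiple-choice item: remove the correct option, relabel the rest, and expect the model to withhold an answer. The `MCQItem` class and the `make_aad_item` and `render_prompt` helpers are hypothetical illustrations, not the authors' benchmark-construction code, and the image is left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from string import ascii_uppercase


@dataclass
class MCQItem:
    """One multiple-choice VQA item; the image itself is omitted in this sketch."""

    question: str
    options: list[str]
    answer_index: int | None  # None marks an item whose correct option is absent


def make_aad_item(item: MCQItem) -> MCQItem:
    """Hypothetical helper: drop the correct option so that no listed answer is right."""
    if item.answer_index is None:
        raise ValueError("item has no correct option to remove")
    remaining = [opt for i, opt in enumerate(item.options) if i != item.answer_index]
    return MCQItem(question=item.question, options=remaining, answer_index=None)


def render_prompt(item: MCQItem) -> str:
    """Relabel the options A, B, C, ... in order and list them under the question."""
    lines = [item.question]
    lines += [f"{ascii_uppercase[i]}. {opt}" for i, opt in enumerate(item.options)]
    return "\n".join(lines)


if __name__ == "__main__":
    original = MCQItem("What animal is shown?", ["cat", "dog", "horse", "bird"], answer_index=1)
    aad = make_aad_item(original)
    print(render_prompt(aad))  # options A-C remain; none of them is "dog"
```

A trustworthy model shown the resulting prompt (with an image of a dog) should decline to pick any of A to C rather than guess.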
Experiments on five recent open-source VLMs and two closed-source VLMs reveal that most models struggle to withhold answers even when faced with unsolvable problems, highlighting significant room for improvement. The authors explore both training-free (prompt engineering) and training-based (instruction tuning) approaches to address UPD, but find that notable challenges remain, particularly for smaller VLMs and in the AAD setting.
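The training-free route can be pictured as prompt construction. The sketch below assumes two illustrative variants: appending an extra "None of the above" option, or adding an instruction that permits withholding. The exact wording, the `build_prompt` and `withholds` helpers, and the string-matching check are all hypothetical and may differ from the prompts and scoring used in the paper.

```python
from __future__ import annotations

from string import ascii_uppercase

# Assumed wording for illustration; the paper's exact prompts may differ.
EXTRA_OPTION = "None of the above"
EXTRA_INSTRUCTION = 'If all the options are incorrect, answer "None of the above".'


def build_prompt(question: str, options: list[str], strategy: str = "none") -> str:
    """Assemble a multiple-choice prompt under one training-free strategy.

    strategy: "none" (plain), "option" (append an extra withholding option),
    or "instruction" (add a sentence telling the model it may withhold).
    """
    if strategy not in {"none", "option", "instruction"}:
        raise ValueError(f"unknown strategy: {strategy!r}")
    opts = list(options)  # copy so the caller's list is not mutated
    if strategy == "option":
        opts.append(EXTRA_OPTION)
    lines = [question]
    lines += [f"{ascii_uppercase[i]}. {opt}" for i, opt in enumerate(opts)]
    if strategy == "instruction":
        lines.append(EXTRA_INSTRUCTION)
    return "\n".join(lines)


def withholds(reply: str, extra_letter: str | None = None) -> bool:
    """Naive check: the reply names the withholding phrase or its option letter."""
    text = reply.strip().lower()
    if EXTRA_OPTION.lower() in text:
        return True
    return extra_letter is not None and text.rstrip(".") == extra_letter.lower()
```

Substring matching like this is fragile; a real evaluation would need more robust answer parsing, but the sketch shows why such prompts are cheap to try and why their effect depends heavily on how well a given VLM follows instructions.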
The paper emphasizes the importance of developing more trustworthy and reliable VLMs that can accurately identify and refrain from answering unsolvable problems, which is crucial for the safe and practical deployment of these models.
Source: arxiv.org