This survey provides a detailed analysis of hallucinations in Large Vision-Language Models (LVLMs). It starts by clarifying the concept of hallucinations in LVLMs, presenting various hallucination symptoms and highlighting the unique challenges inherent in LVLM hallucinations.
The authors then outline the benchmarks and methodologies tailored specifically to evaluating hallucinations in LVLMs. These include both discriminative and generative evaluation approaches, which assess the model's ability to discriminate hallucinated content and to generate non-hallucinatory content, respectively.
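To make the two evaluation styles concrete, here is a minimal sketch in Python. It is a toy illustration, not the survey's own code: the function names and example data are hypothetical, while real benchmarks in this space (such as POPE for discriminative probing and CHAIR for generative captioning) rely on curated image annotations.

```python
def discriminative_accuracy(answers, labels):
    """Discriminative evaluation (POPE-style sketch): the model answers
    yes/no questions like 'Is there a <object> in the image?', and we
    score how often it correctly discriminates present vs. absent objects."""
    correct = sum(a == l for a, l in zip(answers, labels))
    return correct / len(labels)

def hallucination_rate(mentioned_objects, annotated_objects):
    """Generative evaluation (CHAIR-style sketch): the fraction of objects
    mentioned in a generated caption that do not appear in the image's
    ground-truth annotations."""
    hallucinated = [o for o in mentioned_objects if o not in annotated_objects]
    return len(hallucinated) / max(len(mentioned_objects), 1)

# Hypothetical toy data: four yes/no probes, and one generated caption
# that mentions three objects while the image is annotated with two.
print(discriminative_accuracy(["yes", "no", "yes", "yes"],
                              ["yes", "no", "no", "yes"]))   # 0.75
print(hallucination_rate(["dog", "frisbee", "car"],
                         {"dog", "frisbee"}))                # ~0.33 ("car" is hallucinated)
```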
Furthermore, the survey investigates the root causes of LVLM hallucinations, drawing insights from the training data, vision encoders, modality alignment modules, and language models. The authors then critically review existing mitigation methods that target these various causes.
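As one concrete illustration of a decoding-side mitigation (not necessarily one the survey singles out), visual-contrastive-style decoding compares next-token logits computed from the clean image against logits computed from a corrupted copy of it: tokens whose scores survive the corruption are likely driven by language priors rather than visual evidence, so contrasting the two distributions down-weights them. The `alpha` value and toy logits below are hypothetical.

```python
import numpy as np

def contrastive_logits(logits_clean, logits_corrupted, alpha=1.0):
    """Amplify evidence that is present only under the clean image:
    adjusted = (1 + alpha) * clean - alpha * corrupted."""
    return (1 + alpha) * logits_clean - alpha * logits_corrupted

# Hypothetical next-token logits over a 4-token vocabulary.
clean = np.array([2.0, 1.5, 0.2, -1.0])
corrupted = np.array([2.0, 0.1, 0.2, -1.0])  # token 1 loses support once the image is corrupted

adjusted = contrastive_logits(clean, corrupted)
print(adjusted)  # [2.0, 2.9, 0.2, -1.0]: the visually grounded token 1 now outranks
                 # token 0, whose score was unchanged by corrupting the image
```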
Finally, the survey discusses the open questions and future directions pertaining to hallucinations within LVLMs, highlighting areas such as supervision objectives, enriching modalities, LVLMs as agents, and improving interpretability.