Core Concepts
Addressing object hallucination in large vision-language models through a post-hoc rectification approach.
Abstract
Large vision-language models (LVLMs) have shown impressive abilities but suffer from object hallucination, which undermines downstream tasks such as visual summarization. A new algorithm, LVLM Hallucination Revisor (LURE), is proposed to rectify object hallucinations post hoc. It is grounded in a statistical analysis of three key factors: co-occurrence (objects that frequently appear together in training data tend to be hallucinated together), uncertainty (objects decoded with high uncertainty are more likely hallucinated), and object position (hallucinations cluster in the later parts of generated descriptions). LURE outperforms previous methods on both object hallucination metrics and human evaluations across multiple LVLMs. The method proceeds in three stages: statistical analysis of hallucination patterns, curation of a hallucinated dataset to train the revisor, and deployment of the trained revisor to rewrite model outputs at inference time. Ablation studies confirm that all three factors must be considered for effective reduction of object hallucination.
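To make the pipeline concrete, below is a minimal Python sketch of the factor-scoring and masking step that would precede revision. The ObjectMention type, the equal factor weighting, the 0.6 threshold, the [IDK] placeholder, and the revisor call are all illustrative assumptions, not the paper's exact formulation.

# Minimal sketch of a LURE-style post-hoc revision pipeline.
# Factor weights, threshold, and the `revisor` interface below are
# illustrative assumptions, not the paper's exact formulation.

import math
from dataclasses import dataclass

@dataclass
class ObjectMention:
    name: str
    position: int          # index of the mention in the description (0-based)
    token_logprob: float   # log-probability the LVLM assigned when decoding it

def hallucination_risk(obj: ObjectMention,
                       total_objects: int,
                       cooccur_score: float) -> float:
    """Combine the three analyzed factors: co-occurrence,
    decoding uncertainty, and position in the description."""
    uncertainty = 1.0 - math.exp(obj.token_logprob)   # high when the model was unsure
    position = (obj.position + 1) / total_objects     # later mentions are riskier
    # Equal weighting is an assumption made for illustration.
    return (cooccur_score + uncertainty + position) / 3.0

def mask_risky_objects(description: str,
                       mentions: list[ObjectMention],
                       cooccur: dict[str, float],
                       threshold: float = 0.6) -> str:
    """Replace likely-hallucinated objects with a placeholder token,
    producing the input a trained revisor would rewrite."""
    masked = description
    for obj in mentions:
        risk = hallucination_risk(obj, len(mentions), cooccur.get(obj.name, 0.0))
        if risk > threshold:
            masked = masked.replace(obj.name, "[IDK]")
    return masked

# Usage: the masked description (plus the image) would be fed to the
# trained revisor model, e.g. revised = revisor(image, masked)  (hypothetical call).
mentions = [ObjectMention("dog", 0, -0.1), ObjectMention("frisbee", 1, -2.3)]
cooccur = {"frisbee": 0.9}   # "frisbee" frequently co-occurs with "dog" in training data
print(mask_risky_objects("A dog chasing a frisbee.", mentions, cooccur))
# -> "A dog chasing a [IDK]."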
Stats
Published as a conference paper at ICLR 2024
Six open-source LVLMs evaluated
LURE outperforms previous methods on standard object hallucination metrics (sketch of the metrics below)
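The object hallucination metrics used for this comparison are typically the CHAIR scores (Rohrbach et al., 2018). Below is a minimal sketch of how CHAIR_i and CHAIR_s are commonly computed, assuming object mentions have already been extracted from each caption and matched against ground-truth annotations.

# Sketch of the CHAIR object-hallucination metrics; assumes object
# mentions are pre-extracted and mapped to a ground-truth vocabulary.

def chair_metrics(captions_objects: list[list[str]],
                  gt_objects: list[set[str]]) -> tuple[float, float]:
    """CHAIR_i: fraction of mentioned objects that are hallucinated.
       CHAIR_s: fraction of captions containing at least one hallucination."""
    total_mentions = hallucinated_mentions = hallucinated_captions = 0
    for mentioned, truth in zip(captions_objects, gt_objects):
        bad = [obj for obj in mentioned if obj not in truth]
        total_mentions += len(mentioned)
        hallucinated_mentions += len(bad)
        hallucinated_captions += bool(bad)
    chair_i = hallucinated_mentions / max(total_mentions, 1)
    chair_s = hallucinated_captions / max(len(captions_objects), 1)
    return chair_i, chair_s

# Example: one clean caption, one with a hallucinated "frisbee".
print(chair_metrics([["dog"], ["dog", "frisbee"]],
                    [{"dog"}, {"dog"}]))  # -> (0.333..., 0.5)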
Quotes
"LURE is grounded in a rigorous statistical analysis of the key factors underlying object hallucination."
"LURE can significantly reduce object hallucination under general evaluation metrics and human evaluations."