Core Concepts
The authors identify and quantify number hallucinations in large vision-language models and propose a consistency training method to mitigate them.
Abstract
The study introduces the concept of number hallucination in large vision-language models and shows that the problem is widespread. It analyzes two related failure modes, inner and outer inconsistency, and proposes a consistency training method to mitigate number hallucinations.
Large vision-language models (LVLMs) have shown remarkable efficacy but still suffer from various challenges, particularly hallucinations. The study focuses on number hallucination, where a model fails to correctly identify the quantity of a given object in an image. Evaluation on the proposed dataset shows that number hallucination is severe and pervasive across all tested LVLMs.
The authors define number hallucination as a new form of object hallucination and introduce a dataset for evaluating it. They analyze inconsistencies both within a single task form (inner) and between different task forms (outer), arguing that such inconsistency is a likely contributor to number hallucination; the sketch below makes the distinction concrete.
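As one plausible reading of within-task versus between-task inconsistency, here is a minimal, hypothetical probing sketch. The `query_model` callable and the prompt templates are assumptions for illustration, not the paper's interface: inner inconsistency is disagreement between paraphrases of the same counting task, and outer inconsistency is disagreement between the counting task and a yes/no verification task.

```python
# Hypothetical sketch of probing within-task (inner) and between-task (outer)
# inconsistency for a counting question. `query_model(image, prompt)` is an
# assumed stand-in for any LVLM interface; it returns the model's text answer.

def probe_inconsistency(query_model, image, obj: str):
    # Inner inconsistency: two paraphrases of the same counting task
    # should yield the same number.
    a1 = query_model(image, f"How many {obj} are in the image? Answer with a number.")
    a2 = query_model(image, f"Count the {obj} in this image. Answer with a number.")
    inner_inconsistent = a1.strip() != a2.strip()

    # Outer inconsistency: a different task form (yes/no verification)
    # should agree with the count given by the direct question.
    a3 = query_model(image, f"Are there exactly {a1.strip()} {obj} in the image? Answer yes or no.")
    outer_inconsistent = not a3.strip().lower().startswith("yes")

    return inner_inconsistent, outer_inconsistent
```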
A consistency training method is proposed to alleviate number hallucination: by training the model to answer related task forms consistently, overall counting accuracy improves. Results show an average macro-F1 improvement of 8% over direct finetuning.
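As a rough illustration of what such training could look like, the sketch below sums the language-modeling loss over several task formulations of the same counting question, so the model is optimized to answer every form correctly and hence consistently. It assumes a HuggingFace-style causal LM interface and a pre-tokenized batch; it is a sketch of the general idea, not the authors' exact recipe.

```python
import torch

def consistency_training_step(model, batch, optimizer):
    """One hypothetical training step: accumulate the loss over several task
    formulations of the same counting question (e.g. direct counting, yes/no
    verification, multiple choice) before updating the model."""
    optimizer.zero_grad()
    total_loss = 0.0
    # `batch["forms"]` is an assumed list of tokenized prompt/answer pairs,
    # one entry per task formulation of the same image-question instance.
    for form in batch["forms"]:
        out = model(input_ids=form["input_ids"],
                    attention_mask=form["attention_mask"],
                    labels=form["labels"])
        total_loss = total_loss + out.loss
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```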
Stats
All investigated LVLMs have an average mean absolute error (MAE) of roughly 2 on the proposed dataset, i.e., their count predictions are off by about two objects on average.
The Consistency (I+II) method outperforms direct finetuning (Direct) by 8% on average in macro-F1.
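For reference, the two reported metrics can be reproduced in a few lines; the counts below are made up purely for illustration.

```python
from sklearn.metrics import f1_score

def mae(preds, targets):
    # Mean absolute error between predicted and ground-truth object counts.
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

preds   = [3, 5, 2, 4, 1]  # hypothetical predicted counts
targets = [3, 2, 2, 6, 1]  # hypothetical ground-truth counts

print(mae(preds, targets))  # 1.0
# Macro-F1 treats each count value as a class and averages per-class F1 scores.
print(f1_score(targets, preds, average="macro"))  # ~0.44
```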