Core Concepts
Addressing hallucination in large multi-modal models through robust instruction tuning.
Abstract
This work examines hallucination in large multi-modal models (LMMs) and introduces LRV-Instruction, a visual instruction-tuning dataset designed to mitigate the issue. The study explores the impact of negative instructions (questions about content absent from the image) on model behavior and proposes GPT4-Assisted Visual Instruction Evaluation (GAVIE) to measure hallucination. Experiments show that finetuning on a balanced ratio of positive and negative instructions improves model performance.
Directory:
- Introduction
  - Progress in natural language processing and multi-modal models.
- Data Extraction
  - Introduction of the LRV-Instruction dataset.
  - Negative instructions designed for robust visual instruction tuning (illustrated in the first sketch after this directory).
- Evaluation Methodology
  - GPT4-Assisted Visual Instruction Evaluation (GAVIE) (see the second sketch below).
- Experiment Results
  - Improved model performance with LRV-Instruction.
- Detailed Analysis
  - Performance at different semantic levels of hallucination.
- Conclusion and Future Directions
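
To make the negative-instruction idea concrete, the sketch below contrasts a positive instruction with a negative one that asks about an object absent from the image. The field names, image id, and wording are hypothetical illustrations, not the dataset's actual schema.

```python
# Hypothetical illustration of positive vs. negative visual instructions.
# Field names and the image id are assumptions, not LRV-Instruction's schema.

positive_example = {
    "image_id": "000000123456",  # hypothetical COCO-style id
    "instruction": "Describe the color of the umbrella in the image.",
    "answer": "The umbrella is red.",  # grounded in visible image content
}

# A negative instruction asks about content that is NOT in the image;
# the desired answer is a correction rather than a confabulated description.
negative_example = {
    "image_id": "000000123456",
    "instruction": "What brand is the laptop on the table?",
    "answer": "There is no laptop in the image.",
}
```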
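GAVIE, as summarized above, uses GPT4 as a judge of model answers against ground-truth image annotations. The following sketch shows one way such an evaluation call could look, assuming the OpenAI chat API; the prompt wording and the 0-10 accuracy/relevancy rubric are assumptions for illustration, not the paper's exact prompt.

```python
# A minimal sketch of a GAVIE-style judging call, assuming the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def gavie_score(ground_truth: str, instruction: str, model_answer: str) -> str:
    """Ask GPT-4 to judge a model answer for hallucination, given
    ground-truth image annotations (e.g., dense captions)."""
    prompt = (
        "You are evaluating a vision-language model's answer.\n"
        f"Image annotations (ground truth): {ground_truth}\n"
        f"Instruction: {instruction}\n"
        f"Model answer: {model_answer}\n"
        "Rate ACCURACY (does the answer hallucinate content absent from "
        "the annotations?) and RELEVANCY (does it address the instruction?) "
        "each on a 0-10 scale, with a brief justification."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```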
Stats
Our dataset comprises 400k visual instructions generated by GPT4.
Existing LMMs exhibit significant hallucination when presented with negative instructions.
Finetuning on LRV-Instruction improves model performance compared to state-of-the-art methods.
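
Since the improvement is attributed to a balanced ratio of positive and negative training instructions, here is a minimal sketch of assembling such a mix before finetuning. The helper name, the 1:1 default ratio, and the sampling scheme are assumptions for illustration, not the paper's procedure.

```python
# A minimal sketch of building a balanced finetuning mix of positive and
# negative instructions. The default neg_ratio=0.5 yields a 1:1 mix.
import random

def balanced_mix(positives: list, negatives: list,
                 neg_ratio: float = 0.5, seed: int = 0) -> list:
    """Subsample negatives so they make up `neg_ratio` of the training set."""
    rng = random.Random(seed)
    # Solve n / (p + n) = neg_ratio for n, the number of negatives to keep.
    n_neg = int(len(positives) * neg_ratio / (1 - neg_ratio))
    mixed = positives + rng.sample(negatives, min(n_neg, len(negatives)))
    rng.shuffle(mixed)
    return mixed
```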
Quotes
"Despite promising progress in multi-modal tasks, current large multi-modal models are prone to hallucinating inconsistent descriptions."
"Our investigation reveals that most LMMs are finetuned on unbalanced datasets containing only positive instructions."
"We hope our work can help address the unexpected hallucination issues of LMMs."