HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Core Concepts
Investigating and mitigating hallucinations in machine-generated visual instruction data using the HalluciDoctor framework.
Abstract
Multi-modal Large Language Models (MLLMs) show impressive performance, but the machine-generated visual instruction data used to tune them often contains hallucinations, which the models then absorb.
HalluciDoctor detects and eliminates multiple types of hallucinations (e.g., object, relation, and attribute hallucinations) in visual instruction datasets.
The framework uses a consistency cross-checking paradigm to detect hallucinations and counterfactual visual instruction expansion to strengthen MLLMs' resistance to them.
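A minimal sketch of the consistency cross-checking idea: a claim asserted in the instruction data is turned into a verification question, several expert models answer it independently, and the claim is flagged as a likely hallucination when too few experts agree. The function names, the `experts` interface, and the voting threshold below are illustrative assumptions, not the paper's actual implementation.

```python
from collections import Counter

def cross_check_chunk(image, question, expected_answer, experts, threshold=0.5):
    """Ask several expert models the same verification question about the image
    and flag the claim as hallucinated if too few experts agree with the answer
    asserted by the instruction data. `experts` is a list of callables
    (image, question) -> answer string. All names here are hypothetical.
    """
    answers = [expert(image, question) for expert in experts]
    votes = Counter(a.strip().lower() for a in answers)
    agreement = votes.get(expected_answer.strip().lower(), 0) / len(experts)
    return agreement < threshold  # True -> likely hallucination

# Toy usage with stub experts standing in for real VQA/grounding models.
experts = [
    lambda img, q: "no",
    lambda img, q: "no",
    lambda img, q: "yes",
]
is_hallucinated = cross_check_chunk(
    image=None,
    question="Is there a dog in the image?",
    expected_answer="yes",  # the claim made by the instruction data
    experts=experts,
)
print(is_hallucinated)  # True: only 1 of 3 experts confirms the claim
```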
Experimental results show successful mitigation of hallucinations and improved model performance.
Contributions include a comprehensive investigation of hallucination toxicity, a novel detection-and-elimination method, and an enhanced visual instruction dataset.
"We propose a novel HalluciDoctor method to detect various hallucinations by a consistency cross-checking paradigm and dispel them in a low-resource way."
"Based on that, we execute counterfactual visual instruction expansion to balance data distribution, thereby enhancing MLLMs’ resistance to hallucinations."
"Our empirical study confirms our method’s effectiveness in eliminating hallucinations in visual instruction data and improving MLLMs’ robustness."