Aligning Large Vision-Language Models with Fine-Grained AI Feedback to Mitigate Hallucinations
To mitigate hallucination in large vision-language models, the authors propose Fine-Grained Artificial Intelligence Feedback (FGAIF), a method that aligns the text and image modalities using fine-grained feedback on three types of hallucination: object existence, object attribute, and object relationship.
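To make the idea of fine-grained feedback concrete, the following is a minimal sketch of how feedback over the three hallucination types might be represented per response segment and turned into per-segment scores. It is an illustration only, not the authors' implementation: the names `HallucinationType`, `SegmentFeedback`, and `segment_rewards`, the per-segment granularity, and the penalty of -1.0 per flagged type are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class HallucinationType(Enum):
    """The three hallucination types named in the summary."""
    OBJECT_EXISTENCE = "object_existence"
    OBJECT_ATTRIBUTE = "object_attribute"
    OBJECT_RELATIONSHIP = "object_relationship"


@dataclass
class SegmentFeedback:
    """Hypothetical container: one segment of a model response plus
    the hallucination types an AI judge flagged in it."""
    text: str
    flags: dict = field(default_factory=dict)  # HallucinationType -> bool


def segment_rewards(feedback, penalty=-1.0, clean_reward=0.0):
    """Assumed scoring rule: each flagged type adds `penalty` to the
    segment's score; unflagged segments receive `clean_reward`."""
    rewards = []
    for seg in feedback:
        n_flags = sum(1 for t in HallucinationType if seg.flags.get(t, False))
        rewards.append(clean_reward + penalty * n_flags)
    return rewards


# Illustrative (made-up) feedback for a two-segment response.
example = [
    SegmentFeedback("A dog sits on the sofa."),
    SegmentFeedback(
        "The dog is red and holds a ball.",
        {
            HallucinationType.OBJECT_ATTRIBUTE: True,   # wrong color
            HallucinationType.OBJECT_EXISTENCE: True,   # no ball in image
        },
    ),
]
rewards = segment_rewards(example)
```

Scoring each segment separately, rather than assigning one score to the whole response, is what distinguishes fine-grained feedback from response-level feedback in this sketch: the second segment is penalized for two hallucination types while the first is left untouched.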