Self-Supervised Visual Preference Alignment for Enhancing Vision-Language Model Capabilities
Self-supervised visual preference alignment, which requires no external supervision, can significantly improve the comprehension abilities of vision-language models: it yields stronger chain-of-thought reasoning, better OCR ability, closer alignment with user intentions, and fewer hallucinations.