Large Vision-Language Models struggle with fine-grained visual categorization due to a modality gap, hindering accurate attribute generation.
The work examines the limitations of instruction-tuned Large Vision-Language Models (LVLMs) in fine-grained visual categorization, highlighting a modality gap between textual and visual inputs. The proposed FINER benchmark aims to evaluate and enhance LVLMs' fine-grained image understanding.