OMG-LLaVA: A Unified Model for Image-level, Object-level, and Pixel-level Understanding and Reasoning
OMG-LLaVA is a framework that combines pixel-level visual understanding with reasoning abilities. It accepts various visual and text prompts, allowing flexible user interaction.