Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models with Lumen
The author introduces Lumen, a large multimodal model (LMM) with enhanced, versatile vision-centric capabilities. Lumen decouples the task-agnostic and task-specific learning processes to unleash the potential of LMMs.