Tag2Text introduces image tagging to guide vision-language models, enhancing performance and controllability.
Tag2Text introduces image tagging into vision-language models to guide the learning of visual-linguistic features, demonstrating superior zero-shot performance and enhancing model efficiency.