Recent advances in generative diffusion models have revolutionized text-controlled image synthesis. InstructCV aims to bridge the gap between text-to-image generative models and standard visual recognition tasks by providing a unified language interface. It casts a range of computer vision tasks as text-to-image generation problems, so that a natural language instruction specifies which task the model should perform on the input image. The model is trained on a multi-modal, multi-task dataset and performs competitively with other vision models. In experiments, InstructCV generalizes well to unseen data, categories, and user instructions.
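To make the idea of a unified language interface more concrete, here is a minimal sketch of how different vision tasks might be expressed as plain-text instructions paired with an input image. The template wording, the `TEMPLATES` table, and the `build_request` helper are all hypothetical and exist only for illustration; they are not taken from the InstructCV code, and no model is called.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical instruction templates, one per vision task.
# The wording is illustrative only, not the paper's actual prompts.
TEMPLATES = {
    "segmentation": "Segment the {category}.",
    "detection": "Detect the {category}.",
    "depth": "Estimate the depth map of this image.",
    "classification": "Is there a {category} in this image?",
}


@dataclass(frozen=True)
class Request:
    """An (image, instruction) pair that an instruction-guided model would consume."""
    image_path: str
    instruction: str


def build_request(task: str, image_path: str, category: Optional[str] = None) -> Request:
    """Turn a task name (and optional category) into a text instruction."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    template = TEMPLATES[task]
    if "{category}" in template and category is None:
        raise ValueError(f"task '{task}' needs a category")
    # Extra keyword arguments are ignored by templates without a placeholder.
    return Request(image_path, template.format(category=category))


if __name__ == "__main__":
    print(build_request("segmentation", "street.jpg", "car").instruction)
    print(build_request("depth", "street.jpg").instruction)
```

Under this sketch, every task shares one input format (an image plus a sentence), which is the property that lets a single text-conditioned image generator stand in for several task-specific models.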
Key Insights Distilled From
by Yulu Gan, Sun... : arxiv.org 03-15-2024
https://arxiv.org/pdf/2310.00390.pdf
Deeper Inquiries