Chat-UniVi is a unified vision-language model that enables large language models to comprehend and engage in conversations involving both images and videos.
It represents the two modalities uniformly using dynamic visual tokens, and the authors report that it outperforms existing methods.