MM-Interleaved: An End-to-End Generative Model for Interleaved Image-Text Data
MM-Interleaved is an end-to-end generative model that efficiently processes and generates interleaved image-text data. It relies on a multi-modal feature synchronizer (MMFS), which dynamically extracts fine-grained visual details from multi-scale feature maps of multiple images.
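To make the idea of extracting visual details from multi-scale, multi-image feature maps more concrete, the following is a minimal sketch and not the authors' implementation. The summary above does not say how the MMFS works internally, so this sketch assumes plain dense softmax cross-attention as a stand-in: each query token attends over every location of every feature map. All names (`flatten_feature_maps`, `cross_attend`, `random_map`) and all sizes are hypothetical and chosen only for illustration.

```python
import math
import random


def flatten_feature_maps(feature_maps):
    """Flatten H x W x C nested-list maps into one list of C-dim visual tokens."""
    return [pixel for fm in feature_maps for row in fm for pixel in row]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, feature_maps):
    """Pool visual details for each query from all scales and all images.

    Assumption: dense scaled dot-product attention stands in for the
    real synchronizer, whose mechanism is not described in the text above.
    """
    tokens = flatten_feature_maps(feature_maps)
    dim = len(queries[0])
    details, all_weights = [], []
    for q in queries:
        # Similarity between the query and every visual token.
        scores = [sum(a * b for a, b in zip(q, t)) / math.sqrt(dim) for t in tokens]
        weights = softmax(scores)
        # Weighted average of the visual tokens, one value per channel.
        pooled = [sum(w * t[c] for w, t in zip(weights, tokens)) for c in range(dim)]
        details.append(pooled)
        all_weights.append(weights)
    return details, all_weights


def random_map(rng, height, width, channels):
    """Hypothetical stand-in for an image encoder's output feature map."""
    return [[[rng.gauss(0, 1) for _ in range(channels)]
             for _ in range(width)] for _ in range(height)]


rng = random.Random(0)
C = 8
# Two images, each represented at two scales (4x4 and 2x2).
maps = [random_map(rng, h, w, C) for _ in range(2) for (h, w) in [(4, 4), (2, 2)]]
queries = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(3)]
details, weights = cross_attend(queries, maps)
```

Here each of the 3 queries receives one pooled 8-dimensional vector, computed from 40 visual tokens (2 images × (16 + 4) locations). Dense attention over every location is the simplest choice to read, but its cost grows with the total number of locations, so a model that aims to be efficient would likely sample far fewer of them.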