M2UGen is a framework for multi-modal music understanding and generation that uses large language models.