Harnessing Large Language Models for Seamless Multi-modal Music Generation
Mozart's Touch is a lightweight framework that leverages pre-trained Large Language Models and multi-modal models to generate music aligned with visual inputs. Its authors report that it outperforms current state-of-the-art approaches.