Music generation models exhibit impressive capabilities, but the extent to which they encode fundamental Western music theory concepts in their internal representations remains unclear. This study introduces SynTheory, a synthetic dataset for systematically probing the encoding of music theory concepts, including tempo, time signatures, notes, intervals, scales, chords, and chord progressions, in state-of-the-art music generation models such as Jukebox and MusicGen.
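As a rough illustration of this kind of probing setup, the sketch below trains a linear classifier on frozen model embeddings to test whether a concept such as time signature is linearly decodable. The embeddings, label counts, and classifier choice here are placeholder assumptions, not the paper's exact protocol.

```python
# Minimal probing sketch: fit a linear probe on frozen embeddings from a
# music model and report held-out accuracy on a concept-classification task.
# `embeddings` and `labels` stand in for precomputed features and concept
# labels (e.g., 4 time-signature classes); the real pipeline differs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512))   # placeholder: one vector per audio clip
labels = rng.integers(0, 4, size=1000)      # placeholder: concept class per clip

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")
```

High probe accuracy on real embeddings would suggest the concept is encoded in the representation; chance-level accuracy (as with the random placeholders above) would suggest it is not.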
The proposed functional representation for symbolic music encodes melody notes and chords as Roman numerals relative to the musical key, enabling effective modeling of keys and the generation of diverse harmonies that convey a desired emotional valence.
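A minimal sketch of the key-relative encoding idea follows, assuming a diatonic chord root in a major key; the function name and the triad-quality rule are illustrative assumptions, not the paper's specification.

```python
# Hypothetical sketch of a functional (key-relative) encoding: map a chord
# root to a Roman-numeral scale degree so the same token sequence describes
# a progression in any key.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]            # semitone offsets of the major scale
NUMERALS = ["I", "II", "III", "IV", "V", "VI", "VII"]

def to_roman(chord_root_pc: int, key_pc: int, minor_quality: bool) -> str:
    """Encode a chord root (pitch class 0-11) as a Roman numeral in a major key."""
    offset = (chord_root_pc - key_pc) % 12
    degree = MAJOR_SCALE.index(offset)          # assumes a diatonic chord root
    numeral = NUMERALS[degree]
    return numeral.lower() if minor_quality else numeral

# In C major (pc 0): A minor (root pc 9) -> "vi", G major (root pc 7) -> "V"
print(to_roman(9, 0, minor_quality=True))   # vi
print(to_roman(7, 0, minor_quality=False))  # V
```

Because the numerals are key-relative, the same progression token sequence (e.g., I-vi-IV-V) can be realized in any key at decoding time.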
A model for generating symbolic orchestral music that can transfer the style of a reference piece while maintaining melodic fidelity and allowing control over textural attributes.
Seed-Music is a versatile framework that leverages both auto-regressive language modeling and diffusion approaches to enable high-quality music generation with fine-grained style control, as well as interactive editing of generated music.
A multi-source latent diffusion model (MSLDM) that efficiently captures the unique characteristics of each instrumental source in a compact latent representation, enabling the generation of consistent and harmonious multi-instrumental music.
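The joint-latent idea can be pictured loosely as follows: each instrument stem is encoded into its own compact latent, the per-source latents are stacked, and a single denoiser operates over the stack so that sources stay mutually coherent. All shapes and the toy denoiser below are placeholder assumptions, not MSLDM's architecture.

```python
# Loose sketch of multi-source latent diffusion: stack per-source latents
# and denoise them jointly, so inter-source consistency is modeled directly.
import numpy as np

N_SOURCES, T, D = 4, 128, 64                    # e.g., bass/drums/piano/strings

def toy_denoiser(z_noisy, t):
    """Stand-in for a learned score network over the joint latent stack."""
    return z_noisy * (1.0 - 0.5 * t)            # illustrative shrinkage, not learned

z = np.random.randn(N_SOURCES, T, D)            # one compact latent per source
z_t = z + np.sqrt(0.5) * np.random.randn(*z.shape)   # forward noising at t = 0.5
z_hat = toy_denoiser(z_t, t=0.5)                # one joint reverse step
print(z_hat.shape)                              # (4, 128, 64): sources denoised together
```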
Introducing chords as a control condition for end-to-end song generation, improving musicality and control precision through an Attention mechanism with a Dynamic Weights Sequence.
This work presents a unified approach to incorporating content-based controls, such as chord progressions and drum patterns, into large-scale music audio generative models, enabling flexible variation generation and arrangement.
Melodist, a novel two-stage model, generates songs with both vocals and accompaniment from text prompts, leveraging tri-tower contrastive pretraining to learn effective text representations for controllable synthesis.
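One way to picture a tri-tower contrastive objective is as pairwise InfoNCE-style alignment among text, vocal, and accompaniment embeddings. The sketch below uses random placeholder tower outputs and a simple one-directional InfoNCE loss; it is an assumption about the general shape of such an objective, not Melodist's actual loss.

```python
# Sketch of a tri-tower contrastive objective: align three modality
# embeddings with pairwise InfoNCE losses over a batch of paired examples.
import numpy as np

def info_nce(a, b, temp=0.07):
    """InfoNCE over paired embeddings (row i of `a` matches row i of `b`)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temp                     # (batch, batch) cosine similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()           # pull matched pairs together

batch, dim = 8, 128
text = np.random.randn(batch, dim)      # placeholder tower outputs
vocal = np.random.randn(batch, dim)
accomp = np.random.randn(batch, dim)

loss = info_nce(text, vocal) + info_nce(text, accomp) + info_nce(vocal, accomp)
print(f"tri-tower contrastive loss: {loss:.3f}")
```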
A novel approach for inference-time control of generative music transformers, which self-monitors probe accuracy to impose desired musical traits while maintaining overall music quality.
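A hypothetical sketch of such probe-guided decoding: at each step, re-rank the model's top-k candidate tokens by blending model likelihood with a probe's score for the desired trait. The hooks `hidden_state_fn` and `probe_score_fn` are assumptions about where a probe would attach; the paper's actual monitoring and intervention mechanics may differ.

```python
# Illustrative probe-guided decoding step: choose among the model's top-k
# tokens by combining log-probability with a probe score for the target trait.
import numpy as np

def probe_guided_step(logits, hidden_state_fn, probe_score_fn, k=8, alpha=2.0):
    """Pick the next token by blending model likelihood with a probe score."""
    top_k = np.argsort(logits)[-k:]                      # model's k best tokens
    log_probs = logits[top_k] - np.log(np.exp(logits).sum())
    trait_scores = np.array([probe_score_fn(hidden_state_fn(t)) for t in top_k])
    return int(top_k[np.argmax(log_probs + alpha * trait_scores)])

# Toy usage: random logits and a probe that happens to prefer even token ids.
rng = np.random.default_rng(0)
logits = rng.normal(size=100)
token = probe_guided_step(
    logits,
    hidden_state_fn=lambda t: t,                 # stand-in for a real hidden state
    probe_score_fn=lambda h: 1.0 if h % 2 == 0 else 0.0,
)
print(token)
```

The weighting term `alpha` trades off trait imposition against staying close to the model's own distribution, which is how such a method could preserve overall music quality.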
Proposing the Video2Music framework for generating music that matches video content, using an Affective Multimodal Transformer.