Core Concepts
A single diffusion-based generative model is introduced for music synthesis and source separation, handling total generation, partial generation (source imputation), and source separation within one model.
Abstract
The paper introduces a novel approach to music generation and source separation using a diffusion-based generative model. The model can handle generating full mixtures, imputing missing sources given a partial context, and separating individual sources from a mixture. By training a single model on the Slakh2100 dataset, the authors demonstrate competitive results in both qualitative and quantitative evaluations. The method bridges the gap between source separation and music generation by learning the joint distribution of contextual sources.
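To make the three tasks concrete, below is a minimal, self-contained sketch of how one score model over the stacked sources can serve all of them: total generation samples every source from the prior, partial generation clamps the known sources during sampling, and separation constrains the sources to sum to the given mixture. Everything here is an illustrative stand-in (an untrained toy network, hypothetical names, a simple sum-constraint projection), not the paper's implementation.

```python
import torch

N_SOURCES, LENGTH = 4, 1024              # e.g. bass/drums/guitar/piano stems
SIGMAS = torch.linspace(1.0, 0.01, 30)   # annealed noise levels, high -> low

# Stand-in for the trained score network over the stacked sources.
score_net = torch.nn.Sequential(
    torch.nn.Linear(N_SOURCES * LENGTH, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, N_SOURCES * LENGTH),
)

def score(x, sigma):
    """Toy approximation of grad_x log p_sigma(x_1, ..., x_N)."""
    flat = x.reshape(x.shape[0], -1)
    return score_net(flat).reshape_as(x) / sigma

@torch.no_grad()
def sample(fixed=None, mixture=None, step=1e-4, inner=5):
    """Annealed Langevin dynamics over all sources at once.

    fixed:   (mask, values) -- clamp known sources (partial generation).
    mixture: (1, 1, LENGTH) -- constrain sources to sum to y (separation).
    """
    x = torch.randn(1, N_SOURCES, LENGTH) * SIGMAS[0]
    for sigma in SIGMAS:
        eps = step * (sigma / SIGMAS[-1]) ** 2
        for _ in range(inner):
            x = x + eps * score(x, sigma) + (2 * eps).sqrt() * torch.randn_like(x)
            if fixed is not None:                 # partial generation
                mask, values = fixed
                x = torch.where(mask, values, x)
            if mixture is not None:               # separation: crude projection
                resid = mixture - x.sum(dim=1, keepdim=True)
                x = x + resid / N_SOURCES         # onto {x : sum_n x_n = y}
    return x

stems = sample()                                           # total generation
mask = torch.zeros(1, N_SOURCES, LENGTH, dtype=torch.bool)
mask[:, 0] = True                                          # keep stem 0 fixed
accompaniment = sample(fixed=(mask, stems))                # partial generation
sources = sample(mixture=stems.sum(dim=1, keepdim=True))   # source separation
```

The key design point the sketch illustrates: all three tasks reuse the same prior over the stacked sources, differing only in which constraint is applied during sampling.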
Stats
Our method achieves an FAD (Fréchet Audio Distance) of 6.55 for total generation.
The sub-FAD metric for partial generation ranges from 0.11 to 6.1.
Source separation results show SI-SDRi (scale-invariant SDR improvement) values ranging from 12.53 dB to 20.90 dB.
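For reference, SI-SDRi measures how much the separated estimate improves, in dB, over simply using the raw mixture as the estimate. A minimal NumPy sketch, assuming the standard SI-SDR definition (Le Roux et al., 2019); the function names are ours, not from the paper:

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB (standard definition)."""
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference            # reference scaled to best fit estimate
    noise = estimate - target
    return 10 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))

def si_sdri(estimate, reference, mixture):
    """SI-SDR improvement: gain over using the raw mixture as the estimate."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)

# Example: a cleaner estimate should score a positive improvement.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
mix = ref + rng.standard_normal(16000)        # reference buried in "noise"
est = ref + 0.1 * rng.standard_normal(16000)  # a much cleaner estimate
print(f"SI-SDRi: {si_sdri(est, ref, mix):.2f} dB")
```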
Quotes
"Our method is the first example of a single model that can handle both generation and separation tasks."
"Models designed for the generation task directly learn the distribution p(y) over mixtures, collapsing the information needed for the separation task."
"Our contribution bridges the gap between source separation and music generation by learning p(x1, . . . , xN), the joint (prior) distribution of contextual sources."