Key concepts
Automated dance synthesis integrating music and lyrics enhances semantic meaning and artistic expression.
Summary
The study introduces LM2D, a novel probabilistic architecture for dance synthesis conditioned on both music and lyrics. It addresses the limitations of existing models by combining a multimodal diffusion model with consistency distillation, which enables dance generation in a single step. The work also contributes the first 3D dance-motion dataset that pairs motion with both music and lyrics. Objective metrics and human evaluations demonstrate LM2D's ability to produce realistic dances matching both lyrics and music. The study explores the impact of lyrics on choreography and emphasizes the need for efficient single-step generation methods.
Statistics
A new dataset features 4.6 hours of 3D dance motion in 1867 sequences.
Librosa extracts 35-dimensional music features combining MFCC, chroma, peaks, and beats.
Lyrics are embedded with BERT, yielding 768-dimensional features.
FID scores are compared across models: EDGE, LM2D, EDGE (cd), and LM2D (cd), where (cd) denotes the consistency-distilled variant.
Diversity metrics are evaluated on geometric and kinetic motion features.
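The FID comparison above rests on the standard Fréchet distance between two feature distributions; a minimal sketch, assuming generated and ground-truth motion features are already extracted as (N, D) matrices (the extractor itself is not shown):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between two (N, D) feature matrices:
    FID = ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 * sqrtm(C_a @ C_b))."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a + cov_b - 2.0 * covmean))
```

A lower FID means the generated motion's feature statistics are closer to the real data; identical distributions give a score near zero.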
Quotes
"The integration of lyrics enriches the foundational tone of dance."
"Existing technologies focus on music-dance interaction but neglect lyrics' significant role."
"Our multimodal diffusion model with consistency distillation creates dance in a single step."