The paper introduces the Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model. To address the challenges of heterogeneous motion data and tasks, the authors make the following key contributions:
MotionVerse: A mega-scale, multi-modal, multi-task motion generation dataset that features a unified motion representation across a wide range of tasks and motion formats.
LMM Architecture: The authors design an articulated attention mechanism, ArtAttention, that incorporates body part-aware modeling into a Diffusion Transformer backbone, enabling precise and robust control over individual body parts (see the first sketch below).
Pre-Training Strategy: The authors propose a novel pre-training strategy for LMM that combines random frame-rate resampling with several masking techniques, allowing the model to fully exploit heterogeneous motion datasets and strengthening its capabilities (see the second sketch below).
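The summary does not reproduce the exact ArtAttention formulation, but a minimal PyTorch sketch of the underlying idea is shown here: temporal self-attention applied independently to each body-part feature group, so individual parts can be attended to, conditioned, or masked separately. The BODY_PARTS slices, feature dimensions, and class name are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical grouping of a per-frame pose feature vector into body parts.
# The slice boundaries and part names are assumptions for illustration only.
BODY_PARTS = {
    "torso": slice(0, 64),
    "left_arm": slice(64, 128),
    "right_arm": slice(128, 192),
    "left_leg": slice(192, 256),
    "right_leg": slice(256, 320),
}

class BodyPartAwareAttention(nn.Module):
    """Runs temporal self-attention independently per body-part feature group,
    then concatenates the results, so each part can be modeled and controlled
    separately (a simplified stand-in for part-aware attention)."""

    def __init__(self, part_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.ModuleDict({
            name: nn.MultiheadAttention(part_dim, num_heads, batch_first=True)
            for name in BODY_PARTS
        })

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, feature_dim), with feature_dim split across parts
        outputs = []
        for name, part_slice in BODY_PARTS.items():
            part = x[:, :, part_slice]
            attended, _ = self.attn[name](part, part, part)
            outputs.append(attended)
        return torch.cat(outputs, dim=-1)

# Usage: a batch of 2 motion clips, 60 frames, 320-dim part features.
x = torch.randn(2, 60, 320)
print(BodyPartAwareAttention()(x).shape)  # torch.Size([2, 60, 320])
```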
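Likewise, the following is a minimal sketch rather than the authors' code: it resamples a motion clip to a random frame rate via linear interpolation along time and then masks a random fraction of frames, the two ingredients named above. The function name, feature dimensionality, FPS range, and mask-ratio range are assumptions.

```python
import torch
import torch.nn.functional as F

def augment_motion(motion: torch.Tensor,
                   src_fps: float = 30.0,
                   fps_range=(10.0, 60.0),
                   mask_ratio_range=(0.1, 0.5)):
    """Resample a motion clip to a random frame rate and mask random frames.

    motion: (frames, feature_dim) tensor; the returned clip keeps feature_dim
    but has a new length, plus a boolean mask marking hidden frames.
    All ranges and the zero-filling of masked frames are illustrative choices.
    """
    frames, dim = motion.shape

    # 1) Random frame-rate resampling via linear interpolation along time.
    target_fps = torch.empty(1).uniform_(*fps_range).item()
    new_len = max(2, int(round(frames * target_fps / src_fps)))
    resampled = F.interpolate(
        motion.t().unsqueeze(0),           # (1, dim, frames)
        size=new_len, mode="linear", align_corners=True,
    ).squeeze(0).t()                        # (new_len, dim)

    # 2) Random frame masking: True marks frames the model must reconstruct.
    mask_ratio = torch.empty(1).uniform_(*mask_ratio_range).item()
    mask = torch.rand(new_len) < mask_ratio
    masked = resampled.clone()
    masked[mask] = 0.0                      # hide masked frames from the model

    return masked, mask, target_fps

clip = torch.randn(120, 256)                # 4 s of motion at 30 fps, 256-dim features
masked_clip, frame_mask, fps = augment_motion(clip)
print(masked_clip.shape, frame_mask.sum().item(), round(fps, 1))
```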
Extensive experiments demonstrate that the generalist LMM achieves performance competitive with state-of-the-art specialist models across various standard motion generation tasks. LMM also exhibits strong generalization and emergent properties on many unseen tasks. Ablation studies provide valuable insights into training and scaling up large motion models for future research.
Key insights distilled from the source paper by Mingyuan Zha... (arxiv.org, 04-02-2024): https://arxiv.org/pdf/2404.01284.pdf