Core Concepts
CoMo is a Controllable Motion generation model that accurately generates and edits motions by leveraging large language models; it achieves competitive performance in motion generation and surpasses previous work in motion editing.
Abstract
CoMo is a model for controllable motion generation through language-guided pose code editing. It addresses the limited fine-grained control of existing approaches by decomposing motions into pose codes, enabling accurate motion editing from textual inputs. The model consists of three main components: the Motion Encoder-Decoder, the Motion Generator, and the Motion Editor. CoMo achieves performance competitive with state-of-the-art text-driven motion generation models and is preferred in human studies of motion editing ability.
Introduction
Existing text-to-motion models lack fine-grained controllability.
CoMo addresses this gap with a Controllable Motion generation model.
Human motion synthesis is challenging because of the diversity of human behaviors.
Methodology
CoMo decomposes motions into discrete, semantically meaningful pose codes.
Its components are the Motion Encoder-Decoder, the Motion Generator, and the Motion Editor.
A large language model translates textual editing instructions into pose code edits, enabling precise motion editing; a sketch of this pipeline follows.
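For intuition, here is a minimal, hypothetical Python sketch of the pipeline described above. The class and function names (MotionEncoderDecoder, llm_edit_codes), the pose code vocabulary, and the joint dimensionality are illustrative assumptions, not the authors' implementation.

```python
"""Sketch of a CoMo-style pipeline (hypothetical names, placeholder logic):
motions are encoded into interpretable pose codes, editing happens in code
space via a language model, and the edited codes are decoded back."""
from dataclasses import dataclass

# One frame = a set of human-readable pose codes, e.g. one code per body part.
PoseCodes = dict[str, str]


@dataclass
class MotionEncoderDecoder:
    """Maps raw joint positions to pose codes and back (placeholder logic)."""

    def encode(self, motion_frames: list[list[float]]) -> list[PoseCodes]:
        # Placeholder: a real encoder derives codes from joint geometry.
        return [{"left_arm": "lowered", "right_arm": "lowered"} for _ in motion_frames]

    def decode(self, code_frames: list[PoseCodes]) -> list[list[float]]:
        # Placeholder: a real decoder reconstructs joint positions from codes.
        return [[0.0] * 22 for _ in code_frames]


def llm_edit_codes(code_frames: list[PoseCodes], instruction: str) -> list[PoseCodes]:
    """Stand-in for the LLM-based Motion Editor: rewrite pose codes per the
    text instruction. A real system would prompt an LLM with the codes and
    the instruction, then parse the edited codes from its reply."""
    edited = [dict(frame) for frame in code_frames]
    if "raise the left arm" in instruction.lower():
        for frame in edited:
            frame["left_arm"] = "raised"
    return edited


if __name__ == "__main__":
    codec = MotionEncoderDecoder()
    motion = [[0.0] * 22 for _ in range(4)]              # 4 dummy frames, 22 joints
    codes = codec.encode(motion)                          # motion -> pose codes
    codes = llm_edit_codes(codes, "Raise the left arm")   # edit in code space
    edited_motion = codec.decode(codes)                   # pose codes -> motion
    print(codes[0])
```

The point the sketch illustrates is the design choice: edits are applied in the interpretable pose-code space rather than on raw joint coordinates, which is what makes language-controlled adjustments tractable.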
Experiments & Results
Competitive performance on HumanML3D and KIT datasets.
Human evaluation shows preference for CoMo in motion editing.
Contributions include a semantic pose-code motion representation and a transformer-based generation model.
Stats
Experiments demonstrate that CoMo achieves competitive performance in text-driven motion generation compared to state-of-the-art models, while substantially surpassing previous work in human studies of motion editing ability.
Quotes
"CoMo allows for intuitive, language-controlled adjustments to the motion sequences."
"Experiments demonstrate that CoMo achieves competitive performance in motion generation."