Core Concepts
Automated spatial composition of 3D human motions using GPT-guided synthetic data improves generation quality.
Abstract
This article introduces SINC, a model for generating 3D human motions from textual descriptions of simultaneous actions. It queries GPT-3 for knowledge about which body parts each action involves, and uses that knowledge to build synthetic training data by spatially compositing single-action motions. The resulting model outperforms baselines on spatial composition, addressing the scarcity of annotated simultaneous-action data and improving realism.
Directory:
Abstract
Goal: Synthesize 3D human motions for simultaneous actions.
Method: Extract knowledge using GPT-3, create synthetic data, train SINC model.
Introduction
Interest in text-conditioned 3D human motion generation.
Applications in special effects, games, and virtual reality.
Spatial Composition of Motions from Textual Descriptions
Goal: Generate realistic 3D human motions for simultaneous actions.
Method: GPT-guided synthetic data creation, model training, implementation details.
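The core of the method above is spatial compositing: GPT-3 labels which body parts an action involves, and a synthetic training pair is built by overwriting those parts of one motion with the corresponding parts of another. The sketch below is illustrative only; the joint grouping, indices, and function name are hypothetical placeholders, not the paper's actual partition or API.

```python
import numpy as np

# Hypothetical SMPL-style joint grouping; part names and joint indices
# here are illustrative assumptions, not the paper's actual partition.
BODY_PARTS = {
    "left arm":  [16, 18, 20],
    "right arm": [17, 19, 21],
    "legs":      [1, 2, 4, 5, 7, 8, 10, 11],
    "torso":     [0, 3, 6, 9, 12, 13, 14, 15],
}

def composite_motions(motion_a, motion_b, parts_b):
    """Spatially composite two single-action motions.

    motion_a, motion_b: arrays of shape (frames, joints, 3).
    parts_b: body-part names (e.g. obtained from a GPT-3 query) whose
             joints are taken from motion_b; all other joints come
             from motion_a. Output is truncated to the shorter clip.
    """
    frames = min(len(motion_a), len(motion_b))
    out = motion_a[:frames].copy()
    for part in parts_b:
        idx = BODY_PARTS[part]
        out[:, idx] = motion_b[:frames, idx]
    return out

# Example: layer a "wave" motion (arms) onto a "walk" motion (legs/torso).
walk = np.random.randn(60, 22, 3)
wave = np.random.randn(80, 22, 3)
combined = composite_motions(walk, wave, ["left arm", "right arm"])
```

This is the rough mechanism by which simultaneous-action training pairs can be synthesized from abundant single-action segments.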
Experiments
Data: BABEL dataset used for training and validation.
Metrics: Average Positional Error, Average Variance Error, TEMOS score.
Baselines: Single-action models, GPT-compositing, SINC model.
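The two positional metrics listed above can be sketched as follows. These are simplified illustrative formulas (averaging over all frames and joints in global coordinates); the paper's exact variants, such as root-relative versions, may differ.

```python
import numpy as np

def average_positional_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth
    joint positions, averaged over frames and joints.
    pred, gt: arrays of shape (frames, joints, 3)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def average_variance_error(pred, gt):
    """Distance between the temporal variance of predicted and
    ground-truth joint trajectories, averaged over joints.
    Captures whether the generated motion moves as much as the
    reference, independent of exact positions."""
    var_pred = pred.var(axis=0)  # per-joint variance, shape (joints, 3)
    var_gt = gt.var(axis=0)
    return np.linalg.norm(var_pred - var_gt, axis=-1).mean()
```

The TEMOS score, by contrast, compares motions in a learned embedding space rather than raw joint coordinates.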
Qualitative Analysis
Results: SINC model successfully generates simultaneous action motions.
Comparison: SINC outperforms single-action models and models without synthetic data.
Limitations
Challenges: Artifacts in synthetically composited data, imperfect evaluation metrics for composed motions, and ensuring the semantic compatibility of combined actions.
Conclusions
Contribution: SINC model improves spatial composition of 3D human motions.
Future Work: Explore joint spatial and temporal action composition.
Stats
"Our code is publicly available at sinc.is.tue.mpg.de."
"BABEL contains only roughly 2.5K segments with simultaneous actions, while it has ∼25K segments with only one action."
"The latent vectors are sampled using the re-parametrization trick."
"We set the batch size to 64 and the learning rate to 3·10⁻⁴ for all our experiments."
"We train all of our models for 500 epochs."