The paper addresses the challenge of generating 3D human motion from text, emphasizing the importance of modeling the physical contacts that occur between the body and its surroundings. It introduces a novel dataset named RICH-CAT and proposes an approach named CATMO for text-driven interactive human motion synthesis. The method integrates VQ-VAE models that encode motion and body-contact sequences into discrete tokens, an intertwined GPT that jointly generates motions and contacts, and a pre-trained text encoder that provides textual embeddings. Experimental results demonstrate that the proposed approach outperforms existing methods in producing stable, contact-aware motion sequences.
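To make the described pipeline concrete, here is a minimal sketch of how such a system could be composed. Everything in it is an assumption for illustration: the class names (VQVAE, IntertwinedGPT), feature dimensions, codebook sizes, and the per-frame interleaving scheme are placeholders, not the authors' actual CATMO implementation, whose architecture and training details are specified in the paper itself.

```python
import torch
import torch.nn as nn

class VQVAE(nn.Module):
    """Toy vector-quantized autoencoder for one modality
    (motion features or per-frame body-contact labels)."""
    def __init__(self, in_dim, code_dim=128, num_codes=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, code_dim), nn.ReLU(), nn.Linear(code_dim, code_dim))
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, code_dim), nn.ReLU(), nn.Linear(code_dim, in_dim))

    def encode(self, x):
        """Map (batch, frames, in_dim) features to discrete token ids."""
        z = self.encoder(x)                                           # (B, T, D)
        d = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)   # (B, T, K)
        return d.argmin(dim=-1)                          # nearest codebook entry

    def decode(self, ids):
        """Map token ids back to continuous features."""
        return self.decoder(self.codebook(ids))


class IntertwinedGPT(nn.Module):
    """Autoregressive transformer over a shared vocabulary of motion and
    contact tokens, conditioned on a text embedding prepended as a token."""
    def __init__(self, vocab=1024, text_dim=512, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_model)
        self.text_proj = nn.Linear(text_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, token_ids, text_emb):
        h = torch.cat([self.text_proj(text_emb).unsqueeze(1),
                       self.tok_emb(token_ids)], dim=1)               # (B, T+1, D)
        # Causal mask so each position attends only to earlier tokens.
        mask = torch.triu(torch.full((h.size(1), h.size(1)), float("-inf")),
                          diagonal=1)
        return self.head(self.blocks(h, mask=mask))        # next-token logits


# Toy usage: tokenize both modalities, then interleave motion/contact tokens
# per frame so each prediction can attend to the other modality's history.
motion_vae, contact_vae = VQVAE(in_dim=263), VQVAE(in_dim=75)
gpt = IntertwinedGPT()
motion = torch.randn(2, 16, 263)            # placeholder motion features
contact = torch.randn(2, 16, 75)            # placeholder contact features
m_ids = motion_vae.encode(motion)           # (2, 16) motion tokens
c_ids = contact_vae.encode(contact) + 512   # offset into the shared vocab
tokens = torch.stack([m_ids, c_ids], -1).flatten(1)   # m0, c0, m1, c1, ...
text_emb = torch.randn(2, 512)              # stand-in for a pre-trained text encoder
logits = gpt(tokens, text_emb)              # (2, 33, 1024) next-token logits
```

Interleaving the two token streams into a single autoregressive sequence is one simple way to let motion prediction condition on contact states and vice versa; the paper's intertwined GPT may couple the two streams differently.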
Key Insights Distilled From
by Sihan Ma, Qio... at arxiv.org, 03-26-2024
https://arxiv.org/pdf/2403.15709.pdf