The paper addresses the challenge of generating 3D human motion from text, emphasizing the importance of modeling interaction through physical contact. It introduces a novel dataset named RICH-CAT and proposes an approach named CATMO for text-driven interactive human motion synthesis. The method integrates VQ-VAE models for encoding motion and body contact sequences, an intertwined GPT for generating motions and contacts, and a pre-trained text encoder for learning textual embeddings. Experimental results demonstrate the superior performance of the proposed approach compared to existing methods in producing stable, contact-aware motion sequences.
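The method's use of VQ-VAE models means each continuous motion (or contact) frame embedding is snapped to its nearest learned codebook vector, and the resulting discrete token sequences are what the GPT then models. A minimal NumPy sketch of that quantization step, with illustrative codebook sizes and dimensions that are assumptions rather than the paper's actual values:

```python
import numpy as np

def vq_encode(frames, codebook):
    """Map each frame embedding to the index of its nearest codebook vector.

    frames:   (T, D) array of continuous per-frame embeddings
    codebook: (K, D) array of learned code vectors
    returns:  (T,) array of discrete token indices
    """
    # Squared Euclidean distance between every frame and every code.
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def vq_decode(tokens, codebook):
    """Invert the encoding by looking the tokens back up in the codebook."""
    return codebook[tokens]

# Toy example: 4 frames, 2-D embeddings, a 3-entry codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(3, 2))
# Frames built from codes 0, 2, 2, 1 plus small noise.
frames = codebook[[0, 2, 2, 1]] + 0.01 * rng.normal(size=(4, 2))
tokens = vq_encode(frames, codebook)
print(tokens.tolist())  # → [0, 2, 2, 1]
```

In the paper's setup, separate codebooks of this kind would discretize motion and contact independently, so the GPT can generate the two token streams jointly.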
Key insights extracted from the paper by Sihan Ma, Qio... at arxiv.org, 03-26-2024
https://arxiv.org/pdf/2403.15709.pdf