The proposed method, SIGGesture, synthesizes high-quality, semantically relevant 3D co-speech gestures by leveraging large-scale pre-trained diffusion models and semantic injection with Large Language Models (LLMs).
This paper introduces LLM Gesticulator, a novel framework that leverages LLMs to generate realistic and controllable co-speech gestures from audio and text prompts, outperforming existing methods.
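Neither summary spells out an implementation, but both rest on the same pattern: a semantic signal derived from an LLM conditions a generative model that produces a pose sequence aligned with the speech audio. The sketch below is a minimal, generic illustration of that pattern, not either paper's actual architecture: a toy DDPM-style sampler whose denoiser is conditioned on per-frame audio features concatenated with an utterance-level LLM embedding. All names and dimensions (`GestureDenoiser`, `sample_gestures`, `POSE_DIM`, etc.) are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; neither paper specifies these.
POSE_DIM = 165    # flattened 3D joint rotations per frame
AUDIO_DIM = 128   # per-frame audio features (e.g., mel or wav2vec)
SEM_DIM = 256     # utterance-level semantic embedding from an LLM
T_STEPS = 50      # diffusion timesteps

class GestureDenoiser(nn.Module):
    """Toy denoiser: predicts the noise on a pose sequence given conditions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(POSE_DIM + AUDIO_DIM + SEM_DIM + 1, 512),
            nn.SiLU(),
            nn.Linear(512, POSE_DIM),
        )

    def forward(self, noisy_pose, audio, semantic, t):
        # Broadcast the utterance-level semantic embedding over all frames.
        frames = noisy_pose.shape[1]
        sem = semantic.unsqueeze(1).expand(-1, frames, -1)
        t_feat = t.float().view(-1, 1, 1).expand(-1, frames, 1) / T_STEPS
        x = torch.cat([noisy_pose, audio, sem, t_feat], dim=-1)
        return self.net(x)

@torch.no_grad()
def sample_gestures(model, audio, semantic):
    """DDPM-style ancestral sampling over a gesture sequence."""
    betas = torch.linspace(1e-4, 0.02, T_STEPS)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    # Start from pure noise with one pose vector per audio frame.
    x = torch.randn(audio.shape[0], audio.shape[1], POSE_DIM)
    for t in reversed(range(T_STEPS)):
        t_batch = torch.full((audio.shape[0],), t)
        eps = model(x, audio, semantic, t_batch)
        # Standard DDPM posterior mean update.
        x = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x += betas[t].sqrt() * torch.randn_like(x)
    return x  # (batch, frames, POSE_DIM) denoised pose sequence

# Usage: one utterance of 120 frames, batch of 1.
model = GestureDenoiser()
audio_feats = torch.randn(1, 120, AUDIO_DIM)  # stand-in for an audio encoder's output
semantic_emb = torch.randn(1, SEM_DIM)        # stand-in for an LLM-derived embedding
poses = sample_gestures(model, audio_feats, semantic_emb)
print(poses.shape)  # torch.Size([1, 120, 165])
```

In this framing, the semantic embedding is what "semantic injection" would supply and the audio features carry rhythm and prosody; a trained denoiser (this one is untrained, so its output is noise) would resolve both into plausible motion.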