DiffSHEG is a unified diffusion-based approach that jointly generates synchronized, speech-driven expressions and gestures, capturing their inherent relationship through a uni-directional information flow from expression to gesture.
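To make the uni-directional coupling concrete, the sketch below shows a toy per-frame denoiser in which the gesture branch consumes the expression branch's output while the expression branch never sees the gesture state. This is a minimal illustration under assumed shapes; all module names and dimensions are hypothetical and do not reflect DiffSHEG's actual architecture.

```python
import torch
import torch.nn as nn

class UniDirectionalDenoiser(nn.Module):
    """Toy per-frame denoiser with expression-to-gesture information flow.
    All names and dimensions are hypothetical, not DiffSHEG's real modules."""
    def __init__(self, audio_dim=128, expr_dim=64, gest_dim=128, hid=256):
        super().__init__()
        self.expr_net = nn.Sequential(
            nn.Linear(audio_dim + expr_dim + 1, hid), nn.SiLU(),
            nn.Linear(hid, expr_dim))   # predicts the denoised expression
        self.gest_net = nn.Sequential(
            nn.Linear(audio_dim + gest_dim + expr_dim + 1, hid), nn.SiLU(),
            nn.Linear(hid, gest_dim))   # predicts the denoised gesture

    def forward(self, audio, noisy_expr, noisy_gest, t):
        t = t.float().unsqueeze(-1)     # (batch,) timestep -> (batch, 1)
        # Expression branch: conditioned on speech and its own noisy state only.
        expr = self.expr_net(torch.cat([audio, noisy_expr, t], dim=-1))
        # Gesture branch: additionally receives the expression estimate,
        # realizing the one-way expression -> gesture flow.
        gest = self.gest_net(torch.cat([audio, noisy_gest, expr, t], dim=-1))
        return expr, gest

# One denoising call on random per-frame features (shapes: batch x dim).
model = UniDirectionalDenoiser()
expr_hat, gest_hat = model(torch.randn(4, 128), torch.randn(4, 64),
                           torch.randn(4, 128), torch.randint(0, 1000, (4,)))
```

The asymmetry is the point of the design: expression is treated as the faster, more speech-determined signal, and gesture is denoised with that signal as extra context, so synchrony is built into the conditioning rather than enforced afterwards.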
ProbTalk is a unified probabilistic framework that jointly models facial expressions, hand gestures, and body poses to generate variable and coordinated holistic co-speech motions for 3D avatars.
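The sketch below illustrates why joint probabilistic modeling yields variation that stays coordinated: a single speech-conditioned latent is sampled once and decoded into all three body parts. This is a simplified Gaussian-latent stand-in for the general idea, not ProbTalk's actual model; every name and dimension here is an assumption for illustration.

```python
import torch
import torch.nn as nn

class JointMotionSampler(nn.Module):
    """Toy speech-conditioned joint sampler (hypothetical stand-in, not
    ProbTalk's actual framework). One shared latent decodes to face, hands,
    and body, so random variation stays coordinated across the three parts."""
    def __init__(self, audio_dim=128, z_dim=32, hid=256):
        super().__init__()
        self.prior = nn.Linear(audio_dim, 2 * z_dim)   # outputs (mu, logvar)
        self.trunk = nn.Sequential(
            nn.Linear(audio_dim + z_dim, hid), nn.SiLU())
        self.face = nn.Linear(hid, 64)    # expression coefficients (illustrative)
        self.hands = nn.Linear(hid, 90)   # hand pose parameters (illustrative)
        self.body = nn.Linear(hid, 63)    # body pose parameters (illustrative)

    def sample(self, audio):
        mu, logvar = self.prior(audio).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized draw
        h = self.trunk(torch.cat([audio, z], dim=-1))
        # All three part decoders read the same latent sample, which is what
        # keeps the sampled face, hand, and body motion mutually consistent.
        return self.face(h), self.hands(h), self.body(h)

# Two calls on the same speech features give different but internally
# coordinated holistic motions, since each draw uses one shared latent.
face, hands, body = JointMotionSampler().sample(torch.randn(4, 128))
```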