Speech-Driven Holistic 3D Expression and Gesture Generation with Diffusion Models
DiffSHEG is a unified diffusion-based approach that jointly generates synchronized 3D expressions and gestures from speech, capturing their inherent relationship through a uni-directional flow of information from expression to gesture.
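
To make the uni-directional conditioning concrete, here is a minimal sketch (not the authors' code) of a speech-conditioned denoiser in which the gesture branch receives the expression branch's output, while the expression branch never sees gesture features. Module names, feature dimensions, and the audio-feature input are assumptions for illustration only.

```python
# Hypothetical sketch of expression-to-gesture uni-directional conditioning
# inside a diffusion denoiser; dimensions and layer choices are assumptions.
import torch
import torch.nn as nn


class UniDirectionalDenoiser(nn.Module):
    def __init__(self, audio_dim=128, expr_dim=64, gest_dim=128, hidden=256):
        super().__init__()
        # Shared embeddings for the diffusion timestep and the speech features.
        self.time_embed = nn.Sequential(nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
        self.audio_proj = nn.Linear(audio_dim, hidden)
        # Expression branch: conditioned on speech and timestep only.
        self.expr_net = nn.Sequential(
            nn.Linear(expr_dim + 2 * hidden, hidden), nn.SiLU(), nn.Linear(hidden, expr_dim)
        )
        # Gesture branch: additionally conditioned on the expression prediction,
        # realizing the one-way expression-to-gesture information flow.
        self.gest_net = nn.Sequential(
            nn.Linear(gest_dim + expr_dim + 2 * hidden, hidden), nn.SiLU(), nn.Linear(hidden, gest_dim)
        )

    def forward(self, noisy_expr, noisy_gest, audio_feat, t):
        # t: (B, 1) normalized timestep; audio_feat: (B, T, audio_dim)
        temb = self.time_embed(t).unsqueeze(1).expand(-1, audio_feat.size(1), -1)
        aemb = self.audio_proj(audio_feat)
        expr_out = self.expr_net(torch.cat([noisy_expr, aemb, temb], dim=-1))
        # Gesture sees the expression output; expression never sees gesture.
        gest_out = self.gest_net(torch.cat([noisy_gest, expr_out, aemb, temb], dim=-1))
        return expr_out, gest_out


if __name__ == "__main__":
    model = UniDirectionalDenoiser()
    B, T = 2, 30  # two clips of 30 frames each
    expr, gest, audio = torch.randn(B, T, 64), torch.randn(B, T, 128), torch.randn(B, T, 128)
    e, g = model(expr, gest, audio, torch.rand(B, 1))
    print(e.shape, g.shape)  # torch.Size([2, 30, 64]) torch.Size([2, 30, 128])
```

In a full diffusion setup, such a denoiser would be called at every sampling step; the key design point illustrated here is only the asymmetry of conditioning between the two branches.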