S2Fusion is a unified diffusion framework that combines scene geometry and sparse tracking signals to generate plausible, coherent full-body human motion, overcoming the inherent ambiguity of the sparse-to-dense mapping problem.
PhysPT, a Transformer encoder-decoder model, improves the physical plausibility of kinematics-based 3D human motion estimates and infers motion forces from monocular videos without requiring 3D-annotated training data.
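
To make "inferring forces from motion" concrete, the toy sketch below recovers the net force on a single point mass from its sampled trajectory using Newton's second law and central finite differences. It is only an illustration of the general idea, not PhysPT's method: the function name `net_forces`, the 70 kg mass, and the 30 fps frame rate are all assumptions chosen for the example, and a real body model would involve articulated dynamics and contact forces.

```python
def net_forces(positions, mass, dt):
    """Estimate the net force at each interior frame via F = m * a.

    positions: list of [x, y, z] samples taken every dt seconds.
    Acceleration is approximated with a second-order central difference,
    so the first and last frames have no estimate.
    """
    forces = []
    for prev, cur, nxt in zip(positions, positions[1:], positions[2:]):
        accel = [(n - 2.0 * c + p) / dt**2 for p, c, n in zip(prev, cur, nxt)]
        forces.append([mass * a for a in accel])
    return forces


# Hypothetical example: a 70 kg point mass in free fall, sampled at 30 fps.
dt = 1.0 / 30.0
g = 9.81
positions = [[0.0, 0.0, 1.0 - 0.5 * g * (i * dt) ** 2] for i in range(10)]
forces = net_forces(positions, mass=70.0, dt=dt)
```

For this trajectory the only force acting is gravity, so every recovered force vector should point straight down with magnitude close to `70.0 * 9.81` newtons. Noisy kinematic estimates would make such finite differences unstable, which is one reason a learned, physics-aware model is attractive.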