Realistic Long-Term 3D Human Motion Forecasting with Multimodal Scene Context
The authors propose a scene-aware social transformer (SAST) that efficiently forecasts long-term (10-second) human motion in complex multi-person environments by leveraging both motion and scene context.