A Framework for Synthesizing Realistic Human-Scene Interactions from Text Instructions and Goal Locations using Diffusion Models
This paper introduces a framework for generating realistic, interactive human motion in 3D environments from simple text instructions and goal locations. It addresses the limitations of previous methods by integrating locomotion, object interaction, and scene awareness into a unified, autonomous system.