Modeling Social Interaction Dynamics Using Temporal Graph Networks for Improved Human-Robot Collaboration
This article proposes an adapted Temporal Graph Network (TGN) model that represents social interaction dynamics by incorporating temporal multi-modal behavioral data, including gaze interaction, voice activity, and environmental context. The resulting representation is practical to implement and outperforms baseline models on tasks such as next gaze prediction and next speaker prediction, both of which are crucial for effective human-robot collaboration.
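To make the idea of a temporal, multi-modal interaction graph more concrete, the following is a minimal sketch of how behavioral observations might be stored as timestamped edges between participants. It is not the article's model: the `InteractionEvent` fields, the modality labels, the participant names, and the `last_gaze_target` helper are all hypothetical, and the helper is only a naive "repeat the last observation" predictor of the kind a learned model would be compared against.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InteractionEvent:
    """One timestamped edge in an interaction graph (hypothetical schema)."""
    t: float        # timestamp in seconds
    src: str        # node that acts, e.g. a participant
    dst: str        # node acted upon: participant, robot, or object
    modality: str   # "gaze" or "voice" (illustrative labels)


def build_event_stream(events: List[InteractionEvent]) -> List[InteractionEvent]:
    """Return events in chronological order, the order a temporal model consumes."""
    return sorted(events, key=lambda e: e.t)


def last_gaze_target(stream: List[InteractionEvent], person: str, t: float) -> Optional[str]:
    """Naive predictor: the most recent gaze target of `person` strictly before time t."""
    target = None
    for event in stream:
        if event.t >= t:
            break  # stream is sorted, so no later event can qualify
        if event.modality == "gaze" and event.src == person:
            target = event.dst
    return target


# Toy data, invented purely for illustration.
events = [
    InteractionEvent(2.0, "alice", "bob", "voice"),
    InteractionEvent(0.5, "alice", "robot", "gaze"),
    InteractionEvent(1.2, "bob", "alice", "gaze"),
    InteractionEvent(2.4, "alice", "bob", "gaze"),
]
stream = build_event_stream(events)
```

Calling `last_gaze_target(stream, "alice", 3.0)` on this toy stream looks only at events before the three-second mark, which mirrors the constraint that a next gaze or next speaker prediction may use past interactions only.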