The article presents a method for modeling social interaction dynamics using Temporal Graph Networks (TGN). The key highlights and insights are:
The authors formulate two problems: (A) scene perception and representation of social interaction dynamics using multi-modal inputs, and (B) the application of this representation for downstream tasks like next speaker prediction.
They represent social interaction as a graph model, where nodes represent subjects, and directed edges represent gaze interactions between subjects. The model is trained as a link prediction problem to estimate the probability of future gaze interactions.
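As a rough illustration of this formulation, the sketch below stores gaze interactions as timestamped directed edges and expands each one into link-prediction examples: the observed target is the positive, every other subject is a negative. The names `GazeEvent` and `link_prediction_examples` are hypothetical and not taken from the paper, and the actual TGN training pipeline may sample negatives differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GazeEvent:
    """One directed gaze edge (hypothetical container, not from the paper)."""
    src: int    # subject who is looking
    dst: int    # subject being looked at
    t: float    # timestamp of the interaction


def link_prediction_examples(
    events: List[GazeEvent], num_subjects: int
) -> List[Tuple[int, int, float, int]]:
    """Expand each gaze event into (src, candidate_dst, t, label) rows.

    label is 1 for the observed gaze target and 0 for every other subject,
    which is one simple way to pose gaze estimation as link prediction.
    """
    rows = []
    for event in sorted(events, key=lambda e: e.t):
        for candidate in range(num_subjects):
            if candidate == event.src:
                continue  # no self-loops: a subject does not gaze at itself
            label = 1 if candidate == event.dst else 0
            rows.append((event.src, candidate, event.t, label))
    return rows


events = [GazeEvent(src=0, dst=1, t=0.5), GazeEvent(src=2, dst=0, t=1.0)]
examples = link_prediction_examples(events, num_subjects=3)
```

A model trained on such rows outputs a probability per candidate edge; the highest-scoring candidate for a given source is the predicted next gaze target.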
The authors use the FUMI-MPF dataset, which contains 24 group discussion sessions with 110 participants and various facilitator types. They extract features like speaking status, relative seating position, and role of subjects to generate messages for the TGN model.
The authors compare two message encoding approaches, BERT and one-hot encoding, and find that one-hot encoding performs slightly better, especially in sessions without facilitators, while BERT performs better in sessions with diverse facilitator types.
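To make the one-hot option concrete, here is a minimal sketch that concatenates one-hot vectors for the three features named above (speaking status, relative seating position, role). The category vocabularies are assumptions chosen for illustration; the summary does not list the actual values used in the paper.

```python
from typing import List

# Assumed vocabularies, for illustration only (not given in the source).
SPEAKING = ["silent", "speaking"]
SEAT = ["left", "opposite", "right"]
ROLE = ["participant", "facilitator"]


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Return a vector with a single 1.0 at the index of `value`."""
    vector = [0.0] * len(vocabulary)
    vector[vocabulary.index(value)] = 1.0  # raises ValueError on unknown values
    return vector


def encode_message(speaking: str, seat: str, role: str) -> List[float]:
    """Concatenate the per-feature one-hot vectors into one message vector."""
    return one_hot(speaking, SPEAKING) + one_hot(seat, SEAT) + one_hot(role, ROLE)


message = encode_message("speaking", "opposite", "facilitator")
```

The resulting vector has one slot per category value, so its length grows with the vocabularies, whereas a BERT-style encoding would map a textual description of the same features to a fixed-size dense embedding.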
The TGN model significantly outperforms a history-based baseline, achieving a 37.0% improvement in F1-score and 24.2% improvement in accuracy for the next gaze prediction task, and a 29.0% improvement in F1-score and 3.0% improvement in accuracy for the next speaker prediction task.
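The summary does not describe the history-based baseline in detail. One plausible reading, assumed here, is that it predicts each subject's most frequent past gaze target. The sketch below shows that assumed baseline together with plain accuracy and binary F1 computations; all function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def predict_next_gaze(history: List[Tuple[int, int]], src: int) -> Optional[int]:
    """Assumed baseline: return the target `src` has looked at most often so far."""
    counts = Counter(dst for s, dst in history if s == src)
    if not counts:
        return None  # no history for this subject
    return counts.most_common(1)[0][0]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the ground truth."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def binary_f1(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


history = [(0, 1), (0, 2), (0, 1), (1, 0)]  # (src, dst) gaze pairs in time order
guess = predict_next_gaze(history, src=0)
```

Because F1 ignores true negatives while accuracy counts them, the two metrics can move by very different amounts on imbalanced data.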
The authors conduct an ablation study to compare different TGN variants and baseline temporal graph models, finding that the TGN-attn and TGN-mean variants offer the best balance between accuracy and speed for modeling group interaction dynamics.
Overall, the proposed approach demonstrates that Temporal Graph Networks can represent social interaction dynamics comprehensively, and that this representation can be applied to improve human-robot collaboration tasks.
Key insights distilled from the source by J. Taery Kim... at arxiv.org, 04-11-2024
https://arxiv.org/pdf/2404.06611.pdf