Key Concepts
A new framework is proposed that jointly performs gaze following and social gaze prediction in multi-person scenes.
Statistics
Authors: Anshul Gupta, Samy Tafasca, Arya Farkhondeh, Pierre Vuillecard, Jean-Marc Odobez
Techniques: Transformer-based architecture, ViT tokenizer, Gaze Processor, Interaction Module, Prediction Module
Quotes
"Our model can effectively learn from a mix of video-based datasets with different statistics to perform gaze following and social gaze prediction without sacrificing performance on any of them."
"The trained model can then be further fine-tuned on individual datasets to improve performance towards a specific scenario or task."