The higher a robot's empathic capability, the more users tend to attribute mental states to that robot.
People imagine hidden "phantom costs" when robots make overly generous offers, which makes them less likely to accept those offers.
A novel deep learning framework, URGR, enables robust recognition of human gestures from distances up to 25 meters using only a simple RGB camera. The framework combines a super-resolution model, HQ-Net, and a hybrid classifier, GViT, to overcome the challenges of low-resolution and blurry images at long distances.
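A minimal sketch of that two-stage pipeline in PyTorch, with small stand-in modules in place of HQ-Net and GViT; the layer choices, input size, and number of gesture classes are assumptions for illustration, not the URGR architecture.

```python
# Hedged sketch of a long-range gesture pipeline: a super-resolution stage
# (stand-in for HQ-Net) restores a low-resolution crop, then a classifier
# (stand-in for GViT) predicts the gesture. All sizes are illustrative.
import torch
import torch.nn as nn


class TinySuperRes(nn.Module):
    """Upscales a low-resolution crop 2x via sub-pixel convolution."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels * 4, 3, padding=1),
            nn.PixelShuffle(2),  # (B, C*4, H, W) -> (B, C, 2H, 2W)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class TinyGestureClassifier(nn.Module):
    """Maps the restored image to gesture logits."""
    def __init__(self, num_gestures: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_gestures)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))


if __name__ == "__main__":
    low_res_crop = torch.rand(1, 3, 32, 32)     # blurry crop of a distant user
    restored = TinySuperRes()(low_res_crop)     # 32x32 -> 64x64
    logits = TinyGestureClassifier()(restored)  # gesture scores
    print(restored.shape, logits.argmax(dim=1))
```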
Two methods, surround dense sampling and Online Temporally Aware Label Cleaning (O-TALC), improve online temporal action segmentation by addressing inaccurate segment boundaries and oversegmentation.
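A hedged sketch of the oversegmentation-cleaning idea on a stream of per-frame predictions: runs shorter than a minimum duration are absorbed into the preceding segment. The `clean_stream` helper, its minimum-length rule, and its parameters are illustrative assumptions, not the O-TALC algorithm itself.

```python
def clean_stream(frame_labels, min_len=3):
    """Yield per-frame labels online, relabelling any run shorter than
    min_len with the previous stable label to suppress oversegmentation.
    Frames are emitted when a run closes, so the output lags the input
    by the length of the current run."""
    prev_stable = None   # label of the last run long enough to trust
    run = []             # frames of the run currently being buffered
    for label in frame_labels:
        if run and label != run[-1]:
            if len(run) < min_len and prev_stable is not None:
                run = [prev_stable] * len(run)   # absorb the short blip
            else:
                prev_stable = run[-1]
            yield from run
            run = []
        run.append(label)
    if run:                                      # flush the final run as-is
        yield from run


if __name__ == "__main__":
    stream = ["pour", "pour", "stir", "pour", "pour", "pour", "stir", "stir", "stir"]
    print(list(clean_stream(stream, min_len=2)))
    # -> the lone "stir" blip is relabelled "pour"; the final run is kept
```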
The auditory detectability of a wheeled robot and a quadruped robot varies significantly, with the quadruped robot being detected at much larger distances, even in high background noise. This has important implications for the design of human-centered robot navigation algorithms.
This article proposes an adapted Temporal Graph Networks (TGN) model that comprehensively represents social interaction dynamics by incorporating temporal multi-modal behavioral data, including gaze interaction, voice activity, and environmental context. The representation enables practical implementation and outperforms baseline models on next-gaze-target and next-speaker prediction, tasks that are crucial for effective human-robot collaboration.
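A rough sketch of the kind of event stream such a model consumes, plus a toy per-participant memory update: gaze and voice-activity events arrive as timestamped, featurised edges and are folded into node memories with a GRU cell. The feature encoding, dimensions, and update rule are assumptions for illustration, not the adapted TGN from the article.

```python
# Toy TGN-style memory update over multi-modal social interaction events.
import torch
import torch.nn as nn

NUM_PEOPLE, MEM_DIM, FEAT_DIM = 3, 16, 4

# Each event: (source, target, timestamp in seconds, features [gaze, voice, context...])
events = [
    (0, 1, 0.0, [1.0, 0.0, 0.2, 0.0]),   # person 0 gazes at person 1
    (1, 0, 0.4, [0.0, 1.0, 0.2, 0.0]),   # person 1 speaks toward person 0
    (2, 1, 0.9, [1.0, 1.0, 0.5, 1.0]),   # person 2 gazes and speaks at person 1
]

memory = torch.zeros(NUM_PEOPLE, MEM_DIM)   # one memory vector per participant
last_seen = [0.0] * NUM_PEOPLE              # time of each node's last update
msg_dim = 2 * MEM_DIM + FEAT_DIM + 1        # [own memory, other's memory, features, time gap]
cell = nn.GRUCell(msg_dim, MEM_DIM)

with torch.no_grad():
    for src, dst, t, feats in events:
        f = torch.tensor(feats)
        for node, other in ((src, dst), (dst, src)):
            dt = torch.tensor([t - last_seen[node]])
            msg = torch.cat([memory[node], memory[other], f, dt])
            memory[node] = cell(msg.unsqueeze(0), memory[node].unsqueeze(0)).squeeze(0)
            last_seen[node] = t

# memory[i] now summarises person i's interaction history; a prediction head
# for next gaze target or next speaker would read from these vectors.
print(memory.shape)
```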
Our framework ECHO learns a shared representation space between humans and robots to generate socially compliant robot behaviors by forecasting human motions in interactive social scenarios.
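A minimal sketch of the shared-representation idea under simple assumptions: a human-motion encoder and a robot-behavior decoder meet in one latent space, so the same summary of observed motion drives both the human forecast and the robot's plan. The GRU/linear modules and all dimensions are placeholders, not ECHO's architecture.

```python
# Toy shared latent space linking human motion forecasting and robot behavior.
import torch
import torch.nn as nn

POSE_DIM, LATENT_DIM, ROBOT_DIM, HORIZON = 51, 32, 7, 10

human_encoder = nn.GRU(POSE_DIM, LATENT_DIM, batch_first=True)  # observed human poses -> latent
human_forecaster = nn.Linear(LATENT_DIM, HORIZON * POSE_DIM)    # latent -> future human motion
robot_decoder = nn.Linear(LATENT_DIM, HORIZON * ROBOT_DIM)      # same latent -> robot trajectory

observed = torch.rand(1, 25, POSE_DIM)      # 25 frames of flattened 3D human joints
_, latent = human_encoder(observed)         # latent: (1, 1, LATENT_DIM)
latent = latent.squeeze(0)

future_human = human_forecaster(latent).view(1, HORIZON, POSE_DIM)
robot_plan = robot_decoder(latent).view(1, HORIZON, ROBOT_DIM)
print(future_human.shape, robot_plan.shape)
```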
A novel tele-immersive framework that promotes cognitive and physical collaboration between humans and drones through Mixed Reality, incorporating bi-directional spatial awareness and multi-modal virtual-physical interaction approaches.
Our proposed HOI4ABOT framework leverages temporal cues from videos to efficiently detect and anticipate human-object interactions, empowering collaborative robots to proactively assist humans in a timely manner.
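A hedged sketch of joint detection and anticipation over a tracked human-object pair: one temporal encoder feeds two heads, one scoring the interaction happening now and one the interaction expected next. The feature dimensions, transformer settings, and interaction vocabulary size are assumptions, not HOI4ABOT's actual design.

```python
# Toy detect-and-anticipate heads over per-frame human-object pair features.
import torch
import torch.nn as nn

FEAT_DIM, NUM_INTERACTIONS, CLIP_LEN = 64, 12, 16

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=FEAT_DIM, nhead=4, batch_first=True),
    num_layers=2,
)
detect_head = nn.Linear(FEAT_DIM, NUM_INTERACTIONS)      # what is happening now
anticipate_head = nn.Linear(FEAT_DIM, NUM_INTERACTIONS)  # what is likely to happen next

# One human-object pair tracked across CLIP_LEN frames; features assumed to
# come from an upstream detector/backbone.
pair_features = torch.rand(1, CLIP_LEN, FEAT_DIM)
temporal = encoder(pair_features)            # (1, CLIP_LEN, FEAT_DIM)
summary = temporal.mean(dim=1)               # pool over time

current_logits = detect_head(summary)
future_logits = anticipate_head(summary)
print(current_logits.shape, future_logits.shape)
```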
The proposed 2-Channel Transformer (2CH-TR) model efficiently exploits spatio-temporal dependencies in observed human motion to generate accurate short-term and long-term 3D pose predictions, while demonstrating robustness to severe occlusions in the input data.
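A rough sketch of a two-channel attention idea for pose forecasting, assuming a 17-joint skeleton: one self-attention channel mixes information across time steps, the other across joints, and their fused output is read out into future poses. The single-layer design and the naive tiled decoder are illustrative simplifications, not the 2CH-TR model.

```python
# Toy two-channel (temporal + spatial) attention over observed 3D poses.
import torch
import torch.nn as nn

JOINTS, IN_FRAMES, OUT_FRAMES, DIM = 17, 10, 25, 3

temporal_attn = nn.MultiheadAttention(embed_dim=JOINTS * DIM, num_heads=3, batch_first=True)
spatial_attn = nn.MultiheadAttention(embed_dim=IN_FRAMES * DIM, num_heads=5, batch_first=True)
head = nn.Linear(JOINTS * DIM, JOINTS * DIM)  # per-frame pose readout

poses = torch.rand(1, IN_FRAMES, JOINTS, DIM)  # observed motion (occluded joints could be zeroed)

# Channel 1: tokens are frames, features are all joint coordinates of a frame.
t_tokens = poses.flatten(2)                              # (1, IN_FRAMES, JOINTS*DIM)
t_out, _ = temporal_attn(t_tokens, t_tokens, t_tokens)

# Channel 2: tokens are joints, features are each joint's full trajectory.
s_tokens = poses.permute(0, 2, 1, 3).flatten(2)          # (1, JOINTS, IN_FRAMES*DIM)
s_out, _ = spatial_attn(s_tokens, s_tokens, s_tokens)
s_out = s_out.view(1, JOINTS, IN_FRAMES, DIM).permute(0, 2, 1, 3).flatten(2)

fused = t_out + s_out                                     # (1, IN_FRAMES, JOINTS*DIM)
# Naive readout: map the last fused frame to a pose and tile it across the
# horizon; a real decoder would predict distinct future frames.
future = head(fused[:, -1:]).repeat(1, OUT_FRAMES, 1).view(1, OUT_FRAMES, JOINTS, DIM)
print(future.shape)
```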