Core Concept
ECHO learns a shared representation space between humans and robots and forecasts human motions in interactive social scenarios, generating socially compliant robot behaviors.
Summary
The paper proposes a two-step framework called ECHO to generate natural and meaningful human-robot interactions.
First, the authors build a shared latent space that represents the semantics of human and robot poses, enabling effective motion retargeting between them. This shared space is learned without the need for annotated human-robot skeleton pairs.
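The retargeting idea above can be sketched as a pair of maps through one latent space: a human encoder and a robot decoder that share the same semantic bottleneck. The sketch below is illustrative only; the dimensions, the random weights standing in for trained networks, and the `retarget` helper are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
HUMAN_DOF, ROBOT_DOF, LATENT = 24, 14, 8   # dimensions are illustrative

# Random matrices stand in for the trained encoder/decoder networks
# (assumption -- the real models are learned without paired skeletons).
enc_human = rng.normal(size=(HUMAN_DOF, LATENT)) * 0.1
dec_robot = rng.normal(size=(LATENT, ROBOT_DOF)) * 0.1

def retarget(human_pose):
    # Map a human pose into the shared semantic latent space,
    # then decode it as robot joint angles.
    z = np.tanh(human_pose @ enc_human)    # shared latent code
    return z @ dec_robot

human_pose = rng.normal(size=HUMAN_DOF)
robot_pose = retarget(human_pose)
print(robot_pose.shape)  # (14,)
```

Because both embodiments decode from the same latent code, a motion forecast in this space can be realized on either a human skeleton or a robot.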
Second, the ECHO architecture operates in this shared space to forecast human motions in social scenarios. It first learns to predict individual human motions using a self-attention transformer. Then, it iteratively refines these motions based on the surrounding agents using a cross-attention mechanism. This refinement process ensures the generated motions are socially compliant and synchronized.
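The predict-then-refine loop can be sketched with plain scaled dot-product attention: self-attention over an agent's own past produces the individual forecast, and repeated cross-attention to the other agents' current predictions refines it. This is a minimal sketch under assumed shapes and a residual update rule, not the paper's transformer architecture.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: (T_q, d), (T_k, d) -> (T_q, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def forecast_individual(past):
    # Step 1 (sketch): self-attention over the agent's own past latents.
    return attention(past, past, past)

def refine_socially(pred, others, n_iters=2):
    # Step 2 (sketch): iteratively refine the motion by cross-attending
    # to the surrounding agents' current predictions (residual update).
    for _ in range(n_iters):
        context = np.concatenate(others, axis=0)
        pred = pred + attention(pred, context, context)
    return pred

rng = np.random.default_rng(0)
d = 8                                 # latent pose dimension (assumed)
past_a = rng.normal(size=(10, d))     # agent A: 10 past frames
past_b = rng.normal(size=(10, d))     # agent B: 10 past frames
pred_a = forecast_individual(past_a)
pred_b = forecast_individual(past_b)
pred_a = refine_socially(pred_a, [pred_b])
print(pred_a.shape)  # (10, 8)
```

Because refinement is a separate stage conditioned on the other agents, dropping it leaves a valid single-person forecaster, which matches the paper's reformulation of social forecasting as refinement of individual predictions.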
The authors evaluate ECHO on the large-scale InterGen dataset for social motion forecasting and the CHICO dataset for human-robot collaboration tasks. ECHO outperforms state-of-the-art methods by a large margin in both settings, demonstrating its effectiveness in generating natural and accurate human-robot interactions.
The key innovations include:
- Learning a shared latent space between humans and various robots that preserves pose semantics.
- A two-step architecture that first predicts individual motions and then refines them based on the social context.
- Conditioning the motion synthesis on text commands to control the type of social interaction.
- Achieving state-of-the-art performance in social motion forecasting and human-robot collaboration tasks.
Key Statistics
The authors use the following datasets:
InterGen dataset: the largest 3D human motion dataset, with 6,022 two-person interactions and 16,756 natural-language annotations.
Robot retargeting collection: Randomly sampled robot joint angles from the Tiago++ and JVRC-1 robots.
CHICO dataset: 3D motion dataset for Human-Robot Collaboration with a single operator performing assembly tasks with a Kuka LBR robot.
Quotes
"Our overall framework can decode the robot's motion in a social environment, closing the gap for natural and accurate Human-Robot Interaction."
"Contrary to prior works, we reformulate the social motion problem as the refinement of the predicted individual motions based on the surrounding agents, which facilitates the training while allowing for single-motion forecasting when only one human is in the scene."