Core Concepts
An active measurement and sensor fusion framework that accurately estimates human poses at close range by dynamically optimizing camera viewpoints and touch/ranging sensor placements to complement the limited information available from camera observations.
Abstract
The proposed method addresses the challenge of human pose estimation in physical human-robot interaction (pHRI) scenarios, where the robot cannot fully observe the target person's body because the close distance causes severe truncation and occlusion.
The key components of the method are:
- Active Measurement (see the first sketch after this list):
  - Camera viewpoint optimization: The camera viewpoint is optimized to cover the largest body surface within the camera image, weighted by the estimated uncertainty of the pose, reducing truncation and self-occlusion.
  - Touch and ranging sensor placement optimization: The placements of touch and ranging sensors (e.g., a 2D LiDAR) are optimized to measure the most uncertain body parts that the camera does not cover well.
- Sensor Fusion (see the second sketch after this list):
  - Global position offset: The estimated pose is shifted by the mean vector between the sensor measurements and their nearest mesh vertices, resolving the depth ambiguity of the camera-based estimate.
  - Local pose optimization: The pose parameters are further refined by minimizing the distance between the body surface and the sensor measurements.
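The first sketch below illustrates the active-measurement idea under simplifying assumptions: the current estimate is available as mesh vertices with outward normals and a per-vertex uncertainty score, candidate camera viewpoints are pre-sampled, and "visibility" is approximated by a crude normal-facing test. The function names, thresholds, and toy data are illustrative, not the paper's implementation.

```python
import numpy as np

def visible_mask(cam_pos, vertices, normals, cos_thresh=0.0):
    """Vertices whose outward normal faces the camera position
    (approximate self-occlusion / truncation test)."""
    view = cam_pos[None, :] - vertices
    view /= np.linalg.norm(view, axis=1, keepdims=True)
    return (normals * view).sum(axis=1) > cos_thresh

def select_viewpoint(candidates, vertices, normals, uncertainty):
    """Pick the candidate viewpoint covering the largest
    uncertainty-weighted body surface."""
    scores = [uncertainty[visible_mask(c, vertices, normals)].sum()
              for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], visible_mask(candidates[best], vertices, normals)

def select_sensor_target(vertices, uncertainty, camera_visible):
    """Aim the touch / ranging measurement at the most uncertain vertex
    that the chosen camera view does not cover."""
    masked = np.where(camera_visible, -np.inf, uncertainty)
    return vertices[int(np.argmax(masked))]

# Toy usage with a random stand-in mesh.
rng = np.random.default_rng(0)
verts = rng.normal(size=(500, 3))                             # stand-in body surface
norms = verts / np.linalg.norm(verts, axis=1, keepdims=True)  # outward normals
unc = rng.random(500)                                         # per-vertex uncertainty
cams = rng.normal(scale=3.0, size=(8, 3))                     # candidate viewpoints
cam_pos, seen = select_viewpoint(cams, verts, norms, unc)
touch_target = select_sensor_target(verts, unc, seen)
```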
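The second sketch covers the sensor-fusion step under the same assumptions. The global offset follows the mean-vector rule described above; the local step is shown as a simplified ICP-style rigid alignment that minimizes the distance between sensor points and the body surface, standing in for the paper's optimization over body-model (e.g., SMPL) pose parameters.

```python
import numpy as np

def nearest_vertex_ids(measurements, vertices):
    """Index of the nearest mesh vertex for each sensor measurement."""
    d = np.linalg.norm(measurements[:, None, :] - vertices[None, :, :], axis=-1)
    return d.argmin(axis=1)

def global_offset(vertices, measurements):
    """Shift the whole estimate by the mean (measurement - nearest vertex)
    vector to compensate the depth ambiguity of the camera-only estimate."""
    idx = nearest_vertex_ids(measurements, vertices)
    return vertices + (measurements - vertices[idx]).mean(axis=0)

def local_refine(vertices, measurements, iters=10):
    """Simplified local step: ICP-style rigid alignment of the body surface
    to the sensor points (the paper optimizes pose parameters instead)."""
    cur = vertices.copy()
    for _ in range(iters):
        idx = nearest_vertex_ids(measurements, cur)
        P, Q = cur[idx], measurements                # matched vertices / sensor points
        mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
        H = (P - mu_p).T @ (Q - mu_q)                # cross-covariance
        U, _, Vt = np.linalg.svd(H)                  # Kabsch rotation
        d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = mu_q - R @ mu_p
        cur = cur @ R.T + t
    return cur

# Toy usage: sensor measurements are mesh points shifted along the depth axis.
rng = np.random.default_rng(0)
verts = rng.normal(size=(500, 3))
meas = verts[rng.choice(500, size=5, replace=False)] + np.array([0.0, 0.0, 0.3])
refined = local_refine(global_offset(verts, meas), meas)
```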
The proposed method outperformed previous methods on the standard occlusion benchmark with simulated active measurement. It also demonstrated reliable human pose estimation in real-world experiments using a robot, even with practical constraints such as occlusion by blankets.
Stats
The mean Euclidean distance error of the estimated poses is 235.9 mm for standing poses and 270.0 mm for sitting poses.
Quotes
"Our proposed method outperformed previous methods on the standard occlusion benchmark with simulated active measurement. Furthermore, our method reliably estimated human poses using a real robot, even with practical constraints such as occlusion by blankets."