Core Concepts
EventEgo3D is the first approach to real-time 3D human motion capture from egocentric monocular event streams. It achieves high 3D reconstruction accuracy and supports pose update rates of 140 Hz.
Abstract
The paper introduces EventEgo3D (EE3D), the first approach for 3D human motion capture from an egocentric monocular event camera with a fisheye lens. Event cameras offer high temporal resolution and work well in low light and under fast motion, which addresses the limitations of existing RGB-based egocentric 3D pose estimation methods.
The key highlights of the paper are:
EE3D is a novel neural network architecture tailored for learning from event streams in the Locally-Normalised Event Surfaces (LNES) representation, enabling high 3D reconstruction accuracy.
The Residual Event Propagation Module (REPM) prioritizes events triggered around the human, helping the network focus on the human subject rather than background events.
The authors design a compact head-mounted device (HMD) with an egocentric event camera and record a real dataset with event observations and ground-truth 3D human poses, in addition to a large-scale synthetic dataset.
EE3D demonstrates superior 3D accuracy compared to adapted versions of existing RGB-based and event-based methods, while supporting real-time 3D pose update rates of 140 Hz.
Extensive experiments, including an ablation study, validate the contributions of the core modules of EE3D.
The paper addresses a new problem in egocentric 3D vision and provides a comprehensive solution, including hardware design, datasets, and a tailored neural architecture. The results showcase the advantages of event cameras over RGB sensors for this task.
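The LNES representation named in the highlights can be illustrated with a short sketch. It assumes the commonly used LNES formulation: events inside a fixed time window are written into a two-channel image (one channel per polarity), and each pixel stores the normalised timestamp of its most recent event. The function name `events_to_lnes`, the `(x, y, t, polarity)` tuple layout, the 5 ms window, and the 4x4 resolution are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def events_to_lnes(events, height, width, t_start, window):
    """Accumulate events into a 2-channel LNES-style frame.

    events: iterable of (x, y, t, polarity) tuples, sorted by timestamp t,
            with integer pixel coordinates and polarity in {0, 1}.
    Returns an array of shape (2, height, width) with values in [0, 1).
    """
    lnes = np.zeros((2, height, width), dtype=np.float32)
    for x, y, t, polarity in events:
        if t_start <= t < t_start + window:
            # Newer events overwrite older ones at the same pixel and polarity,
            # so each pixel keeps the normalised time of its latest event.
            lnes[polarity, y, x] = (t - t_start) / window
    return lnes

# Hypothetical toy stream: two positive events at pixel (x=1, y=0),
# one negative event at pixel (x=2, y=1).
events = [(1, 0, 0.001, 1), (1, 0, 0.004, 1), (2, 1, 0.002, 0)]
frame = events_to_lnes(events, height=4, width=4, t_start=0.0, window=0.005)
```

In this toy stream the later positive event at pixel (1, 0) replaces the earlier one, so recent activity ends up with larger values than older activity within the same window.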
Stats
The paper reports the following key metrics (errors in millimetres; lower is better):
Mean Per Joint Position Error (MPJPE) on the EE3D-R dataset:
Walking: 70.88 mm
Crouching: 163.84 mm
Pushups: 97.88 mm
Boxing: 136.57 mm
Kicking: 103.72 mm
Dancing: 88.87 mm
Interaction with environment: 103.19 mm
Crawling: 109.71 mm
Sports: 101.02 mm
Jumping: 97.32 mm
Average: 107.30 mm (σ = 25.78 mm)
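As a reading aid, MPJPE is the mean Euclidean distance between predicted and ground-truth joint positions. The sketch below shows that definition and then recomputes the aggregate line from the per-activity values listed above. The function name `mpjpe` and the array layout are assumptions made for illustration. The check itself suggests that the reported average is the plain mean over the ten activities and that σ corresponds to the sample standard deviation (n - 1 in the denominator).

```python
import statistics
import numpy as np

def mpjpe(pred, gt):
    """Mean Euclidean distance between matching joints.

    pred, gt: arrays of shape (..., J, 3) in the same unit (e.g. mm).
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Per-activity MPJPE values (mm) copied from the list above.
per_activity = [70.88, 163.84, 97.88, 136.57, 103.72,
                88.87, 103.19, 109.71, 101.02, 97.32]

average = statistics.mean(per_activity)
sigma = statistics.stdev(per_activity)  # sample standard deviation (n - 1)

print(f"average = {average:.2f} mm, sigma = {sigma:.2f} mm")
```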
Procrustes Aligned MPJPE (PA-MPJPE) on the EE3D-R dataset:
Walking: 52.11 mm
Crouching: 99.48 mm
Pushups: 75.53 mm
Boxing: 104.66 mm
Kicking: 86.05 mm
Dancing: 71.96 mm
Interaction with environment: 70.85 mm
Crawling: 77.94 mm
Sports: 77.82 mm
Jumping: 80.17 mm
Average: 79.66 mm (σ = 14.83 mm)
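PA-MPJPE differs from MPJPE in that the prediction is first aligned to the ground truth with a similarity transform (rotation, uniform scale, translation), so only errors in the articulated pose remain. The following is a minimal sketch of that alignment using the standard SVD-based Procrustes solution. It is not the authors' evaluation code, and the function names and the `(J, 3)` array layout are assumptions.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Euclidean distance between matching joints; pred, gt: (J, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Similarity-align pred to gt (rotation, uniform scale, translation)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the 3x3 cross-covariance matrix.
    u, sing, vt = np.linalg.svd(p.T @ g)
    d = np.ones(3)
    if np.linalg.det(u @ vt) < 0:
        d[-1] = -1.0  # exclude reflections, keep a proper rotation
    rot = u @ np.diag(d) @ vt
    # Least-squares uniform scale; assumes pred joints do not all coincide.
    scale = (sing * d).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because the alignment removes global rotation, scale, and translation, PA-MPJPE is never larger than the best MPJPE achievable by rigidly moving the prediction, which is consistent with the PA-MPJPE values above being lower than the corresponding MPJPE values for every activity.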
Quotes
"EventEgo3D is the first approach for real-time 3D human motion capture from egocentric event streams."
"Our EE3D demonstrates robustness and superior 3D accuracy compared to existing solutions across various challenging experiments while supporting real-time 3D pose update rates of 140Hz."