Core Concept
The authors introduce Beyond-Voice, a system that enables continuous 3D hand pose tracking on home assistant devices using acoustic sensing. The system turns the device into an active sonar: it emits sound, analyzes the reflections, and reconstructs hand poses from them.
Summary
Beyond-Voice is a novel high-fidelity acoustic sensing system for hand pose tracking on home assistant devices. It uses the device's existing onboard microphones and speakers to track and reconstruct hand poses continuously in various environments, without personalized training data. The system transmits inaudible ultrasound chirps and analyzes their reflections to predict the 3D positions of 21 finger joints. Using deep learning models, data preprocessing techniques, and hardware starting-time cancellation, Beyond-Voice achieves accurate hand tracking across different users and environments.
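The active-sonar idea described above can be illustrated with a minimal ranging sketch. It is not the Beyond-Voice pipeline, which feeds reflections into a deep learning model; it only shows how a transmitted chirp can be matched against a recording to locate an echo. The sample rate, chirp band, chirp duration, and all function names here are assumptions chosen for illustration, and the recording is simulated rather than captured from a device.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
SAMPLE_RATE = 48_000      # Hz
F_START = 18_000.0        # Hz, start of the assumed near-ultrasound band
F_END = 22_000.0          # Hz, end of the band (below Nyquist at 48 kHz)
CHIRP_SECONDS = 0.01      # 10 ms chirp
SPEED_OF_SOUND = 343.0    # m/s in air at roughly 20 degrees Celsius


def linear_chirp(f0, f1, duration, rate):
    """Return samples of a sine sweep whose frequency rises linearly from f0 to f1."""
    n = int(duration * rate)
    sweep_rate = (f1 - f0) / duration  # Hz per second
    samples = []
    for i in range(n):
        t = i / rate
        phase = 2.0 * math.pi * (f0 * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples


def simulate_recording(template, delay_samples, attenuation, total_len):
    """Build a noise-free recording holding one delayed, attenuated echo."""
    recording = [0.0] * total_len
    for i, sample in enumerate(template):
        recording[delay_samples + i] += attenuation * sample
    return recording


def estimate_echo_delay(recording, template):
    """Return the lag (in samples) where the recording best matches the template."""
    m = len(template)
    best_lag, best_score = 0, -1.0
    for lag in range(len(recording) - m + 1):
        score = abs(sum(recording[lag + i] * template[i] for i in range(m)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


chirp = linear_chirp(F_START, F_END, CHIRP_SECONDS, SAMPLE_RATE)
recording = simulate_recording(chirp, delay_samples=84, attenuation=0.3, total_len=1200)

delay = estimate_echo_delay(recording, chirp)
# The sound travels to the reflector and back, so halve the round-trip path.
distance_m = delay / SAMPLE_RATE * SPEED_OF_SOUND / 2.0
print(f"echo delay: {delay} samples, estimated distance: {distance_m:.3f} m")
```

A single echo delay gives only one range value. Recovering 21 joint positions requires far richer information, which is why the system described here relies on learned models rather than a closed-form ranging step like this one.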
Statistics
A user study with 11 participants shows an average mean absolute error (MAE) of 16.47 mm for user-independent testing.
User-adaptive training reduces the MAE to 10.36 mm.
In user-dependent testing, the MAE is 12.49 mm, slightly higher than the user-adaptive result.
Data augmentation improves performance up to a factor of ×6, beyond which the error rises slightly.
Finger-wise analysis reveals higher errors for middle fingers compared to others.
Bone-wise analysis shows higher errors for bones closer to the fingertip.
Orientation analysis indicates that palm rotation, azimuth, and elevation angles have no significant impact on performance.
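For readers unfamiliar with how a millimetre-level joint error like those above can be computed, here is a minimal sketch. It assumes the error for each of the 21 joints is the Euclidean distance between the predicted and ground-truth 3D positions, averaged over joints. The summary does not state the exact definition used, so treat this as one common reading, not the authors' evaluation code. The function names and the toy data are hypothetical.

```python
import math

NUM_JOINTS = 21  # number of finger joints the system predicts


def joint_errors_mm(predicted, ground_truth):
    """Per-joint Euclidean distance (mm) between predicted and true 3D positions.

    Assumption: both inputs are sequences of 21 (x, y, z) tuples in millimetres.
    """
    if len(predicted) != NUM_JOINTS or len(ground_truth) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joints per pose")
    return [math.dist(p, g) for p, g in zip(predicted, ground_truth)]


def mean_joint_error_mm(predicted, ground_truth):
    """Average the per-joint errors of one pose into a single number."""
    errors = joint_errors_mm(predicted, ground_truth)
    return sum(errors) / len(errors)


# Toy data: every predicted joint is offset by (3, 4, 0) mm from the truth,
# so each joint is 5 mm away from its true position.
truth = [(float(i), 0.0, 0.0) for i in range(NUM_JOINTS)]
prediction = [(x + 3.0, y + 4.0, z) for (x, y, z) in truth]

pose_error = mean_joint_error_mm(prediction, truth)
print(f"mean joint error: {pose_error:.2f} mm")
```

Keeping the per-joint list around, rather than only the mean, is what makes finger-wise and bone-wise breakdowns like the ones reported above possible.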