Key concepts
Integrating state information (the drone's pitch and roll) with multi-sensorial input (low-resolution images and depth maps) can significantly improve the performance of deep learning-based human pose estimation on autonomous nano-drones.
Summary
This paper presents a novel deep learning-based pipeline that fuses multi-sensorial input (low-resolution images and 8x8 depth maps) with the nano-drone's state information (pitch and roll) to tackle a human pose estimation task. The authors explore different fusion techniques, including input fusion, mid-fusion, and late fusion, and evaluate the impact of incorporating state information on the regression performance.
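To make the late fusion (direct) idea concrete, here is a minimal sketch of how a state scalar such as pitch can be concatenated directly with the vision backbone's feature vector before a final regression head. All dimensions, weights, and names are illustrative placeholders, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimension of the feature vector produced by the
# vision backbone (image + depth-map branches); placeholder only.
FEAT_DIM = 64

def late_fusion_direct(features, pitch, w, b):
    """Late fusion (direct): append the pitch state scalar to the
    backbone features, then apply one linear regression head that
    outputs the (x, y) pose estimate."""
    fused = np.concatenate([features, [pitch]])  # shape: (FEAT_DIM + 1,)
    return w @ fused + b                         # shape: (2,) -> (x, y)

# Random placeholder weights for the 2-output regression head.
w = rng.standard_normal((2, FEAT_DIM + 1))
b = rng.standard_normal(2)

features = rng.standard_normal(FEAT_DIM)        # stand-in backbone output
x_pred, y_pred = late_fusion_direct(features, pitch=0.05, w=w, b=b)
```

Because the state enters only at the final head, the extra cost is a single additional input to one linear layer, which is consistent with the negligible overhead the authors report for this variant.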
The key findings are:
Introducing state information consistently improves the regression performance, with an R2 score increase of up to 0.11 compared to the state-unaware baseline model.
The best-performing model uses only the pitch state information, with a late fusion (direct) approach, achieving a 0.10 R2 increase on the x output and 0.01 increase on the y output, while incurring a negligible computational cost increase of just 0.11%.
The authors provide a detailed analysis of the trade-offs between regression performance, memory usage, and computational cost for the different fusion techniques, identifying the late fusion (direct) approach as the best trade-off.
Overall, this work demonstrates the value of incorporating state information to enhance the capabilities of deep learning-based perception tasks on resource-constrained autonomous nano-drones.
Statistics
The authors report the following key figures:
R2 score of the state-unaware baseline model: 0.35 on x, 0.46 on y
Peak R2 score of the best-performing state-aware model: 0.45 on x, 0.48 on y
Memory usage increase: up to 2.6% compared to the baseline
Computational cost increase: up to 21% compared to the baseline
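The R2 scores above follow the standard coefficient-of-determination definition (1 minus the ratio of residual to total sum of squares). A plain-Python sketch of that computation, for reference:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Returns 1.0 for a perfect fit and 0.0 for a predictor that
    always outputs the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An increase from 0.35 to 0.45 on the x output therefore means the state-aware model explains 10 percentage points more of the variance in the target than the baseline.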
Quotes
"Our key findings consistently show the benefit of the state fusion: mean R2 improvement of 0.06 on the x variable w.r.t. the SoA baseline model."
"Our best model, the late fusion approach, increases the R2 up to 0.10 and 0.01 on x and y with a negligible overhead in memory or computation."