Core Concepts
The proposed Scene Adaptive Sparse Transformer (SAST) achieves a remarkable balance between performance and efficiency for event-based object detection by enabling window-token co-sparsification and scene-specific sparsity optimization.
Summary
The paper presents the Scene Adaptive Sparse Transformer (SAST), an efficient and powerful architecture for event-based object detection.
Key highlights:
- Event cameras offer high temporal resolution and a wide dynamic range, enabling energy-efficient solutions in power-constrained environments. However, the high computational complexity of dense Transformer networks erodes the low-power advantage of event cameras.
- SAST achieves window-token co-sparsification, significantly enhancing fault tolerance and reducing computational overhead. It leverages novel scoring and selection modules to realize scene-specific sparsity optimization, dynamically adjusting the sparsity level based on scene complexity (see the first sketch after this list).
- SAST also introduces Masked Sparse Window Self-Attention (MS-WSA), which efficiently performs self-attention on the selected tokens despite unequal window sizes and isolates all context leakage from discarded tokens (a second sketch follows this list).
- Experimental results on the 1Mpx and Gen1 datasets demonstrate that SAST outperforms all other dense and sparse networks in both performance and efficiency.
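To make the score-then-select idea concrete, here is a minimal PyTorch sketch of how a scoring module could jointly prune tokens and windows. The `ScoreAndSelect` class, its linear scorer, and the mean-score threshold are illustrative assumptions for exposition, not the paper's actual modules or selection rule.

```python
import torch
import torch.nn as nn


class ScoreAndSelect(nn.Module):
    """Hypothetical scoring/selection step: keep only tokens whose learned
    importance score clears a data-dependent threshold, then drop windows
    that retain no tokens at all (window-token co-sparsification)."""

    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # assumed linear scorer (illustrative)

    def forward(self, tokens: torch.Tensor):
        # tokens: (num_windows W, tokens_per_window T, dim D)
        scores = self.scorer(tokens).squeeze(-1)  # (W, T)
        # The threshold depends on the input itself, so the number of
        # surviving tokens varies from scene to scene rather than being
        # a fixed top-k budget.
        token_mask = scores > scores.mean()       # (W, T) bool
        window_mask = token_mask.any(dim=1)       # (W,) prune empty windows
        return token_mask, window_mask


# Usage: 16 windows of 7x7 tokens with 128 channels.
x = torch.randn(16, 49, 128)
token_mask, window_mask = ScoreAndSelect(128)(x)
```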
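The second sketch illustrates the property MS-WSA needs: attention restricted to each window's kept tokens, with padded slots contributing nothing in either direction. The padding-plus-boolean-mask layout below is an assumption for illustration, not the paper's implementation.

```python
import torch


def masked_window_attention(q, k, v, token_mask):
    """Minimal masked window self-attention. Windows may hold different
    numbers of valid tokens; padded slots are excluded from the softmax
    and zeroed on output so no context leaks through them.
    q, k, v: (W, T, D); token_mask: (W, T) bool, True = kept token."""
    d = q.size(-1)
    attn = (q @ k.transpose(-2, -1)) / d ** 0.5          # (W, T, T)
    # Block attention *to* padded keys before the softmax.
    attn = attn.masked_fill(~token_mask[:, None, :], float("-inf"))
    # Fully masked rows softmax to NaN; zero them out explicitly.
    attn = torch.nan_to_num(attn.softmax(dim=-1))
    out = attn @ v                                       # (W, T, D)
    # Silence padded queries so they contribute nothing downstream.
    return out * token_mask[..., None].to(out.dtype)
```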
Statistics
The 1Mpx dataset contains over 25M bounding boxes across 7 labeled object classes, with a labeling frequency of 60 Hz.
The Gen1 dataset comprises 39 hours of events with a resolution of 304×240 pixels and 2 object classes, with a labeling frequency of 20 Hz.