Key Concepts
The proposed Meta-SpikeFormer architecture achieves state-of-the-art performance in the spiking neural network domain, surpassing current baselines by a significant margin while maintaining low power consumption. It is the first direct training spiking neural network backbone that can handle classification, detection, and segmentation tasks concurrently.
Summary
The paper proposes a meta spiking neural network architecture called Meta-SpikeFormer, which aims to address the limitations of current spiking neural network (SNN) designs. The key highlights are:
Architecture Design:
The overall architecture follows a general vision transformer structure, with both Conv-based and Transformer-based SNN blocks.
The Conv-based SNN blocks use spike-driven separable convolution and channel-wise convolution, while the Transformer-based SNN blocks employ a novel Spike-Driven Self-Attention (SDSA) operator.
The architecture includes a pyramid structure to capture multi-scale features, and various shortcut connections are explored.
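The core idea behind the SDSA operator is that Q, K, and V are binary spike tensors, so the attention computation reduces to masked additions rather than floating-point multiplications. The sketch below illustrates this in NumPy; the function names, the Heaviside stand-in for the spiking neuron, and the scale value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def spike_neuron(x, threshold=1.0):
    # Heaviside step: fire (1) when the membrane potential crosses threshold.
    # A simplified stand-in for the LIF-style neuron used in SNNs.
    return (x >= threshold).astype(np.float32)

def sdsa_sketch(Q, K, V, scale=0.25):
    # Q, K, V are binary spike tensors (N tokens x d channels), so the matrix
    # products involve only additions of masked entries -- no multiplications,
    # which is the source of the claimed energy efficiency. Computing
    # K^T V first gives linear-attention cost O(N * d * d) with no softmax.
    attn = Q @ (K.T @ V)
    return spike_neuron(attn * scale)  # output is again a binary spike tensor

rng = np.random.default_rng(0)
N, d = 8, 16
# Sparse binary spike inputs (~20% firing rate), chosen for illustration.
Q = (rng.random((N, d)) < 0.2).astype(np.float32)
K = (rng.random((N, d)) < 0.2).astype(np.float32)
V = (rng.random((N, d)) < 0.2).astype(np.float32)
out = sdsa_sketch(Q, K, V)
print(out.shape)  # (8, 16)
```

The key design point this sketch captures is that spike-driven attention stays spike-in, spike-out: every intermediate is either binary or produced by accumulation, so it maps naturally onto addition-only neuromorphic hardware.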
Performance Achievements:
On ImageNet-1K classification, Meta-SpikeFormer achieves 80.0% top-1 accuracy, surpassing the previous state-of-the-art SNN baseline by 3.7% with 17% fewer parameters.
Meta-SpikeFormer is the first direct training SNN backbone that can handle classification, object detection, and semantic segmentation tasks concurrently, achieving state-of-the-art results in the SNN domain.
Versatility and Efficiency:
The proposed architecture demonstrates high versatility, performing well on various vision tasks including image classification, event-based action recognition, object detection, and semantic segmentation.
The spike-driven design and SDSA operator enable Meta-SpikeFormer to achieve significant power efficiency compared to traditional ANN counterparts.
Inspiration for Neuromorphic Chip Design:
The meta architecture, SDSA operator, and hybrid Conv+Transformer design of Meta-SpikeFormer provide valuable insights for the development of future Transformer-based neuromorphic chips.
In summary, the Meta-SpikeFormer architecture represents a significant advancement in the SNN domain, pushing the boundaries of performance, versatility, and efficiency, while also inspiring the next generation of neuromorphic computing hardware.
Statistics
The paper reports the following key metrics:
ImageNet-1K classification accuracy:
Meta-SpikeFormer (55M parameters): 80.0%
Spike-driven Transformer (66M parameters): 76.3%
MS-Res-SNN (77M parameters): 75.3%
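The headline claims (+3.7% accuracy, 17% fewer parameters) follow directly from these reported figures, as a quick arithmetic check shows:

```python
# Sanity check of the reported ImageNet-1K deltas.
meta_acc, sota_acc = 80.0, 76.3      # Meta-SpikeFormer vs. Spike-driven Transformer
meta_params, sota_params = 55, 66    # parameter counts in millions
acc_gain = meta_acc - sota_acc
param_cut = (sota_params - meta_params) / sota_params * 100
print(f"+{acc_gain:.1f}% accuracy, {param_cut:.0f}% fewer parameters")
# → +3.7% accuracy, 17% fewer parameters
```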
Power consumption on ImageNet-1K:
Meta-SpikeFormer (55M, T=4): 52.4 mJ
Spike-driven Transformer (66M, T=4): 6.1 mJ
MS-Res-SNN (77M, T=4): 10.2 mJ
Object detection mAP@0.5 on COCO:
Meta-SpikeFormer (75M): 51.2%
EMS-Res-SNN (14.6M): 50.1%
Semantic segmentation mIoU on ADE20K:
Meta-SpikeFormer (58.9M, T=4): 35.3%
PVT-Small (28.2M): 39.8%
Quotes
"Meta-SpikeFormer enables the performance of the SNN domain on ImageNet-1K to achieve 80% for the first time, which is 3.7% higher than the current SOTA baseline but with 17% fewer parameters (55M vs. 66M)."
"To the best of our knowledge, Meta-SpikeFormer is the first direct training SNN backbone that can handle image classification, object detection, semantic segmentation concurrently. We achieve SOTA results in the SNN domain on all tested datasets."