Key Concepts
Spiking lottery tickets can achieve higher sparsity and less performance degradation compared to standard lottery tickets, enabling the development of efficient and energy-saving spiking neural networks.
Summary
The paper explores the properties of spiking lottery tickets (SLTs) and compares them to standard lottery tickets (LTs) in both convolutional neural network (CNN) and transformer-based spiking neural network (SNN) structures.
For CNN-based models, the authors find that inner SLTs achieve higher sparsity with less performance loss than LTs (Reward 1). For transformer-based models, SLTs incur less accuracy loss than their LT counterparts at the same level of multi-level sparsity (Reward 2).
The authors propose a multi-level sparsity exploration algorithm for spiking transformers, which sparsifies the patch embedding projection (ConvPEP) module's weights, its activations, and the number of input patches. Extensive experiments on RGB and event-based datasets demonstrate that the proposed SLT methods outperform standard LTs while achieving extreme energy savings (>80.0%).
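To make the multi-level idea concrete, below is a minimal PyTorch-style sketch (not the authors' exact algorithm) of how sparsity could be imposed at the three levels named above: a binary mask on the ConvPEP weights, binary spike activations, and top-k selection over input patches. All class and parameter names (SparseConvPEP, weight_sparsity, keep_patches) are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of multi-level sparsity on a ConvPEP-style patch embedding:
#   (1) weight sparsity via a fixed binary mask on the conv kernel,
#   (2) activation sparsity via binary spike outputs (surrogate gradient),
#   (3) patch sparsity by keeping only the k most active patches per image.

class SpikeFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()                      # binary spikes -> sparse activations

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() < 0.5).float()   # rectangular surrogate gradient

class SparseConvPEP(nn.Module):
    def __init__(self, in_ch=3, embed_dim=96, patch=4, weight_sparsity=0.9, keep_patches=32):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)
        # (1) weight-level mask: prune the smallest-magnitude weights
        w = self.proj.weight.detach().abs().flatten()
        thresh = torch.quantile(w, weight_sparsity)
        self.register_buffer("mask", (self.proj.weight.detach().abs() > thresh).float())
        self.keep_patches = keep_patches

    def forward(self, x):
        w = self.proj.weight * self.mask                       # masked (sparse) weights
        feat = nn.functional.conv2d(x, w, self.proj.bias, stride=self.proj.stride)
        spikes = SpikeFn.apply(feat)                           # (2) binary/sparse activations
        tokens = spikes.flatten(2).transpose(1, 2)             # B x N x D patch tokens
        # (3) patch-level sparsity: keep the k most active patches
        scores = tokens.sum(dim=-1)
        idx = scores.topk(self.keep_patches, dim=1).indices
        return torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
```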
The paper also analyzes the impact of spiking neural network parameters, such as the time step and decay rate, on SLT performance. The results show that increasing the time step can improve SLT performance, while accuracy varies non-monotonically with the decay rate.
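For reference, these two hyperparameters belong to the leaky integrate-and-fire (LIF) neuron commonly used in SNNs. The sketch below, with illustrative values rather than the paper's settings, shows where the number of time steps T and the decay rate enter the computation.

```python
import torch

# Minimal LIF (leaky integrate-and-fire) sketch; names and values are illustrative.
def lif_forward(currents, decay=0.5, v_th=1.0):
    """currents: tensor of shape (T, batch, features) -- input current per time step."""
    T = currents.shape[0]
    v = torch.zeros_like(currents[0])      # membrane potential
    spikes = []
    for t in range(T):
        v = decay * v + currents[t]        # leaky integration: larger decay keeps more history
        s = (v >= v_th).float()            # fire when the threshold is crossed
        v = v * (1.0 - s)                  # hard reset after a spike
        spikes.append(s)
    return torch.stack(spikes)             # spike trains over T time steps

# Example: a longer time window (T=8 instead of T=4) gives the neuron more chances
# to integrate evidence, one intuition for why a larger time step count can help SLTs.
out = lif_forward(torch.rand(8, 2, 16))
```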
Statistics
The dense ANN model on CIFAR10 achieves 88.10% accuracy.
The dense SNN model on CIFAR10 achieves 87.71% accuracy.
The MPSLT model on CIFAR10 achieves 87.76% accuracy, a 0.24% improvement over the MPLT model.
The MultiSp-SLT model on CIFAR10 achieves 90.21% accuracy, a 0.34% improvement over the MultiSp-LT model.
The theoretical energy consumption of the MPSLT model on CIFAR10 is only 11.2% of the ANN counterpart.
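The 11.2% figure above is the paper's result; the sketch below only illustrates how such theoretical energy ratios are commonly computed in the SNN literature, replacing multiply-accumulate (MAC) energy with accumulate (AC) energy scaled by firing rate and time steps. The per-operation energies (~4.6 pJ per MAC, ~0.9 pJ per AC, 45 nm CMOS) and the operation counts are assumptions for illustration, not numbers from the paper.

```python
# Assumed per-operation energies (45 nm CMOS convention widely used in SNN papers).
E_MAC = 4.6e-12   # J per multiply-accumulate (dense ANN operation)
E_AC = 0.9e-12    # J per accumulate (spike-driven SNN operation)

def ann_energy(flops):
    return flops * E_MAC

def snn_energy(flops, firing_rate, time_steps):
    # Spike-driven layers replace MACs with ACs; only active (spiking) inputs cost energy.
    sops = flops * firing_rate * time_steps
    return sops * E_AC

# Illustrative numbers: with a low firing rate and few time steps the SNN energy
# drops to roughly a tenth of the ANN's, in line with the ~11.2% figure reported above.
flops = 1e9
ratio = snn_energy(flops, firing_rate=0.15, time_steps=4) / ann_energy(flops)
print(f"SNN/ANN energy ratio = {ratio:.1%}")   # ~11.7% under these assumed values
```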
Quotes
"Spiking neural network is a bio-inspired algorithm that simulates the real process of signaling that occurs in brains."
"The Lottery Ticket Hypothesis (LTH) offers a promising solution. LTH suggests that within a randomly initialized dense neural network, there exist efficient sub-networks, which can achieve the comparable accuracy of the full network within the same or fewer iterations."
"Our approach analyses the problem from a dual-level perspective: (1) Redesigning existing models at the neuron level to obtain an efficient new network structure; (2) Developing numerous sparse algorithms to reduce the parameter size of original dense models at the structure level."