The authors investigate the joint effects of sparse activity and sparse connectivity in the Event-based Gated Recurrent Unit (EGRU), a biologically inspired recurrent neural network architecture. They provide evidence that sparse activity and sparse connectivity are independent means of reducing computational operations, and that applying both jointly degrades task performance only at high degrees of connectivity sparsity, beyond 80%.
The authors first compare the performance of densely activated LSTM and sparsely activated EGRU models on the Penn Treebank and WikiText-2 language modeling datasets. They show that the EGRU model, which combines sparse activations with sparse connectivity, achieves results competitive with dense LSTM baselines.
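To make the notion of event-based sparse activity concrete, the sketch below implements a heavily simplified thresholded recurrent unit in NumPy. It is not the EGRU's exact update rule (the GRU-style gates and the surrogate-gradient training used in the paper are omitted), and the leak factor, threshold, and weight scales are illustrative assumptions. What it shows is the core idea: a unit emits a nonzero output only when its internal state crosses a threshold, so most entries of the output vector are exactly zero at any given step and downstream matrix products can skip them.

```python
# Minimal sketch of an event-based recurrent unit in the spirit of EGRU
# (simplified; not the paper's exact equations).
import numpy as np

rng = np.random.default_rng(0)

def event_step(x, h, c, W_x, W_h, b, threshold=1.0):
    """One step of a simplified event-based recurrent unit.

    x: input vector, h: last emitted (sparse) output, c: internal state.
    """
    # Internal state update uses the sparse output of the previous step.
    c = np.tanh(W_x @ x + W_h @ h + b) + 0.9 * c   # leaky accumulation (assumed)
    # Emit an event only where the internal state exceeds the threshold.
    events = c > threshold
    h_new = np.where(events, c, 0.0)               # sparse output vector
    c = np.where(events, c - threshold, c)         # soft reset after an event
    return h_new, c, events.mean()                 # fraction of active units

n_in, n_hidden = 32, 128
W_x = rng.normal(0.0, 0.3, (n_hidden, n_in))
W_h = rng.normal(0.0, 0.3, (n_hidden, n_hidden))
b = np.full(n_hidden, -0.5)                        # negative bias -> sparser activity

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for t in range(20):
    x = rng.normal(0.0, 1.0, n_in)
    h, c, activity = event_step(x, h, c, W_x, W_h, b)
print(f"fraction of units emitting an event in the final step: {activity:.2%}")
```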
To understand the interaction between the two forms of sparsity, the authors systematically prune the LSTM and EGRU models to varying degrees of connectivity sparsity. They find that sparse activations and sparse weights compound multiplicatively in reducing the number of computational operations, as conjectured in prior work. Task performance degrades along a similar trend for both models as connections are removed.
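A back-of-the-envelope calculation illustrates why the two reductions compose multiplicatively: a multiply-accumulate in the recurrent matrix product is needed only if its weight is unpruned and its input unit emitted an event, so with weight density d_w and activity density d_a the expected operation count is roughly d_w times d_a times the dense count. The layer sizes and densities below are hypothetical, not taken from the paper.

```python
# Illustrative numbers only: how weight density and activity density
# combine multiplicatively in the per-step operation count.
n_hidden, n_in = 1350, 650              # hypothetical layer sizes
dense_macs = n_hidden * (n_hidden + n_in)

d_w = 0.2                               # 80% of weights pruned
d_a = 0.1                               # 10% of units emit an event per step
effective_macs = d_w * d_a * dense_macs

print(f"dense MACs per step:     {dense_macs:,}")
print(f"effective MACs per step: {int(effective_macs):,} "
      f"({effective_macs / dense_macs:.1%} of dense)")
```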
Furthermore, the authors uncover a mechanism that allows trading weight regularization for sparse activations in the EGRU model. They observe that the training process drives the mean values of weights and biases below 0, which promotes sparse network activity but interferes with standard weight decay regularization. This provides a way to tune the activity of the EGRU network to meet the requirements of a target hardware system.
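The toy calculation below (an assumed setup, not the paper's training procedure) illustrates this mechanism: with a fixed firing threshold, shifting the mean pre-activation downward via a more negative bias lowers the fraction of units that cross the threshold, and hence the network's activity, while weight decay, which pulls parameters back toward zero, works against exactly this shift.

```python
# Bias/activity trade-off under a fixed threshold (illustrative, assumed setup).
import numpy as np

rng = np.random.default_rng(0)
threshold = 0.5
pre_activations = rng.normal(0.0, 1.0, size=10_000)   # stand-in for W_x x + W_h h

for bias in (0.0, -0.5, -1.0, -2.0):
    # Fraction of units whose shifted pre-activation exceeds the threshold,
    # i.e. the fraction that would emit an event at this step.
    activity = np.mean(pre_activations + bias > threshold)
    print(f"bias = {bias:+.1f}  ->  fraction of active units: {activity:.1%}")
```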
Overall, the results suggest that sparsely connected event-based neural networks are promising candidates for efficient and effective sequence modeling on neuromorphic hardware.