Extreme Quantization of Spiking Language Models for Energy-Efficient Natural Language Processing
A novel 1/1.58-bit spiking language model architecture that leverages knowledge distillation and equilibrium-based training to achieve substantial gains in energy and power efficiency while maintaining competitive performance on natural language processing tasks.
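The abstract does not spell out what "1-bit" and "1.58-bit" mean in practice; as a point of orientation, the sketch below shows one common interpretation, binary {-1, +1} and ternary {-1, 0, +1} weight quantization with an absmean scale (the latter needs log2(3) ≈ 1.58 bits per weight, hence the name). The function names and the scaling scheme are illustrative assumptions, not the paper's own implementation.

```python
import torch

def quantize_ternary(w: torch.Tensor, eps: float = 1e-5):
    # Assumed 1.58-bit scheme: scale by mean |w|, round to {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=eps)
    return (w / scale).round().clamp(-1, 1), scale

def quantize_binary(w: torch.Tensor):
    # Assumed 1-bit scheme: keep only the sign, {-1, +1}, with a shared scale.
    scale = w.abs().mean()
    return torch.where(w >= 0, 1.0, -1.0), scale

# Example: quantize a weight matrix and reconstruct an approximation.
w = torch.randn(4, 4)
q, s = quantize_ternary(w)
w_approx = q * s
```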