Core Concepts
NEUROPRUNE is a neuro-inspired topological sparse training algorithm that exploits mechanisms observed in biological neuronal networks, such as preferential attachment and redundant synapse pruning, to produce performant and efficient large language models across diverse NLP tasks.
Abstract
The paper proposes NEUROPRUNE, a neuro-inspired sparse training algorithm for large language models (LLMs) that aims to reduce the computational overhead of training and inference for these models.
Key highlights:
NEUROPRUNE is inspired by mechanisms seen in biological neuronal networks, such as preferential attachment and redundant synapse pruning.
It induces sparsity at three levels: the Multi-Layer Perceptron (MLP) layers, the attention layers, and the attention heads.
For the MLP layers, NEUROPRUNE uses a weighted l1 penalty that encourages preferential attachment: neurons with higher connectivity are maintained while those with lower connectivity are pruned (a minimal sketch of such a penalty follows this list).
For the attention layers, NEUROPRUNE leverages group sparsity to remove entire rows of the concatenated attention matrices, effectively eliminating the influence of certain input dimensions (see the group-sparsity sketch below).
For the attention heads, NEUROPRUNE removes redundant heads by identifying a minimum set of heads that can represent the functionality of the rest, via a dominating set formulation (see the head-pruning sketch below).
NEUROPRUNE is model- and task-agnostic. It is competitive with (and sometimes superior to) baselines on performance while being up to 10x faster to train for a given level of sparsity, and it yields measurable inference-time improvements in many cases.
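To make the weighted l1 idea concrete, here is a minimal PyTorch-style sketch. The specific connectivity-based reweighting shown here (inverse of the row and column l1 mass) is an illustrative assumption, not necessarily the exact penalty used in the paper.

```python
import torch

def preferential_l1_penalty(weight: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Weighted L1 penalty on an MLP weight matrix (out_features x in_features).

    Connections attached to neurons with high total connectivity (large L1
    row/column mass) receive a smaller penalty, so well-connected neurons are
    preferentially kept while weakly connected ones are pushed toward zero.
    """
    abs_w = weight.abs()
    # Connectivity (degree proxy) of each output neuron and each input neuron.
    out_degree = abs_w.sum(dim=1, keepdim=True)   # shape: (out_features, 1)
    in_degree = abs_w.sum(dim=0, keepdim=True)    # shape: (1, in_features)
    # Per-connection penalty weight that decays with the connectivity of its endpoints.
    per_conn_weight = 1.0 / (out_degree + in_degree + eps)
    return (per_conn_weight.detach() * abs_w).sum()

# Usage: add to the task loss with a sparsity coefficient, e.g.
# loss = task_loss + 1e-4 * sum(preferential_l1_penalty(m.weight)
#                               for m in model.modules()
#                               if isinstance(m, torch.nn.Linear))
```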
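The group-sparsity penalty on the attention projections can be sketched as a simple group lasso over input dimensions. This assumes PyTorch's (out_features, in_features) weight layout, so the "rows" referred to above correspond to columns of the stored matrices; the exact objective in the paper may differ.

```python
import torch

def attention_group_sparsity(w_q: torch.Tensor,
                             w_k: torch.Tensor,
                             w_v: torch.Tensor) -> torch.Tensor:
    """Group-lasso penalty over input dimensions of the attention projections.

    Each projection has shape (d_out, d_model). Stacking them along the output
    axis collects, in column j, every weight that reads input dimension j, so
    penalizing per-column L2 norms drives whole input dimensions to zero together.
    """
    w_cat = torch.cat([w_q, w_k, w_v], dim=0)              # (3 * d_out, d_model)
    group_norms = torch.linalg.vector_norm(w_cat, dim=0)   # one norm per input dim
    return group_norms.sum()
```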
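The head-pruning step can be approximated with a greedy dominating-set heuristic over a head-similarity graph. The cosine-similarity criterion and the threshold below are illustrative assumptions; the paper's construction of the graph and solution of the dominating set problem may differ.

```python
import torch

def dominating_head_set(head_vectors: torch.Tensor, sim_threshold: float = 0.9):
    """Greedy minimum dominating set over a head-similarity graph.

    head_vectors: (num_heads, dim) tensor, one descriptor per attention head
    (e.g., flattened head parameters or averaged head outputs). Two heads are
    connected if their cosine similarity exceeds sim_threshold; the returned
    heads "cover" all others through such edges, so uncovered heads are redundant
    and can be pruned.
    """
    n = head_vectors.shape[0]
    normed = torch.nn.functional.normalize(head_vectors, dim=1)
    adjacency = (normed @ normed.T) >= sim_threshold   # includes self-loops
    uncovered = set(range(n))
    keep = []
    while uncovered:
        # Pick the head that covers the most currently uncovered heads.
        best = max(range(n), key=lambda h: sum(int(adjacency[h, u]) for u in uncovered))
        keep.append(best)
        uncovered -= {u for u in uncovered if adjacency[best, u]}
    return sorted(keep)
```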
Stats
NEUROPRUNE can achieve up to 10x faster training times compared to baselines for a given level of sparsity.
NEUROPRUNE exhibits measurable improvements in inference time in many cases.
Quotes
"NEUROPRUNE is competitive with (or sometimes superior to) baselines on performance and can be up to 10x faster in terms of training time for a given level of sparsity, simultaneously exhibiting measurable improvements in inference time in many cases."