Core Concepts
LLM Surgeon: a data-driven compression framework for large language models.
Stats
Structured compression (removes whole rows and columns)
Unstructured compression (zeroes individual matrix elements); both modes are illustrated in the sketch below
Model sizes: 1.3b, 2.7b, 6.7b
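
A minimal sketch of the two compression modes on a toy weight matrix. This is not code from the paper: it uses a simple magnitude/norm criterion as a stand-in for LLM Surgeon's curvature-based weight selection, purely to show what "rows and columns" versus "matrix elements" means in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # toy weight matrix
sparsity = 0.3               # fraction of elements / rows / columns to remove

# Unstructured compression: zero out individual elements
# (here chosen by smallest magnitude, a placeholder criterion).
k = int(sparsity * W.size)
W_unstructured = W.copy()
W_unstructured.flat[np.argsort(np.abs(W), axis=None)[:k]] = 0.0

# Structured compression: drop whole rows and columns
# (here chosen by smallest L2 norm, again a placeholder criterion).
n_drop = int(sparsity * W.shape[0])
rows = np.argsort(np.linalg.norm(W, axis=1))[:n_drop]
cols = np.argsort(np.linalg.norm(W, axis=0))[:n_drop]
W_structured = np.delete(np.delete(W, rows, axis=0), cols, axis=1)

print(W_unstructured.shape, float((W_unstructured == 0).mean()))  # same shape, ~30% zeros
print(W_structured.shape)                                         # smaller dense matrix
```

Structured removal yields a smaller dense matrix that speeds up standard dense matrix multiplies, while unstructured removal keeps the original shape and needs sparse kernels to turn the zeros into actual speedups.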
Quotes
"The superior performance of LLM Surgeon is achieved by scaling up the block-diagonal Kronecker-factorized approximations to the empirical Fisher from Eigendamage to LLMs."
"Our method gives the first practically usable results for structured pruning of LLMs – they can be pruned by up to 30% with minor performance degradation."