Key Concepts
Eigenpruning is a method that removes singular values from weight matrices in large language models (LLMs) to improve their performance on specific tasks. This approach is inspired by interpretability methods that aim to automatically find subnetworks of a model that can effectively solve a given task.
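To make the core operation concrete, the snippet below shows what "removing a singular value" from a matrix means: decompose it with SVD, zero out one singular value, and rebuild the matrix without that component. This is a minimal NumPy illustration of the concept only, not the paper's implementation; the toy matrix and the choice to drop the smallest value are arbitrary here (eigenpruning instead selects values by their estimated effect on the task loss).

```python
import numpy as np

# Toy matrix standing in for a weight matrix of an LLM layer.
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 3.0]])

# Singular value decomposition: A = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(A)

# "Remove" one singular value by zeroing it out...
S_pruned = S.copy()
S_pruned[-1] = 0.0

# ...and reconstruct the matrix without that component.
A_pruned = U @ np.diag(S_pruned) @ Vt

print(S)          # original singular values
print(A_pruned)   # matrix with one singular direction removed
```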
Summary
The paper introduces eigenpruning, a novel method for improving the performance of large language models (LLMs) on specific tasks. The key insights are:
- Existing automated circuit discovery approaches, such as ACDC and Attribution Patching, use "big" nodes (attention heads and MLP layers) in their definitions, which may not capture the true computations in an LLM.
- Instead of directly removing edges from the computational graph, eigenpruning removes singular values from weight matrices, which can lead to more natural changes in the model's activation distribution.
The eigenpruning method works as follows (a code sketch follows this list):
- Manually select a subset M of weight matrices in the LLM to be pruned, such as the key matrices in the transformer blocks.
- For each matrix A in M, compute its singular value decomposition A = USVᵀ.
- Use a linear approximation to estimate the effect of removing each singular value on the model's loss. The singular values whose presence has the most negative impact on task performance are considered "weak" and are pruned.
- Update the weight matrix A and its associated bias b to freeze the effect of the pruned singular values.
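As a rough illustration of these steps, here is a hedged PyTorch sketch for a single weight matrix. The function name, the calibration-gradient argument grad_A, the pruning fraction, and the mean-input bias update are assumptions made for this example; the paper's actual selection criterion and bias "freezing" may differ in detail.

```python
import torch

def eigenprune_matrix(A, b, grad_A, mean_input=None, prune_fraction=0.1):
    """Sketch of eigenpruning for one weight matrix (illustrative, not the authors' code).

    A          -- weight matrix of shape (out, in)
    b          -- bias vector of shape (out,)
    grad_A     -- gradient of the task loss w.r.t. A on a calibration batch
    mean_input -- average input activation to this matrix (assumed; used for the bias update)
    """
    # SVD: A = U @ diag(S) @ Vh
    U, S, Vh = torch.linalg.svd(A, full_matrices=False)

    # Linear (first-order) estimate of how the loss changes if component i is removed.
    # Removing sigma_i changes A by -sigma_i * u_i v_i^T, so
    #   delta_loss_i ≈ <grad_A, -sigma_i * u_i v_i^T>.
    scores = torch.stack([
        -(grad_A * (S[i] * torch.outer(U[:, i], Vh[i, :]))).sum()
        for i in range(S.numel())
    ])

    # Prune the components whose removal is estimated to lower the loss the most.
    k = max(1, int(prune_fraction * S.numel()))
    prune_idx = torch.argsort(scores)[:k]
    S_pruned = S.clone()
    S_pruned[prune_idx] = 0.0

    # Rebuild the matrix without the pruned components and (assumption) fold their
    # average contribution into the bias so mean activations are preserved.
    A_pruned = U @ torch.diag(S_pruned) @ Vh
    if mean_input is not None:
        b = b + (A - A_pruned) @ mean_input

    return A_pruned, b
```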
The authors test eigenpruning on two synthetic datasets (integer addition and integer multiplication) and three tasks from the SuperGLUE benchmark (CB, COPA, and RTE). They find that eigenpruning can substantially improve the performance of the Phi-2 model, particularly on the synthetic tasks. The results on the NLP tasks are more modest but still promising, with a 6-point accuracy improvement on COPA.
The authors acknowledge several limitations, including the need to test eigenpruning on a wider range of models and the risk of overfitting to the synthetic datasets. They also note the need to further explore how finetuning interacts with eigenpruning.
Overall, the eigenpruning method presents a novel and computationally efficient approach to improving LLM performance on specific tasks, with the potential to provide insights into the inner workings of these complex models.
Statistics
The accuracy improvements on the test set for the different models and datasets are as follows (before -> after eigenpruning):
Phi-2 model:
- Integer Sum (INT-SUM): 1.45% -> 46.10%
- Integer Multiplication (INT-MULT): 13.75% -> 97.50%
- Commitment Bank (CB): 42.86% -> 51.79%
- Choice of Plausible Alternatives (COPA): 78.00% -> 84.00%
- Recognizing Textual Entailment (RTE): 42.96% -> 44.04%

GPT-2 model:
- INT-SUM: 3.00% -> 3.20%
- INT-MULT: 13.75% -> 15.00%
- CB: 10.71% -> 12.50%
- COPA: 55.00% -> 55.00%
- RTE: 0.36% -> 1.44%

Finetuned GPT-2 model:
- INT-SUM: 4.00% -> 4.00%
- INT-MULT: 0.00% -> 0.00%
- CB: 41.07% -> 50.00%
- COPA: 55.00% -> 55.00%
- RTE: 47.29% -> 52.71%
Quotes
"Interestingly, these results seem to indicate the existence of a computation path that can solve the task very effectively, but it was not being used by the original model."
"These results are promising, both in terms of performance and of our understanding of how LLMs work."