Blockwise Parameter-Efficient Sparsity Allocation (BESA) is a novel pruning technique that optimizes pruning rates across the different layers of large language models to minimize performance degradation.
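BESA learns these layer-wise pruning ratios differentiably, by minimizing each transformer block's reconstruction error against its dense output. The sketch below is only a loose illustration of that idea under simplified assumptions: it swaps BESA's parameter-efficient ratio learning for dense per-weight sigmoid gates, and `SoftMaskedLinear` / `blockwise_loss` are hypothetical names, not the paper's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftMaskedLinear(nn.Module):
    """Frozen linear layer wrapped with a learnable soft pruning mask.

    Simplified stand-in: BESA learns layer-wise pruning *ratios* with a
    parameter-efficient parameterization; dense per-weight gates are used
    here purely to convey the differentiable-mask idea.
    """
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear.requires_grad_(False)   # pretrained weights stay fixed
        self.logits = nn.Parameter(torch.zeros_like(linear.weight))

    def forward(self, x):
        gate = torch.sigmoid(self.logits)            # soft mask in (0, 1)
        return F.linear(x, self.linear.weight * gate, self.linear.bias)

def blockwise_loss(pruned_block, dense_out, x, target_density=0.5, penalty=1e-2):
    """Reconstruction error of the pruned block against the dense block's
    output on calibration input `x`, plus a term steering the average gate
    density toward the sparsity budget."""
    recon = F.mse_loss(pruned_block(x), dense_out)
    density = torch.stack([
        torch.sigmoid(m.logits).mean()
        for m in pruned_block.modules() if isinstance(m, SoftMaskedLinear)
    ]).mean()
    return recon + penalty * (density - target_density).abs()
```

After optimizing the gates block by block, a hard mask would be obtained by thresholding the gates at the learned per-layer budget.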
Wanda is a simple and effective pruning method that induces high sparsity in pretrained large language models without any retraining or weight updates.
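Wanda's criterion fits in a few lines: each weight is scored by the product of its magnitude and the L2 norm of the corresponding input activation, |W_ij| * ||X_j||_2, and the lowest-scoring weights are removed within each output row. A minimal PyTorch sketch, with the function name and shapes being illustrative rather than the official implementation:

```python
import torch

def wanda_prune_layer(weight: torch.Tensor, acts: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Prune a linear layer's weight matrix (out_features x in_features).

    `acts` holds calibration inputs of shape (num_tokens, in_features).
    Each weight is scored by |W_ij| * ||X_j||_2, and the lowest-scoring
    weights are zeroed within each output row (per-output comparison group).
    """
    act_norm = acts.norm(p=2, dim=0)                 # (in_features,) activation norms
    score = weight.abs() * act_norm                  # broadcast to (out, in)

    # Indices of the lowest-scoring weights in each row.
    num_prune = int(weight.shape[1] * sparsity)
    _, prune_idx = torch.topk(score, num_prune, dim=1, largest=False)
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, prune_idx, False)

    return weight * mask

# Example: prune a projection to 50% sparsity per output row.
W = torch.randn(4096, 4096)
X = torch.randn(2048, 4096)                          # calibration activations
W_pruned = wanda_prune_layer(W, X, sparsity=0.5)
```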
Outlier-Weighed Layerwise Sparsity (OWL) is a novel pruning methodology that leverages the non-uniform distribution of outlier features across the layers of Large Language Models (LLMs), achieving superior performance at high sparsity levels compared to existing pruning techniques.
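OWL derives each layer's sparsity from its share of outlier weights: weights whose Wanda-style score exceeds a multiple of the layer mean count as outliers, and outlier-rich layers are pruned less, with per-layer ratios kept near a band around the global target. The sketch below approximates that allocation; the function name, the normalization, and the defaults (`lam`, `m`) are assumptions rather than the paper's exact hyperparameters.

```python
import torch

def owl_layer_sparsities(layer_scores, target_sparsity=0.7, lam=0.08, m=5.0):
    """Allocate non-uniform per-layer sparsity from outlier ratios.

    `layer_scores` is a list of per-layer score matrices (e.g. Wanda's
    |W| * activation-norm). A weight counts as an outlier when its score
    exceeds `m` times the layer mean; layers with more outliers receive
    lower sparsity. Ratios land roughly in [target - lam, target + lam]
    and are re-centered so their mean matches the global target.
    """
    outlier_ratio = torch.tensor([
        (s > m * s.mean()).float().mean().item() for s in layer_scores
    ])
    # Normalize ratios to [0, 1]; more outliers -> lower sparsity.
    r = (outlier_ratio - outlier_ratio.min()) / (outlier_ratio.max() - outlier_ratio.min() + 1e-8)
    sparsities = target_sparsity + lam * (1 - 2 * r)
    # Re-center so the average sparsity matches the target (approximate).
    sparsities += target_sparsity - sparsities.mean()
    return sparsities
```

Each layer would then be pruned with an existing one-shot method (e.g. Wanda) at its assigned ratio instead of a uniform one.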