Differentiating Transformer Sub-Layers for Efficient Structured Compression of Large Language Models
Transformer sub-layers exhibit varying degrees of low-rank structure, so they call for differentiated compression strategies that reduce model size efficiently while preserving performance.
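The idea of exploiting low-rank structure can be illustrated with a minimal sketch: a weight matrix is factorized via truncated SVD, and the retained rank controls the trade-off between parameter count and reconstruction error. This is a generic illustration under assumed synthetic data, not the paper's specific per-sub-layer method; the function name `low_rank_factorize` and all shapes are hypothetical.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Truncated-SVD factorization: W is approximated by A @ B,
    with A of shape (m, rank) and B of shape (rank, n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
# Synthetic "sub-layer" weight with an approximately low-rank structure
m, n, true_rank = 64, 64, 8
W = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))
W += 0.01 * rng.standard_normal((m, n))  # small full-rank perturbation

A, B = low_rank_factorize(W, rank=true_rank)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
params_before = m * n
params_after = m * true_rank + true_rank * n
print(f"relative error: {rel_err:.4f}")
print(f"parameter ratio: {params_after / params_before:.2f}")
```

A sub-layer whose weights are close to low-rank (small `rel_err` at low rank) tolerates aggressive factorization, while one with a flat singular-value spectrum does not, motivating rank budgets chosen per sub-layer rather than uniformly.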