Reducing Redundancy in Large Language Models: Optimizing Inference Costs through Selective Layer Removal
Large language models contain significant redundancy: nearly half of their layers may be unnecessary. Selectively removing these redundant layers can substantially reduce inference costs without significantly degrading model performance.
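
The summary does not say how redundant layers are identified, so the following is only a minimal sketch under a stated assumption: a layer is treated as redundant when its output points in nearly the same direction as its input, measured as one minus cosine similarity. The function names (`layer_influence`, `prune_layers`), the scoring rule, and the toy layers are hypothetical illustrations, not the method described here. A real model would apply the same idea to its transformer blocks and hidden states.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def layer_influence(layers: Sequence[Layer], samples: Sequence[Vector]) -> List[float]:
    """Assumed redundancy score: mean of 1 - cos(input, output) for each layer.

    A score near 0 means the layer barely changes the direction of its input.
    """
    totals = [0.0] * len(layers)
    for sample in samples:
        hidden = list(sample)
        for i, layer in enumerate(layers):
            out = layer(hidden)
            totals[i] += 1.0 - cosine_similarity(hidden, out)
            hidden = out
    return [t / len(samples) for t in totals]


def prune_layers(layers: Sequence[Layer], scores: Sequence[float], num_remove: int) -> List[Layer]:
    """Drop the num_remove lowest-scoring layers, keeping the rest in order."""
    ranked = sorted(range(len(layers)), key=lambda i: scores[i])
    drop = set(ranked[:num_remove])
    return [layer for i, layer in enumerate(layers) if i not in drop]


# Toy 2-D "layers" standing in for transformer blocks (illustrative only).
def rotate(h: Vector) -> Vector:
    return [h[1], -h[0]]          # 90-degree rotation: changes direction fully


def scale(h: Vector) -> Vector:
    return [1.01 * v for v in h]  # rescaling only: direction is unchanged


def shift(h: Vector) -> Vector:
    return [v + 1.0 for v in h]   # adds a constant: changes direction somewhat


layers = [rotate, scale, shift, scale]
samples = [[1.0, 0.0], [0.5, 2.0]]

scores = layer_influence(layers, samples)
pruned = prune_layers(layers, scores, num_remove=len(layers) // 2)
# The two scaling layers score lowest, so half of the stack is removed.
```

In this toy setup, removing half of the layers halves the number of layer evaluations per input. Whether output quality holds up after pruning is an empirical question, and the pruned model would still need to be evaluated on held-out tasks.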