Sheared LLaMA: Accelerating Large Language Model Pre-Training via Structured Pruning
By combining structured pruning with continued pre-training, we can produce smaller yet competitive large language models at only a fraction of the compute cost of training them from scratch.
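To make the core idea concrete, here is a minimal, hypothetical sketch of structured pruning on a single weight matrix: whole columns (hidden units) with the smallest L2 norm are removed, so the matrix physically shrinks rather than merely being zeroed out. This is an illustration of the general technique only, not the paper's actual method, which operates on full transformer dimensions (layers, heads, hidden sizes) with learned pruning masks.

```python
import numpy as np

def prune_columns(weight, keep_ratio):
    """Illustrative structured pruning: drop whole columns (hidden units)
    with the smallest L2 norm, shrinking the matrix instead of zeroing
    individual entries (which would be unstructured pruning)."""
    n_keep = max(1, int(weight.shape[1] * keep_ratio))
    norms = np.linalg.norm(weight, axis=0)          # per-column importance score
    keep = np.sort(np.argsort(norms)[-n_keep:])     # strongest columns, original order
    return weight[:, keep], keep

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                        # toy "layer" weight matrix
W_small, kept = prune_columns(W, keep_ratio=0.5)
print(W_small.shape)  # → (8, 8): a genuinely smaller dense matrix
```

After pruning, the smaller model would be trained further on the pre-training corpus (continued pre-training) to recover the quality lost by removing parameters.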