Efficient Distillation of Large Language Models for Edge Deployment
A parameter-efficient, distillation-based approach for training a palette of smaller language models from a large pre-trained teacher model, enabling deployment on resource-constrained edge devices.
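The tagline describes distilling a large teacher into smaller student models. As a generic illustration only (the document does not specify its training objective), the sketch below shows a standard knowledge-distillation loss that blends a temperature-softened KL term against the teacher's logits with ordinary cross-entropy on the hard labels; the `temperature` and `alpha` hyperparameters are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Generic knowledge-distillation loss sketch (not the paper's exact objective)."""
    # Soften both distributions with the same temperature before comparing them.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence from teacher to student; scaled by T^2 so gradient magnitude
    # stays comparable across temperature settings.
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    # Mix the two terms; alpha is an illustrative weighting, not a value from the text.
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

In practice the student would be the smaller edge-targeted model and the teacher the frozen large pre-trained model; only the student's parameters receive gradients from this loss.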