Leveraging Pre-Trained Foundation Models to Boost Small Model Performance Without Costly Pre-Training
By distilling knowledge from publicly available pre-trained teacher models and augmenting the training dataset with synthetic samples, small models can match or surpass the performance they would reach if pre-trained and fine-tuned themselves, avoiding the cost of pre-training entirely.
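As a concrete illustration of the distillation component, below is a minimal sketch of a standard knowledge-distillation objective in PyTorch. The function name, temperature, and mixing weight `alpha` are illustrative assumptions, not the paper's exact configuration; the loss follows the common recipe of blending a soft-target KL term against the teacher's logits with the usual hard-label cross-entropy.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend soft-target distillation loss with hard-label cross-entropy.

    Hypothetical hyperparameters: `temperature` softens both distributions,
    `alpha` weights the soft (teacher) term against the hard (label) term.
    """
    # Soft targets: KL divergence between temperature-scaled student and
    # teacher distributions, with the standard T^2 gradient rescaling.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Usage sketch: teacher logits are computed without gradients, so only the
# student is updated. Tensors here are random stand-ins for real batches.
if __name__ == "__main__":
    batch_size, num_classes = 8, 10
    student_logits = torch.randn(batch_size, num_classes, requires_grad=True)
    with torch.no_grad():
        teacher_logits = torch.randn(batch_size, num_classes)
    labels = torch.randint(0, num_classes, (batch_size,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
```

In this setup, synthetic samples produced by a generative model can simply be appended to the training set; the teacher labels them on the fly via its logits, so no ground-truth annotations are needed for the augmented portion (in that case the hard-label term is dropped or down-weighted).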