MediSwift introduces sparse pre-training for biomedical language models to cut computational costs while maintaining strong performance. By enforcing up to 75% weight sparsity during pre-training, MediSwift substantially reduces training FLOPs. Sparsity is applied only in the pre-training phase; subsequent dense fine-tuning and soft prompting recover task accuracy, and the resulting models outperform existing LLMs on biomedical tasks, balancing efficiency with accuracy. Despite the challenges of training under high sparsity, sparse pre-training offers a cost-effective route to high-performing models in specialized domains.
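The core mechanism can be illustrated with a minimal PyTorch sketch: a fixed fraction of weights is zeroed before pre-training and kept at zero after each optimizer step, while dense fine-tuning simply stops re-applying the mask. This is an illustrative sketch only; the random static mask, the toy feed-forward model, and the 75% sparsity setting are assumptions for demonstration and do not reproduce MediSwift's actual masking strategy or hardware-level FLOP savings.

```python
# Minimal sketch of sparse pre-training with a fixed weight-sparsity mask.
# Illustrative only: layer choice, mask strategy, and the toy model are assumptions.
import torch
import torch.nn as nn


def apply_static_sparsity(module: nn.Module, sparsity: float = 0.75) -> dict:
    """Zero a random fraction of each Linear weight matrix and return the masks."""
    masks = {}
    for name, param in module.named_parameters():
        if param.dim() == 2:  # weight matrices of Linear layers
            mask = (torch.rand_like(param) >= sparsity).float()
            param.data.mul_(mask)
            masks[name] = mask
    return masks


def reapply_masks(module: nn.Module, masks: dict) -> None:
    """Keep pruned weights at zero so the model stays sparse throughout pre-training."""
    for name, param in module.named_parameters():
        if name in masks:
            param.data.mul_(masks[name])


# Toy stand-in for a transformer block's feed-forward layers.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
masks = apply_static_sparsity(model, sparsity=0.75)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x, target = torch.randn(8, 512), torch.randn(8, 512)

for _ in range(3):  # stand-in for the pre-training loop
    loss = nn.functional.mse_loss(model(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    reapply_masks(model, masks)  # dense fine-tuning would skip this step

nonzero = sum((p != 0).sum().item() for p in model.parameters() if p.dim() == 2)
total = sum(p.numel() for p in model.parameters() if p.dim() == 2)
print(f"weight density: {nonzero / total:.2f}")  # roughly 0.25 at 75% sparsity
```

Note that zeroed weights only translate into real FLOP savings on hardware or kernels that can exploit unstructured sparsity; on standard dense GPU kernels the masked weights are still multiplied as zeros.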
Source: by Vithursan Th... at arxiv.org, 03-05-2024
https://arxiv.org/pdf/2403.00952.pdf