MediSwift introduces sparse pre-training in biomedical language models to reduce computational costs while maintaining high performance. By leveraging up to 75% weight sparsity during pre-training, MediSwift achieves significant reductions in training FLOPs. The models outperform existing LLMs on biomedical tasks, showcasing a balance between efficiency and accuracy. The approach involves dense fine-tuning and soft prompting for optimal performance on specialized tasks. Despite challenges, sparse pre-training offers a cost-effective method for creating high-performing models in specialized domains.
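To make the idea of weight-sparse pre-training concrete, below is a minimal PyTorch sketch of static unstructured sparsity: a random 75% of a layer's weights are zeroed and held at zero by masking their gradients, while the remaining 25% train normally. The helper `apply_static_sparsity` is hypothetical and the random masking criterion is an assumption for illustration; MediSwift's actual mask selection, sparsity schedule, and hardware-level FLOP savings may differ.

```python
import torch
import torch.nn as nn

def apply_static_sparsity(layer: nn.Linear, sparsity: float = 0.75):
    """Sketch of static unstructured weight sparsity (hypothetical helper).

    Zeros a random fraction of the layer's weights and masks their gradients
    so they stay inactive during sparse pre-training. This is a generic
    illustration, not MediSwift's exact masking scheme.
    """
    # Keep a weight with probability (1 - sparsity); zero it otherwise.
    mask = (torch.rand_like(layer.weight) > sparsity).float()
    with torch.no_grad():
        layer.weight.mul_(mask)
    # Mask gradients so pruned weights receive no updates while sparse.
    handle = layer.weight.register_hook(lambda grad: grad * mask)
    return mask, handle

# Toy projection layer pre-trained at 75% weight sparsity.
proj = nn.Linear(1024, 1024)
mask, handle = apply_static_sparsity(proj, sparsity=0.75)
print(f"Active weights: {mask.mean().item():.2%}")  # roughly 25%

# Pruned positions get zero gradient during the sparse phase.
proj(torch.randn(8, 1024)).sum().backward()
assert torch.all(proj.weight.grad[mask == 0] == 0)

# For the later dense fine-tuning phase, removing the hook lets all
# weights update again.
handle.remove()
```

The usage above only illustrates the mechanism; in practice the FLOP savings reported by the paper depend on training infrastructure that can exploit unstructured sparsity rather than merely storing zeros.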
Key Insights Extracted From
by Vithursan Th... at arxiv.org, 03-05-2024
https://arxiv.org/pdf/2403.00952.pdf