MediSwift introduces sparse pre-training for biomedical language models to reduce computational cost while maintaining high task performance. By imposing up to 75% weight sparsity during pre-training, MediSwift substantially reduces training FLOPs. After sparse pre-training, the models are densely fine-tuned and adapted with soft prompting for specialized tasks, and they outperform existing LLMs on biomedical benchmarks, demonstrating a strong balance between efficiency and accuracy. Despite the challenges of training under heavy sparsity, sparse pre-training offers a cost-effective way to build high-performing models in specialized domains.
Key insights distilled from source content by Vithursan Th... at arxiv.org, 03-05-2024: https://arxiv.org/pdf/2403.00952.pdf
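The paper's actual sparsity schedule, mask-selection method, and training setup are not reproduced here. The following is a minimal PyTorch sketch, assuming a static random unstructured mask, a toy model, and a placeholder objective, meant only to illustrate mechanically what "75% weight sparsity during pre-training followed by dense fine-tuning" means. Note that real FLOP savings require hardware or kernels that skip the zeroed weights; on dense hardware a mask like this only emulates the sparsity pattern.

```python
import torch
import torch.nn as nn

def make_sparsity_masks(model, sparsity=0.75):
    """Build one random binary mask per weight matrix that zeroes out
    `sparsity` of its entries (static unstructured sparsity)."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() == 2:  # weight matrices only; skip biases
            masks[name] = (torch.rand_like(param) >= sparsity).float()
    return masks

def apply_masks(model, masks):
    """Zero out masked weights; called after every optimizer step so the
    pruned entries stay at zero throughout sparse pre-training."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])

# --- sparse pre-training loop (toy model stands in for the LLM) ---
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
masks = make_sparsity_masks(model, sparsity=0.75)
apply_masks(model, masks)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(32, 128)
    loss = ((model(x) - x) ** 2).mean()  # placeholder objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    apply_masks(model, masks)  # keep 75% of the weights at zero

# --- dense fine-tuning: drop the masks and update all weights ---
for step in range(10):
    x = torch.randn(32, 128)
    loss = ((model(x) - x) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # no mask re-application: the weights densify
```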