MediSwift applies sparse pre-training to biomedical language models to cut computational cost while maintaining high performance. By inducing up to 75% weight sparsity during pre-training, it achieves a substantial reduction in training FLOPs. After dense fine-tuning and soft prompting, the resulting models outperform existing LLMs of comparable size on biomedical benchmarks such as PubMedQA, striking a strong balance between efficiency and accuracy. Despite the challenges of training at high sparsity, sparse pre-training offers a cost-effective route to high-performing models in specialized domains.
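The core efficiency idea here is to keep a large fraction of weights at zero throughout pre-training and only afterwards fine-tune densely. The sketch below illustrates that idea in PyTorch under simplifying assumptions: it uses a one-shot magnitude-based mask on a toy model, re-applied after every optimizer step. The mask criterion, sparsity schedule, and hardware support used by MediSwift itself are not described in this summary, and all function names and the toy model are illustrative, not the paper's actual recipe.

```python
import torch
import torch.nn as nn

def apply_static_sparsity(model: nn.Module, sparsity: float = 0.75) -> dict:
    """Zero out the smallest-magnitude weights in each Linear layer and return
    the binary masks so the zeros can be re-applied after each optimizer step
    (keeping the weights sparse throughout pre-training)."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            weight = module.weight.data
            k = int(weight.numel() * sparsity)                  # number of weights to prune
            threshold = weight.abs().flatten().kthvalue(k).values
            mask = (weight.abs() > threshold).float()           # 1 = keep, 0 = pruned
            weight.mul_(mask)
            masks[name] = mask
    return masks

def reapply_masks(model: nn.Module, masks: dict) -> None:
    """Call after optimizer.step() so gradient updates cannot revive pruned weights."""
    for name, module in model.named_modules():
        if name in masks:
            module.weight.data.mul_(masks[name])

# Toy usage: sparse pre-training loop, then dense fine-tuning (masks dropped).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
masks = apply_static_sparsity(model, sparsity=0.75)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for _ in range(10):                                             # stand-in for pre-training steps
    x = torch.randn(32, 128)
    loss = model(x).pow(2).mean()                               # placeholder objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    reapply_masks(model, masks)                                 # keep ~75% of weights at zero

# Dense fine-tuning: simply stop re-applying the masks so all weights can update.
```

Note that on commodity GPUs this only emulates sparsity with masks; the FLOP savings reported for sparse pre-training require hardware or kernels that actually skip the zeroed weights. For dense fine-tuning, the masks are no longer re-applied, so the pruned weights can recover capacity on the downstream task.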
Core insights from the paper by Vithursan Th... on arxiv.org, 03-05-2024: https://arxiv.org/pdf/2403.00952.pdf