Bibliographic Information: Bansal, H., Hosseini, A., Agarwal, R., Tran, V. Q., & Kazemi, M. (2024). Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling. arXiv preprint arXiv:2408.16737v2.
Research Objective: This paper investigates whether weaker but cheaper (WC) language models or stronger but more expensive (SE) models are the more compute-optimal source of synthetic training data for improving the reasoning capabilities of large language models (LLMs).
Methodology: The researchers compare WC and SE models on three axes of data quality (coverage, diversity, and false positive rate) and on downstream task performance after fine-tuning on the synthetic data. They evaluate three fine-tuning setups: knowledge distillation, self-improvement, and a novel weak-to-strong improvement paradigm in which a stronger model learns from data generated by a weaker one. Experiments are conducted on the MATH and GSM-8K reasoning datasets using the Gemma2 and Gemini 1.5 language model families. A small illustration of the compute-matched setup follows below.
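The compute-matched comparison reduces to simple arithmetic: under the standard approximation of roughly 2P FLOPs per generated token for a P-parameter model, a fixed sampling budget buys P_SE / P_WC times as many samples from the WC model as from the SE model. The sketch below illustrates this ratio along with the coverage and diversity metrics; it is a minimal illustration under these assumptions, with function names and sample counts chosen here rather than taken from the paper.

```python
from typing import Sequence

# Assumption: ~2P FLOPs per generated token for a P-parameter model,
# and comparable average solution lengths for the WC and SE models.
def flops_per_token(num_params: float) -> float:
    """Approximate inference cost of a P-parameter model: ~2P FLOPs/token."""
    return 2.0 * num_params

def compute_matched_samples(n_se: int, p_se: float, p_wc: float) -> int:
    """WC samples affordable under the same budget that buys n_se SE samples:
    n_wc = n_se * (P_SE / P_WC)."""
    budget = n_se * flops_per_token(p_se)
    return int(budget / flops_per_token(p_wc))

def coverage(correct_counts: Sequence[int]) -> float:
    """Coverage: fraction of problems with at least one correct solution."""
    return sum(c > 0 for c in correct_counts) / len(correct_counts)

def diversity(unique_correct_counts: Sequence[int]) -> float:
    """Diversity: average number of unique correct solutions per problem.
    (The third axis, false positive rate, needs human or LLM judgment of
    the reasoning chain, so it is not computable from counts alone.)"""
    return sum(unique_correct_counts) / len(unique_correct_counts)

# Gemma2 pairing from the paper: 27B (SE) vs. 9B (WC). The budget that
# buys 10 SE solutions per problem buys 30 WC solutions instead.
print(compute_matched_samples(n_se=10, p_se=27e9, p_wc=9e9))  # -> 30
```

For the Gemma2-27B versus Gemma2-9B pairing this ratio is 3, which is what lets the WC model attempt each problem more often and thereby cover more unique problems at equal cost.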
Key Findings: At a fixed sampling budget, data generated by the WC model (Gemma2-9B) achieves higher coverage and higher diversity than data from the SE model (Gemma2-27B), at the cost of a higher false positive rate. Despite this, models fine-tuned on WC-generated data consistently match or outperform models fine-tuned on SE-generated data across the knowledge distillation, self-improvement, and weak-to-strong improvement setups.
Main Conclusions: Sampling from weaker but cheaper models can be the more compute-optimal way to generate synthetic reasoning data, challenging the common practice of distilling from the strongest available model.
Significance: This research provides a novel perspective on optimizing resource allocation for training LLMs, potentially leading to more efficient and accessible development of advanced language models.
Limitations and Future Research: The evaluation is confined to mathematical reasoning benchmarks, and solution filtering relies on final-answer matching, which can retain solutions whose reasoning is flawed; extending the analysis to other domains and to stronger verification of reasoning chains remains future work.