Key Concepts
This paper introduces a novel training paradigm for version updates of large language models (LLMs) that balances pre-training performance against training cost by strategically switching learning rates.
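The summary only states that learning rates are switched strategically; the concrete schedule shape below (a constant maximum learning rate on a main path, then a fast cosine decay on a short branch used to produce each released version) is an illustrative assumption, as are the function name, step counts, and learning rate values.

```python
import math

def switched_lr(step: int, branch_start: int, branch_len: int,
                max_lr: float = 3e-4, min_lr: float = 3e-5) -> float:
    """Hypothetical two-path schedule: hold max_lr on the main path,
    then decay quickly on a branch to yield a release checkpoint."""
    if step < branch_start:
        return max_lr  # main path: hold the maximum learning rate
    t = min(step - branch_start, branch_len) / branch_len
    # branch path: fast cosine decay toward min_lr before release
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))
```

Under such a scheme, the expensive decay phase is run only briefly per version instead of being re-done over the whole accumulated corpus, which is one plausible way the cost savings reported below could arise.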
Statistics
When training four versions of an LLM, the proposed paradigm reduces total training cost to 58% of that of pre-training from scratch (PTFS), while maintaining comparable pre-training performance.
For the same number of version updates, the time complexity of PTFS is quadratic in the number of versions, while the time complexity of both continual pre-training (CPT) and the proposed paradigm is linear.
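A toy cost model makes the asymptotics concrete. Assume each version adds one unit of new data: PTFS retrains version k from scratch on all k accumulated units, whereas an incremental scheme trains only on the new unit. This sketch is an illustrative assumption, not the paper's accounting, so it does not reproduce the exact 58% figure (which presumably includes the proposed paradigm's extra per-version overhead).

```python
def ptfs_cost(n_versions: int) -> int:
    # PTFS retrains from scratch on all data accumulated so far, so
    # version k costs k units: 1 + 2 + ... + n, i.e. O(n^2) overall.
    return sum(range(1, n_versions + 1))

def incremental_cost(n_versions: int) -> int:
    # CPT (and, per the summary, the proposed paradigm) trains only on
    # each version's new data, so every update costs one unit: O(n).
    return n_versions

print(ptfs_cost(4), incremental_cost(4))  # -> 10 4
```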
Quotes
"To the best of our knowledge, our work is the first attempt to explore how to balance model performance and training cost for version updates of LLMs."
"Our paradigm better balances model performance and training cost compared to the other two paradigms."