Smart-Infinity addresses the storage bandwidth bottleneck in large language model (LLM) training by using near-storage processing devices.
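A minimal sketch of the general idea, not Smart-Infinity's actual implementation: with conventional host-side offloading, optimizer state travels storage → host → storage on every step, whereas near-storage processing applies the update at the device so only gradients cross the link. The class and function names below are hypothetical, and the "device" is simulated with in-memory arrays.

```python
import numpy as np

LR, BETA = 1e-3, 0.9  # placeholder SGD-with-momentum hyperparameters

class NearStorageWorker:
    """Hypothetical computational-storage device: parameters and
    optimizer state live here and updates are applied in place."""
    def __init__(self, n):
        self.params = np.zeros(n, dtype=np.float32)
        self.momentum = np.zeros(n, dtype=np.float32)

    def apply_update(self, grads):
        # Only the gradients cross the host-storage link;
        # the momentum read/write happens locally on the device.
        self.momentum = BETA * self.momentum + grads
        self.params -= LR * self.momentum

def host_side_update(storage, grads):
    # Conventional offloading for comparison: read state from storage,
    # update on the host, write it back -- roughly triple the traffic.
    params, momentum = storage["params"].copy(), storage["momentum"].copy()
    momentum = BETA * momentum + grads
    params -= LR * momentum
    storage["params"], storage["momentum"] = params, momentum

n = 1 << 20
worker = NearStorageWorker(n)
worker.apply_update(np.random.randn(n).astype(np.float32))

store = {"params": np.zeros(n, dtype=np.float32),
         "momentum": np.zeros(n, dtype=np.float32)}
host_side_update(store, np.random.randn(n).astype(np.float32))
```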
This paper introduces a novel training paradigm for updating large language models (LLMs) that balances pre-training performance against training cost by strategically switching learning rates.
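The paper's exact schedule is not reproduced here; as an illustrative sketch, a switching scheduler might train at a high learning rate for cheap early progress and drop to a low rate late in training to recover final quality. The switch point and rate scale below are made-up placeholders, expressed with PyTorch's standard LambdaLR scheduler.

```python
import torch

model = torch.nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# Hypothetical two-phase schedule: full LR for the first 80% of steps,
# then switch to 10% of the base LR for the remainder.
TOTAL_STEPS, SWITCH_AT, LOW_LR_SCALE = 1_000, 0.8, 0.1

def switch_fn(step):
    return 1.0 if step < SWITCH_AT * TOTAL_STEPS else LOW_LR_SCALE

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=switch_fn)

for step in range(TOTAL_STEPS):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 16)).pow(2).mean()  # dummy objective
    loss.backward()
    optimizer.step()
    scheduler.step()  # applies the LR switch at the chosen step
```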
Programmatically generated training data based on simple patterns over random tokens can effectively improve the generative capabilities of large language models (LLMs) on various natural language tasks.
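The specific pattern set is not detailed here; as a toy illustration of one simple pattern over random tokens, the sketch below generates duplication examples (a random token span followed by an exact copy of itself). All names and parameters are hypothetical.

```python
import random

VOCAB_SIZE = 1000  # placeholder synthetic vocabulary size

def make_copy_example(span_len=8, vocab_size=VOCAB_SIZE, rng=random):
    """One toy 'pattern over random tokens': a random span followed by
    a copy of itself, encoding a simple duplication rule."""
    span = [rng.randrange(vocab_size) for _ in range(span_len)]
    return span + span

def make_dataset(n_examples=10_000):
    # Programmatic generation: no natural-language corpus required.
    return [make_copy_example() for _ in range(n_examples)]

print(make_copy_example(span_len=4))
# e.g. [12, 907, 43, 88, 12, 907, 43, 88]
```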