Core Concepts
Large language models like CodeShell-Base enhance code comprehension and generation efficiency.
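To make the generation workflow concrete, here is a minimal sketch of prompting a CodeShell-style checkpoint through the Hugging Face transformers library. The model identifier "WisdomShell/CodeShell-7B" and the trust_remote_code flag are assumptions about how the checkpoint is published, not details stated above.

```python
# Minimal code-generation sketch with a causal code LLM (assumed checkpoint id).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "WisdomShell/CodeShell-7B"  # assumption: published under this Hugging Face id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True, device_map="auto")

# Give the model the start of a function and let it complete the body.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```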
Statistics
We have curated 100 billion tokens of high-quality pre-training data from GitHub.
Benefiting from the high-quality data, CodeShell outperforms CodeLlama on HumanEval after training on just 500 billion tokens (5 epochs).
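As a quick sanity check of these figures, the sketch below reproduces the arithmetic, assuming the 100 billion figure refers to tokens and each epoch covers the full corpus.

```python
# Back-of-the-envelope check: a ~100B-token curated corpus seen for 5 epochs
# yields the ~500B training tokens cited above.
corpus_tokens = 100_000_000_000   # curated GitHub pre-training tokens (assumed to be a token count)
epochs = 5
total_training_tokens = corpus_tokens * epochs
print(f"{total_training_tokens:,} training tokens")  # 500,000,000,000
```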
Quotes
"We released CodeShell-7B, a new large code foundation model pre-trained from scratch featuring a novel and unique architecture design."
"To address more complex coding tasks, we have increased the model’s context length to 8K, enhancing its capability to process longer code segments."