Key Concepts
Large language models such as CodeShell-Base improve both code comprehension and the efficiency of code generation.
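As a minimal sketch of this use case, the snippet below loads CodeShell for code completion through Hugging Face transformers. The checkpoint name `WisdomShell/CodeShell-7B` and the `trust_remote_code=True` flag are assumptions based on typical Hub usage, not details stated in this summary.

```python
# Minimal sketch: code completion with CodeShell via Hugging Face transformers.
# Assumes the checkpoint is published as "WisdomShell/CodeShell-7B" and that
# its custom architecture requires trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WisdomShell/CodeShell-7B"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision to fit a single GPU
    device_map="auto",
    trust_remote_code=True,
)

prompt = "def quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```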
Statistics
100 billion tokens of high-quality pre-training data were curated from GitHub.
CodeShell, trained on 500 billion tokens, outperforms other code models of comparable size.
Extending the context length to 8K improves the model's ability to process long code.
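To illustrate what an 8K window means in practice, here is a hedged sketch that trims a long source file to fit the context before generation. The file name, the 256-token output budget, and the reuse of `tokenizer` and `model` from the loading sketch above are illustrative assumptions.

```python
# Sketch: fitting a long source file into an assumed 8,192-token context window.
# Reuses `tokenizer` and `model` from the loading sketch above.
MAX_CONTEXT = 8192
RESERVED_FOR_OUTPUT = 256  # leave room for generated tokens (assumed budget)

with open("long_module.py") as f:  # hypothetical input file
    source = f.read()

ids = tokenizer(source, return_tensors="pt").input_ids[0]
# Keep the tail of the file, which is usually most relevant for completion.
ids = ids[-(MAX_CONTEXT - RESERVED_FOR_OUTPUT):]
inputs = ids.unsqueeze(0).to(model.device)

outputs = model.generate(inputs, max_new_tokens=RESERVED_FOR_OUTPUT)
# Decode only the newly generated continuation, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```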