CLongEval is presented as a comprehensive benchmark for evaluating the long-context capabilities of Chinese LLMs, addressing the lack of robust evaluation benchmarks in this domain.
LHMKE provides a holistic evaluation benchmark for Chinese LLMs, encompassing diverse question types and subjects.