Comprehensive Benchmark for Assessing Large Language Models' Educational Capabilities Using Chinese Junior High School Exam Data
CJEval is a comprehensive benchmark built from Chinese junior high school exam data. It assesses the capabilities of Large Language Models on diverse educational tasks, including knowledge concept tagging, question difficulty prediction, question answering, and question generation.