Core Concepts
CatCode provides a comprehensive evaluation framework for LLMs, focusing on code understanding and generation.
Summary
CatCode introduces a novel evaluation framework for LLMs grounded in category theory, emphasizing code understanding and generation. The framework defines morphisms (semantics-preserving code transformations) and functors (structure-preserving mappings such as code translation and explanation), together with standardized evaluation metrics. The study covers morphism-identification, translation-functor, and explanation-functor experiments, highlighting model capabilities and limitations.
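To make the morphism notion concrete, here is a minimal Python sketch, illustrative only and not the paper's implementation, of one morphism type named in the experiments below, Boolean Exchange: the condition is negated and the branches are swapped, leaving observable behavior unchanged.

```python
# Illustrative sketch (not the paper's code): two snippets related by a
# "Boolean Exchange" morphism. The condition is negated and the branches
# are swapped, so the two functions are semantically equivalent.

def sign_original(x: int) -> str:
    if x >= 0:
        return "non-negative"
    return "negative"

def sign_transformed(x: int) -> str:
    # Boolean Exchange: negate the condition and swap the branches.
    if not (x >= 0):
        return "negative"
    return "non-negative"

# A quick check over sample inputs confirms the transformation
# preserves semantics.
assert all(sign_original(x) == sign_transformed(x) for x in range(-5, 6))
```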
Morphism Identification Experiment:
- Datasets used: HumanEval-X, MBXP, MathQA, Code Contest
- Most challenging morphism types: Unused Statements, Modify Condition, Boolean Exchange (see the harness sketch after this list)
- Difficulty varied across datasets, with different morphisms proving hard on different benchmarks
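A minimal sketch of how a morphism-identification probe might be posed to a model. The `query_model` callable is a hypothetical stand-in for any chat-completion API, and the prompt wording is an assumption, not taken from the paper.

```python
# Hypothetical morphism-identification harness (assumed interface;
# `query_model` is a stand-in for any text-in/text-out model call).

def build_prompt(snippet_a: str, snippet_b: str) -> str:
    # Ask the model whether two programs are related by a
    # semantics-preserving morphism, i.e., whether they are equivalent.
    return (
        "Are the following two programs semantically equivalent? "
        "Answer YES or NO.\n\n"
        f"Program A:\n{snippet_a}\n\n"
        f"Program B:\n{snippet_b}\n"
    )

def identify_equivalent(snippet_a: str, snippet_b: str, query_model) -> bool:
    """Return True if the model judges the pair equivalent."""
    answer = query_model(build_prompt(snippet_a, snippet_b))
    return answer.strip().upper().startswith("YES")
```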
Translation Functor Experiment:
- Models used: Text-Davinci, ChatGPT, CodeGeeX
- Pass@1 scores: ChatGPT achieved the highest pass@1, showing the strongest translation ability (a pass@1 sketch follows this list)
- Common failure types: compilation errors, type mismatches, variable-name discrepancies
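The pass@1 scores above are presumably computed with the standard unbiased pass@k estimator (Chen et al., 2021); a minimal sketch under that assumption, where n samples are generated per task and c of them pass the tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n generations (c of them correct) passes the
    tests. For k = 1 this reduces to c / n."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: success is certain
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 3 correct translations out of 10 samples -> pass@1 = 0.3
print(pass_at_k(n=10, c=3, k=1))
```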
Statistics
No notable metrics are available in the current data sheet.
Quotes
This study presents a new perspective for evaluating the coding abilities of LLMs.
It provides an integrated, standardized automatic evaluation platform for assessing the abilities of LLMs.