HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
HumanEval-XL is a comprehensive benchmark for multilingual code generation, addressing the gap in evaluating the cross-lingual natural language generalization of LLMs.