Core Concepts
This (incomplete) survey paper aims to provide a comprehensive overview of the rapidly developing field of Large Language Models (LLMs) for code generation, focusing on their evolution, recent advancements, evaluation methods, practical applications, and future challenges.
Stats
From a single paper across the 2018–2020 period, the count grew to 6 in 2021, 11 in 2022, 75 in 2023, and 140 in 2024.
14% of the papers were published in LLM-specific venues and 7% in software engineering (SE) venues.
49% of the papers remain unpublished in peer-reviewed venues and are available only on arXiv.
Pre-training and Foundation Models (21.5%)
Prompting (11.8%)
Evaluation and Benchmarks (24.1%)
Quotes
"The advent of Large Language Models (LLMs) such as ChatGPT [196] has profoundly transformed the landscape of automated code-related tasks [48], including code completion [87, 171, 270, 282], code translation [52, 135, 245], and code repair [75, 126, 195, 204, 291, 310]."
"This area has garnered substantial interest from both academia and industry, as evidenced by the development of tools like GitHub Copilot [48], CodeGeeX [321], and Amazon CodeWhisperer, which leverage groundbreaking code LLMs to facilitate software development."
"The performance of LLMs on code generation tasks has seen remarkable improvements, as illustrated by the HumanEval leaderboard, which showcases the evolution from PaLM 8B [54] of 3.6% to LDB [325] of 95.1% on Pass@1 metrics."
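The Pass@1 numbers quoted above are typically computed with the unbiased pass@k estimator introduced alongside HumanEval: generate n samples per problem, count the c that pass all unit tests, and estimate the probability that at least one of k drawn samples passes. A minimal sketch (function name and structure are illustrative, not from the survey):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n -- total code samples generated for a problem
    c -- number of samples that pass all unit tests
    k -- budget of samples considered
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 this reduces to the plain pass rate c / n.
```

For k = 1 the estimator is simply the fraction of passing samples, which is why Pass@1 can be read directly as per-sample functional correctness.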