Core Concepts
Large Language Models (LLMs) can outperform state-of-the-art learning-based transpilers in automated code translation, but they still suffer from limitations in program comprehension, I/O type handling, and awareness of discrepancies between source and target languages. UniTrans, a unified code translation framework, leverages auto-generated test cases to address these limitations and substantially boost the performance of various LLMs.
Abstract
The content explores the potential of using Large Language Models (LLMs) for automated code translation, i.e., automatically translating source code from one programming language (PL) to another.
The key highlights are:
- Empirical Study on LLMs and Learning-based Transpilers:
  - The authors conducted an empirical study of recent LLMs, including GPT-3.5, LLaMA, and CodeGen, on code translation tasks between Python, Java, and C++.
  - They compared the LLMs' performance with state-of-the-art learning-based transpilers: TransCoder, TransCoder-IR, and TransCoder-ST.
  - The results showed that certain LLMs can outperform the learning-based transpilers, but they still produce a considerable number of incorrect translations.
- In-depth Analysis of LLM Failures:
  - The authors manually analyzed 174 failed cases of the best-performing LLM (GPT-3.5) and categorized the failures into six classes: Logic, Syntax, I/O, API, Precision, and Others.
  - The analysis revealed three main limitations of LLMs: (1) lack of comprehension of the source program, (2) missing explicit I/O type instructions, and (3) ignoring the discrepancies between source and target programming languages.
- Proposed UniTrans Framework:
  - Motivated by these findings, the authors proposed UniTrans, a unified code translation framework that leverages auto-generated test cases to address the limitations of LLMs.
  - UniTrans consists of three phases: (1) Test Case Generation, (2) Translation Augmentation, and (3) Translation Repair.
  - The test cases provide information about program requirements, I/O types, and execution results to help LLMs overcome the identified limitations.
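The three-phase loop described above can be sketched as a toy end-to-end example. This is a minimal illustration, not the authors' implementation: the LLM calls are mocked (`mock_llm_translate`, `mock_llm_fix`), test inputs are given rather than LLM-proposed, and the `run` helper assumes each program defines a single function `f`.

```python
def run(program_src, x):
    """Execute a single-function program source and call its f(x)."""
    ns = {}
    exec(program_src, ns)
    return ns["f"](x)

def generate_test_cases(source_src, inputs):
    """Phase 1: run the source program on candidate inputs to obtain
    ground-truth (input, output) pairs. (UniTrans asks an LLM to
    propose the inputs; here they are supplied directly.)"""
    return [(x, run(source_src, x)) for x in inputs]

def mock_llm_translate(source_src, test_cases):
    """Phase 2 stand-in: a real system prompts an LLM with the source
    program plus test cases; here we return a deliberately buggy draft."""
    return "def f(x):\n    return x * x + 2"  # off-by-one bug

def mock_llm_fix(draft_src, failures):
    """Phase 3 stand-in: a real system shows the LLM its failing cases;
    here we simply return the corrected program."""
    return "def f(x):\n    return x * x + 1"

def unitrans(source_src, inputs, max_rounds=3):
    tests = generate_test_cases(source_src, inputs)   # Phase 1
    draft = mock_llm_translate(source_src, tests)     # Phase 2
    for _ in range(max_rounds):                       # Phase 3
        failures = [(x, y) for x, y in tests if run(draft, x) != y]
        if not failures:
            break
        draft = mock_llm_fix(draft, failures)
    return draft

source = "def f(x):\n    return x * x + 1"
result = unitrans(source, [0, 1, 2])
```

The key design point, as in the paper, is that the same auto-generated test cases serve double duty: they enrich the translation prompt in Phase 2 and act as the correctness oracle driving the repair loop in Phase 3.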
- Extensive Experiments and Evaluations:
  - The authors conducted extensive experiments to evaluate the effectiveness of UniTrans with three LLMs (GPT-3.5, LLaMA-13B, and LLaMA-7B) on six translation datasets between Python, Java, and C++.
  - The results showed that UniTrans substantially boosts the code translation performance of the tested LLMs, with significant improvements in both Computational Accuracy (CA) and Exact Match Accuracy (EM Acc).
  - Ablation studies and discussion experiments were also performed to investigate the contribution and influence of each component in UniTrans.
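The two metrics can be sketched under their standard definitions in the TransCoder line of work: CA counts a translation as correct when it passes all of its test cases, and EM requires the translation to textually match the reference. The exact normalization used in the paper may differ; whitespace stripping below is a simplification.

```python
def computational_accuracy(pass_results):
    """pass_results: one boolean per program, True iff the translation
    passed every one of its test cases."""
    return sum(pass_results) / len(pass_results)

def exact_match_accuracy(translations, references):
    """Fraction of translations that textually match the reference
    (leading/trailing whitespace ignored as a simplification)."""
    matches = sum(t.strip() == r.strip()
                  for t, r in zip(translations, references))
    return matches / len(references)

ca = computational_accuracy([True, True, False, True])    # 0.75
em = exact_match_accuracy(["def f(): pass", "x = 1"],
                          ["def f(): pass", "x = 2"])     # 0.5
```

Note why the two can diverge: a translation can be semantically correct (counted by CA) while being written differently from the reference (missed by EM), which is why CA is usually the primary metric and EM the stricter, secondary one.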
Overall, the content demonstrates the potential of using LLMs for automated code translation and proposes the UniTrans framework to effectively leverage the power of LLMs in this domain.
Statistics
The average Computational Accuracy (CA) of GPT-3.5 is improved by 4.02% with UniTrans.
The average Exact Match Accuracy (EM Acc) of GPT-3.5 is improved by 13.28% with UniTrans.
The average CA of LLaMA-13B is improved by 19.20% with UniTrans.
The average EM Acc of LLaMA-13B is improved by 36.42% with UniTrans.
The average CA of LLaMA-7B is improved by 28.58% with UniTrans.
The average EM Acc of LLaMA-7B is improved by 71.22% with UniTrans.
Quotes
"Large Language Models (LLMs), pre-trained on billions of text/code tokens, bypass the need for re-training/fine-tuning but demonstrate the powerful generality of various code-related tasks, such as code generation [7, 11, 23, 25, 28, 52], program repair [12, 48], and code summarization [2, 14]."
"Enlightened by the above findings, we further propose UniTrans, an Unified code Translation framework, applicable to various LLMs, for unleashing their power in this field."
"Extensive experiments are conducted on six settings of translation datasets between Python, Java, and C++. Three recent LLMs of diverse sizes, including GPT-3.5, and LLaMA-13B/7B, are tested with UniTrans, and all achieve substantial improvements in terms of computational accuracy and exact match accuracy among almost all translation settings, showing the universal effectiveness of UniTrans in practice."