Core Concepts
Learning transfers well across several programming languages.
Abstract
This summary covers a study of cross-lingual transfer learning across programming languages, with a focus on the benefits for low-resource languages. The study examines how the choice of source language affects performance on a target language, which characteristics of a language pair affect transfer performance, and which features matter most for each task. It also highlights the practical implications for developers and for software engineering research.
Directory:
- Abstract
  - Large language models (LLMs) enhance developer productivity.
  - Cross-lingual transfer benefits low-resource programming languages.
- Introduction
  - LLMs are underutilized in low-resource languages.
  - Need for AI tools to support developers in low-resource languages.
- Experimental Setup
  - Tasks include error detection, solution domain classification, clone detection, and code repair.
  - Model based on CodeT5 with 220M parameters.
- Results and Discussion
  - Learning transfers well for all tasks.
  - Target language dependency and source language transferability.
  - Most transferable source languages are identified.
- Performance Prediction
  - Ranker model outperforms baselines in predicting source language performance.
- Feature Analysis
  - Importance of language pair features varies across tasks.
  - Different tasks focus on different features for successful transfer.
- Threats to Validity
  - External and internal validity considerations.
- Conclusion
  - Extensive study on LLM transfer learning in programming languages.
- Data Availability
  - CodeT5 model and datasets used are publicly available.
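The Performance Prediction entry above says a ranker predicts which source languages will perform best for a given target. As a minimal sketch of how such a predicted ranking could be scored against measured results, the Python below computes NDCG@k, a common ranking metric. Whether the study itself uses NDCG is not stated in this summary, and the function names and all scores are hypothetical placeholders, not results from the study.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: later ranks contribute less."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(predicted_order, true_scores, k):
    """Score a predicted ranking of source languages against measured scores.

    predicted_order: source languages, best first, as output by some ranker.
    true_scores: mapping of source language -> measured score on the target.
    """
    gains = [true_scores[lang] for lang in predicted_order[:k]]
    ideal = sorted(true_scores.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up placeholder scores for one target language (NOT results from the study).
true_scores = {"Kotlin": 0.9, "JavaScript": 0.8, "Python": 0.6, "C++": 0.2}

# A hypothetical ranker output that swaps the top two source languages.
predicted_order = ["JavaScript", "Kotlin", "Python", "C++"]

# The ideal ordering, obtained by sorting on the measured scores.
perfect_order = sorted(true_scores, key=true_scores.get, reverse=True)

print(ndcg_at_k(perfect_order, true_scores, k=3))
print(ndcg_at_k(predicted_order, true_scores, k=3))
```

A ranking that matches the measured ordering scores 1.0, and any misordering in the top k lowers the score, so two rankers can be compared on the same target language with a single number.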
Stats
Large language models (LLMs) leverage the naturalness of software [1].
COBOL may have over 775 billion lines of code overall [4].
The CodeNet dataset consists of about 14M code samples in 55 programming languages [19].
CodeXGLUE is a benchmark dataset for code understanding and generation [3].
JavaScript and Java are among the most transferable source languages.
Kotlin and JavaScript are the best source languages for transfer learning.
Dart and TypeScript are among the best target languages for transfer learning, along with Java and Go.
Quotes
"Learning transfers well for all tasks. More specifically, cross-lingual learning transfers better than zero-shot."
"Transfer learning depends on the target language. Java, Go, Dart, and TypeScript are among the best target languages."
"Kotlin and JavaScript are the best source languages, C++ is the worst."
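Statements such as "best source language" and "best target language" imply some aggregation over many source/target pairs. The sketch below shows one plausible way to do that: average a table of per-pair scores by source or by target, then sort. The aggregation rule, the function names, and every number are assumptions made for illustration. The values are placeholders, not results from the study.

```python
from collections import defaultdict
from statistics import mean

# (source, target) -> score on the target after training on the source.
# Made-up placeholder values (NOT results from the study).
transfer = {
    ("Kotlin", "Dart"): 0.80,
    ("Kotlin", "TypeScript"): 0.70,
    ("JavaScript", "Dart"): 0.75,
    ("JavaScript", "TypeScript"): 0.85,
    ("C++", "Dart"): 0.40,
    ("C++", "TypeScript"): 0.50,
}


def mean_score_by(transfer, role):
    """Average the pairwise scores per source or per target language."""
    if role not in ("source", "target"):
        raise ValueError("role must be 'source' or 'target'")
    index = 0 if role == "source" else 1
    grouped = defaultdict(list)
    for pair, score in transfer.items():
        grouped[pair[index]].append(score)
    return {lang: mean(scores) for lang, scores in grouped.items()}


def rank_languages(transfer, role):
    """Return languages sorted from highest to lowest average score."""
    averages = mean_score_by(transfer, role)
    return sorted(averages, key=averages.get, reverse=True)


print(rank_languages(transfer, "source"))
print(rank_languages(transfer, "target"))
```

Averaging is only one choice. Ranking by median score, or by how often a language is the top source for a target, can give a different ordering, so the aggregation rule matters when reading claims like the ones quoted above.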