The paper introduces SRank, a novel reranking strategy for selecting the best code solution from those generated by large language models for code (CodeLLMs). The key idea is to model the functional overlap between clusters of code solutions, rather than treating each cluster in isolation as previous methods have done.
The authors first prompt the CodeLLM to generate a set of code solutions and test cases. They then cluster the solutions by their execution outputs on the test inputs, so that all solutions within a cluster behave identically on those inputs. Next, they compute an interaction matrix that quantifies the functional overlap between each pair of clusters. From this matrix they identify the cluster with the highest cumulative overlap, which is the one most likely to contain a correct solution.
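The clustering and overlap steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: candidate solutions are plain one-argument Python callables, outputs are compared through their `repr`, a raised exception counts as its own output, and each cluster's score is its overlap with every cluster (itself included) weighted by cluster size. The paper's actual scoring rule may combine overlap with other cluster features. All names (`run_safely`, `cluster_by_outputs`, `interaction_matrix`, `rank_clusters`) are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple

Solution = Callable[[Any], Any]   # hypothetical: a candidate is a 1-argument callable
Signature = Tuple[str, ...]       # outputs of one candidate on all test inputs


def run_safely(fn: Solution, x: Any) -> str:
    """Run one candidate on one test input; a raised exception becomes its own output."""
    try:
        return repr(fn(x))
    except Exception as exc:
        return f"<error:{type(exc).__name__}>"


def cluster_by_outputs(solutions: Sequence[Solution],
                       test_inputs: Sequence[Any]) -> List[Tuple[Signature, List[int]]]:
    """Group candidates whose outputs are identical on every test input."""
    groups: Dict[Signature, List[int]] = defaultdict(list)
    for idx, fn in enumerate(solutions):
        signature = tuple(run_safely(fn, x) for x in test_inputs)
        groups[signature].append(idx)
    return list(groups.items())  # clusters in order of first appearance


def interaction_matrix(signatures: Sequence[Signature]) -> List[List[float]]:
    """I[i][j] = fraction of test inputs on which clusters i and j give the same output."""
    n_inputs = len(signatures[0])
    return [[sum(a == b for a, b in zip(si, sj)) / n_inputs for sj in signatures]
            for si in signatures]


def rank_clusters(solutions: Sequence[Solution],
                  test_inputs: Sequence[Any]) -> List[Tuple[float, List[int]]]:
    """Rank clusters by size-weighted cumulative overlap (an assumed scoring rule)."""
    clusters = cluster_by_outputs(solutions, test_inputs)
    signatures = [sig for sig, _ in clusters]
    sizes = [len(members) for _, members in clusters]
    inter = interaction_matrix(signatures)
    # score_i = sum_j I[i][j] * |cluster j|, including j == i (the cluster's own size)
    scores = [sum(inter[i][j] * sizes[j] for j in range(len(clusters)))
              for i in range(len(clusters))]
    order = sorted(range(len(clusters)), key=lambda i: scores[i], reverse=True)
    return [(scores[i], clusters[i][1]) for i in order]


# Toy task: implement absolute value. The inputs stand in for model-generated tests.
candidates = [
    abs, lambda x: x if x >= 0 else -x,                      # correct (2 samples)
    lambda x: x * x, lambda x: x ** 2, lambda x: pow(x, 2),  # wrong, largest cluster
    lambda x: x, lambda x: -x, lambda x: max(x, 0),          # wrong, partially agree
]
ranked = rank_clusters(candidates, test_inputs=[-2, 3, -5])
best_score, best_members = ranked[0]
print(best_members)  # [0, 1]
```

In this toy example a plain majority vote would pick the three-member `x * x` cluster, whereas the overlap score ranks the `abs` cluster first (10/3 versus 3.0), because each of the small incorrect clusters partially agrees with it while the `x * x` cluster agrees with nobody.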
The authors evaluate SRank with several state-of-the-art CodeLLMs, including Codex, WizardCoder, StarCoder, and CodeGen, on the HumanEval and MBPP benchmarks. SRank consistently outperforms existing reranking methods such as CodeT and Coder-Reviewer, improving pass@1 by up to 8.81% on HumanEval.
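To make the pass@1 metric concrete: for a reranker, pass@1 is commonly the fraction of problems whose single top-ranked solution passes the hidden tests, while an un-reranked baseline is usually scored with the unbiased pass@k estimator introduced with Codex (Chen et al., 2021), 1 - C(n-c, k)/C(n, k), which reduces to c/n for k = 1. This summary does not spell out the paper's exact evaluation protocol, so the sketch below only illustrates that common convention; the function names and all numbers are hypothetical.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def reranked_pass_at_1(top_pick_correct: Sequence[bool]) -> float:
    """pass@1 of a reranker: share of problems whose top-ranked solution passes."""
    return sum(top_pick_correct) / len(top_pick_correct)


# Hypothetical numbers for illustration only (not results from the paper).
baseline = [pass_at_k(n=10, c=c, k=1) for c in (2, 5, 0, 10)]  # per-problem estimates
print(sum(baseline) / len(baseline))                  # mean baseline pass@1, about 0.425
print(reranked_pass_at_1([True, True, False, True]))  # 0.75
```

A reranker improves pass@1 whenever its top pick is correct more often than a randomly drawn sample would be, which is what the c/n baseline measures.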
The authors also conduct extensive analyses showing that the approach remains robust even with a limited number of sampled solutions and test cases. These analyses validate the key assumption that incorrect solutions tend to have low functional agreement, which supports modeling the interactions between clusters.
Key Insights Distilled From
by Hung Quoc To... at arxiv.org 04-10-2024
https://arxiv.org/pdf/2311.03366.pdf