toplogo
登录
洞察 - Software Testing - # Automated Test Generation with LLMs

COVERUP: Coverage-Guided LLM-Based Test Generation


核心概念
COVERUP is a novel system that significantly improves Python regression test coverage by combining coverage analysis and large-language models (LLMs).
摘要

The content introduces COVERUP, a system for generating high-coverage Python regression tests using coverage analysis and LLMs. It iteratively refines prompts to focus on uncovered code segments, leading to substantial improvements in test suite coverage. The paper compares COVERUP to CODAMOSA, showing superior results.

I. Introduction:

  • Test generation tools aim to increase program coverage.
  • Pynguin uses genetic algorithms but can get stuck.
  • CODAMOSA combines Pynguin with an LLM for stalled searches.

II. Related Work:

  • Various methods exist for automated test generation.
  • Large language models have been applied in software testing.

III. Technique:

  • COVERUP measures code coverage and segments code for prompting.
  • It interacts with the LLM through chat prompts.
  • Tests generated are executed and checked for coverage improvement.

IV. Evaluation:

  • COVERUP outperforms CODAMOSA in overall and per-module coverage.
  • Results show the effectiveness of iterative refinement in prompt generation.

V. Threats to Validity:

  • Benchmark selection may influence results.
  • Execution environment discrepancies could impact outcomes.

VI. Discussion and Future Work:

  • Future work includes evaluating assertions in generated tests.
  • Addressing cases where required modules are missing during test execution.

VII. Conclusion:

COVERUP is a promising system for improving test suite coverage through iterative refinement of prompts based on code coverage information, outperforming previous state-of-the-art approaches like CODAMOSA.

edit_icon

自定义摘要

edit_icon

使用 AI 改写

edit_icon

生成参考文献

translate_icon

翻译原文

visual_icon

生成思维导图

visit_icon

访问来源

统计
COVERUP achieves median line coverage of 81%, branch coverage of 53%, and line+branch coverage of 78% compared to CODAMOSA's 62%, 35%, and 55% respectively.
引用
"COVERUP yields higher overall line, branch, and combined line+branch coverages than both CODAMOSA (codex) and CODAMOSA (gpt4)." "Continuing the chat contributes to nearly half of successes, demonstrating its effectiveness." "COVERUP still outperforms CODAMOSA using a state-of-the-art LLM."

从中提取的关键见解

by Juan Altmaye... arxiv.org 03-26-2024

https://arxiv.org/pdf/2403.16218.pdf
CoverUp

更深入的查询

How does COVERUP handle flaky tests?

COVERUP addresses flaky tests by allowing users to specify custom arguments for pytest, enabling the repetition of each test a certain number of times using the pytest-repeat plugin. By running potentially flaky tests multiple times, COVERUP increases the likelihood of identifying and resolving inconsistencies in test outcomes. This approach helps mitigate the unreliability often associated with flaky tests.

What are the implications of using different large language models on the effectiveness of COVERUP?

The choice of large language model (LLM) can significantly impact the effectiveness of COVERUP in generating high-coverage regression tests. Different LLMs may have varying capabilities in understanding prompts, generating appropriate test cases, and adapting to coverage analysis feedback provided by COVERUP. Opting for more advanced or specialized LLMs could enhance the quality and efficiency of test generation processes within COVERUP.

How can COVERUP be adapted for use with other types of software beyond Python?

To adapt COVERUP for use with software beyond Python, several modifications and enhancements may be necessary: Language Support: Extend support for additional programming languages by adjusting prompt structures, code segmentation techniques, and coverage analysis mechanisms tailored to specific language syntax. Tool Integration: Integrate with testing frameworks commonly used in other languages to ensure compatibility and seamless execution. Model Flexibility: Allow flexibility in choosing different LLMs suitable for diverse programming paradigms while maintaining effective communication between CoverUp's system components. Domain-Specific Adaptations: Customize prompts based on domain-specific requirements or coding conventions prevalent in target software environments outside Python. By incorporating these adaptations, CoverUp can broaden its applicability across a wider range of software development contexts beyond Python projects.
0
star