Integrating Large Language Models (LLMs) with code knowledge graphs significantly enhances automated fuzz driver generation, leading to improved code coverage and more effective vulnerability detection in software testing.
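A minimal sketch of the idea, under illustrative assumptions: a toy code knowledge graph (a plain dict here, not the paper's representation) supplies the signatures related to a target API, and that context is assembled into a prompt asking an LLM for a libFuzzer-style driver. The target API names and the prompt wording are hypothetical.

```python
# Hypothetical sketch: a code knowledge graph supplying context for
# LLM-based fuzz driver generation. Graph layout, target API, and prompt
# wording are illustrative assumptions, not the paper's actual interfaces.

CODE_KG = {
    "png_read_info": {
        "signature": "void png_read_info(png_structp png_ptr, png_infop info_ptr)",
        "calls": ["png_create_read_struct", "png_create_info_struct"],
    },
    "png_create_read_struct": {
        "signature": "png_structp png_create_read_struct(...)",
        "calls": [],
    },
    "png_create_info_struct": {
        "signature": "png_infop png_create_info_struct(png_structp png_ptr)",
        "calls": [],
    },
}

def collect_context(api: str, kg: dict) -> list[str]:
    """Walk the graph from the target API and gather related signatures."""
    seen, stack, context = set(), [api], []
    while stack:
        node = stack.pop()
        if node in seen or node not in kg:
            continue
        seen.add(node)
        context.append(kg[node]["signature"])
        stack.extend(kg[node]["calls"])
    return context

def build_fuzz_driver_prompt(api: str, kg: dict) -> str:
    """Assemble a prompt asking the LLM for a libFuzzer-style driver."""
    ctx = "\n".join(collect_context(api, kg))
    return (
        f"Write an LLVMFuzzerTestOneInput driver that exercises {api}.\n"
        f"Relevant declarations:\n{ctx}\n"
    )

if __name__ == "__main__":
    print(build_fuzz_driver_prompt("png_read_info", CODE_KG))
```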
Large Language Models (LLMs) are prone to embedding social biases in generated code, motivating Solar, a dedicated framework for evaluating and mitigating these biases to ensure fairness.
Increasing the auditability of AI-based systems, particularly in Learning Analytics (LA), requires a framework that encompasses verifiable claims, the evidence supporting them, and accessibility for auditors.
VALTEST leverages token probabilities from Large Language Models (LLMs) to automatically validate the correctness of generated test cases, even when the source code is unavailable, significantly improving the reliability of LLM-based software testing.
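As a rough illustration of the underlying signal, the sketch below filters generated test cases by the mean log-probability of their tokens. The threshold, the single mean-logprob feature, and the data layout are assumptions chosen for brevity; VALTEST's actual features and validation model are described in the paper.

```python
# Illustrative sketch: using token probabilities as a confidence signal for
# keeping or discarding LLM-generated test cases. Threshold and feature
# choice are assumptions for demonstration only.

def mean_logprob(token_logprobs: list[float]) -> float:
    """Average log-probability of the tokens making up one test case."""
    return sum(token_logprobs) / len(token_logprobs)

def keep_test_case(token_logprobs: list[float], threshold: float = -1.0) -> bool:
    """Retain a test case only if the model was confident while writing it."""
    return mean_logprob(token_logprobs) >= threshold

if __name__ == "__main__":
    confident = [-0.1, -0.3, -0.2]   # high-probability tokens
    uncertain = [-2.5, -1.8, -3.1]   # low-probability tokens
    print(keep_test_case(confident))  # True  -> likely valid test
    print(keep_test_case(uncertain))  # False -> flag for review or removal
```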
This paper introduces a novel model-based testing framework for Deep Reinforcement Learning (DRL) policies that prioritizes testing in states where decisions have the most significant impact on safety, thereby enabling efficient and rigorous safety verification.
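A hedged sketch of the prioritization idea: score candidate states by an estimated safety impact and spend the test budget on the riskiest ones first. The impact score below is a random stand-in (the paper derives it from a model of the system), and the state format and budget are illustrative assumptions.

```python
import heapq
import random

# Minimal sketch of impact-prioritized testing for a DRL policy. The impact
# score is a placeholder for a model-based estimate of how safety-critical a
# state is; it only illustrates "test the riskiest states first".

def impact_score(state) -> float:
    """Stand-in for a model-based estimate of a state's safety criticality."""
    return random.random()

def prioritized_states(candidate_states, budget: int):
    """Return the `budget` states with the highest estimated safety impact."""
    scored = [(-impact_score(s), s) for s in candidate_states]
    heapq.heapify(scored)
    return [heapq.heappop(scored)[1] for _ in range(min(budget, len(scored)))]

if __name__ == "__main__":
    states = [f"state_{i}" for i in range(100)]
    for s in prioritized_states(states, budget=5):
        print("run policy from", s)   # execute the policy and check safety here
```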
TestART, a novel approach combining large language models (LLMs) with template-based repair, significantly improves the quality and effectiveness of automated unit test generation for Java code.
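The sketch below shows one way such a generate-then-repair loop could look; the llm_generate_test() and compile_errors() helpers and the two fix templates are illustrative stubs, not TestART's actual templates or interfaces.

```python
# Hedged sketch of an LLM-generate / template-repair loop for unit tests.
# All helpers and templates here are illustrative stand-ins.

TEMPLATES = {
    # error substring -> textual fix applied to the generated test
    "cannot find symbol": lambda src: "import org.junit.jupiter.api.*;\n" + src,
    "unreported exception": lambda src: src.replace(
        "void test()", "void test() throws Exception"),
}

def llm_generate_test(focal_method: str) -> str:
    """Assumed LLM call; stubbed with a deliberately incomplete test."""
    return "class FooTest { void test() { Foo.bar(); } }"

def compile_errors(test_src: str) -> list[str]:
    """Assumed javac wrapper; stubbed to complain until an import appears."""
    return [] if "import" in test_src else ["cannot find symbol: Assertions"]

def repair_with_templates(test_src: str, errors: list[str]) -> str:
    """Apply the first matching fix template for each reported error."""
    for err in errors:
        for pattern, fix in TEMPLATES.items():
            if pattern in err:
                test_src = fix(test_src)
    return test_src

def generate_unit_test(focal_method: str, max_rounds: int = 3) -> str:
    test = llm_generate_test(focal_method)
    for _ in range(max_rounds):
        errors = compile_errors(test)
        if not errors:
            return test                              # compiles -> keep it
        test = repair_with_templates(test, errors)   # cheap template repair
    return test

if __name__ == "__main__":
    print(generate_unit_test("public static int bar()"))
```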
SYNTER is a novel approach that leverages the power of Large Language Models (LLMs) in combination with static analysis and neural reranking techniques to automatically repair obsolete test cases caused by syntactic breaking changes in evolving software.
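To make the reranking step concrete, the toy example below ranks candidate production-code snippets by relevance to the broken test and keeps only the best ones as repair context. The token-overlap scorer is a cheap stand-in for a neural reranker, and the snippet format is an assumption made for this sketch.

```python
# Illustrative sketch of reranking repair context for an obsolete test.
# The similarity scorer and snippet format are assumptions for demonstration.

def lexical_overlap(a: str, b: str) -> float:
    """Cheap stand-in for a neural reranker: token-overlap similarity."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(1, len(ta | tb))

def rerank_context(broken_test: str, candidate_snippets: list[str], k: int = 2):
    """Keep the k production-code snippets most relevant to the broken test."""
    ranked = sorted(candidate_snippets,
                    key=lambda s: lexical_overlap(broken_test, s),
                    reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    broken = "assertEquals(5, parser.parse(input).size())"
    snippets = [
        "List<Node> parse(Reader input)   // new signature",
        "int size()                       // unchanged",
        "void close()                     // unrelated",
    ]
    for s in rerank_context(broken, snippets):
        print("include in repair prompt:", s)
```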
Precise and complete code context is crucial for Large Language Models (LLMs) to mitigate false positives reported by Static Application Security Testing (SAST) tools; the LLM4FPM framework, which incorporates eCPG-Slicer and the FARF algorithm, substantially improves both the accuracy and the efficiency of false-positive identification.
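A toy illustration of the "precise slice, not the whole file" idea appears below: collect only the lines relevant to the warned variable and put that slice, plus the warning, into the LLM prompt. This line-based slicer and the prompt wording are simplifications made for the sketch; eCPG-Slicer operates on an extended code property graph.

```python
# Toy sketch: slice the code around a SAST warning before asking an LLM to
# judge it. The slicer and prompt here are illustrative simplifications.

def toy_backward_slice(source_lines: list[str], warned_line: int, var: str):
    """Collect the warned line plus earlier lines that mention the variable."""
    slice_lines = []
    for i, line in enumerate(source_lines[:warned_line], start=1):
        if var in line or i == warned_line:
            slice_lines.append(f"{i}: {line}")
    return slice_lines

def false_positive_prompt(slice_lines: list[str], warning: str) -> str:
    return (
        f"SAST warning: {warning}\n"
        "Relevant slice:\n" + "\n".join(slice_lines) +
        "\nIs this warning a false positive? Answer yes or no with a reason."
    )

if __name__ == "__main__":
    code = [
        "buf = read_input()",
        "n = len(buf)",
        "unrelated = compute()",
        "copy(dst, buf, n)",
    ]
    prompt = false_positive_prompt(
        toy_backward_slice(code, warned_line=4, var="buf"),
        "possible buffer overflow in copy()",
    )
    print(prompt)
```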
This paper introduces MdEval, a new benchmark for evaluating the code debugging capabilities of large language models across 18 programming languages, addressing the limitations of existing benchmarks that primarily focus on Python.
This paper introduces a novel fuzzing technique using metamorphic testing to effectively detect logic bugs in zero-knowledge circuit processing pipelines, as demonstrated by the open-source tool Circuzz.
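A hedged sketch of the metamorphic relation being exploited: apply a semantics-preserving rewrite to a computation and check that both variants agree on every sampled input, so any divergence points to a logic bug in the processing pipeline. The toy polynomial and the algebraic rewrite below stand in for Circuzz's transformations over zero-knowledge circuit representations.

```python
import random

# Minimal metamorphic-testing sketch. The "circuit" is a toy polynomial and
# the rewrite is a simple algebraic identity; real rewrites operate on
# zero-knowledge circuit IRs and run through the full compile/prove pipeline.

def circuit(x: int, y: int) -> int:
    """Toy 'circuit': a polynomial over the inputs."""
    return x * x + 3 * y

def rewritten_circuit(x: int, y: int) -> int:
    """Semantics-preserving rewrite: algebraically equal to circuit()."""
    return (x * (x + 3)) - 3 * x + 3 * y   # x*(x+3) - 3x = x*x

def metamorphic_check(trials: int = 1000) -> None:
    """Metamorphic relation: both variants must agree on every sampled input."""
    for _ in range(trials):
        x, y = random.randrange(1 << 16), random.randrange(1 << 16)
        assert circuit(x, y) == rewritten_circuit(x, y), (x, y)

if __name__ == "__main__":
    metamorphic_check()
    print("no logic-bug divergence found on sampled inputs")
```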