The paper proposes AID, an automated test case generation method designed for detecting tricky bugs in plausible programs, i.e., programs that pass all existing tests. AID combines LLMs with differential testing to generate both test inputs and test oracles effectively.
The key components of AID are:
PUT-guided program generation: AID uses the program under test (PUT) and the specification to guide the LLM in generating program variants, increasing the likelihood that the generated variants are correct.
Generator-based input generation: AID uses the LLM to generate a test input generator based on the input constraints, rather than having the LLM produce test inputs directly. This approach mitigates the LLM's limited reasoning and computational capabilities.
Diversity-first differential testing: AID prioritizes the diversity of test outputs when determining test oracles, rather than the commonly used majority voting principle. This approach is more effective in identifying defects that are shared between the PUT and the generated program variants.
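The diversity-first idea can be sketched as follows. This is an illustrative toy, not the paper's implementation: `select_test_inputs`, `diversity_score`, and the toy programs are hypothetical names, and the generated inputs and variants are assumed to come from the LLM-based steps described above. The sketch ranks candidate inputs by how many distinct outputs the PUT and its variants produce, instead of simply trusting a majority vote:

```python
from collections import Counter

def majority_vote_oracle(outputs):
    """Conventional oracle: the most common output among PUT and variants."""
    return Counter(outputs).most_common(1)[0][0]

def diversity_score(outputs):
    """Number of distinct outputs; more diversity suggests a more revealing input."""
    return len(set(outputs))

def select_test_inputs(candidate_inputs, programs, top_k=3):
    """Diversity-first selection: prefer inputs on which the programs disagree most."""
    return sorted(
        candidate_inputs,
        key=lambda x: diversity_score([p(x) for p in programs]),
        reverse=True,
    )[:top_k]

# Toy example: a plausible-but-buggy PUT and two generated variants.
put = lambda x: x * x if x >= 0 else -x * x   # subtle sign bug for x < 0
variant1 = lambda x: x * x
variant2 = lambda x: x ** 2

inputs = [0, 1, 2, -3]
ranked = select_test_inputs(inputs, [put, variant1, variant2])
print(ranked[0])  # the input that exposes disagreement is ranked first
```

The intuition behind preferring diversity over majority voting: when the PUT and several variants share the same defect, the buggy output can win a majority vote, so a disagreement signal is a safer trigger for flagging a suspicious input.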
The evaluation results show that AID outperforms the state-of-the-art methods by up to 1.80x in recall, 2.65x in precision, and 1.66x in F1 score on two large-scale datasets containing human-written and AI-generated plausible programs.