
Automated Generation of Test Scenarios from Natural Language Requirements using Retrieval-Augmented Large Language Models: An Industrial Evaluation


Core Concepts
This paper presents RAGTAG, an automated approach for generating test scenarios from natural language requirements using Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs). The approach combines retrieved domain knowledge with the generation capabilities of LLMs to produce accurate and relevant test scenarios.
Abstract

The paper presents an industrial evaluation conducted in collaboration with Austrian Post Group IT. It evaluates the RAGTAG approach on two industrial projects (ProjA and ProjB) from Austrian Post whose requirements are written bilingually in German and English.

The key highlights and insights from the study are:

  1. Prompt generation: The approach uses a prompt template that includes the natural language requirements, an optional example test scenario, and a brief description of the test scenario to be generated.

  2. Context retrieval: The approach leverages the Retrieval-Augmented Generation (RAG) pipeline to retrieve relevant context passages from a domain documentation corpus to augment the LLM's generation process.

  3. LLM-based test scenario generation: The approach uses either GPT-3.5 or GPT-4 to generate the test scenarios based on the prompt and the retrieved context.

  4. Evaluation: The study compares eight different configurations of the RAGTAG approach and finds that the GPT-3.5 LLM with few-shot prompting and the top retrieved context passage performs the best.

  5. Expert feedback: The study conducts an interview survey with four experts from Austrian Post to assess the usefulness of the generated test scenarios. The experts find the test scenarios to be largely relevant, comprehensive, coherent, and feasible, with some issues in the correctness of individual steps.

  6. Challenges and future directions: The study highlights challenges related to maintaining domain-specific terminology, leveraging broader system information, and incorporating human expertise. It also discusses potential future directions, such as using LLMs for mapping related test scenarios and integrating RAGTAG into real-world projects.

Overall, the study demonstrates the potential of the RAGTAG approach in automating test scenario generation from natural language requirements, while also identifying areas for improvement to enhance the accuracy and usefulness of the generated test scenarios.
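
For illustration, the pipeline described in steps 1-3 can be sketched roughly as follows. This is a minimal sketch under explicit assumptions: the prompt wording, embedding model, similarity measure, and OpenAI client usage are illustrative choices, not the paper's actual implementation or prompt template.

```python
# Minimal sketch of a RAGTAG-style pipeline (illustrative only; model names,
# prompt wording, and retrieval details are assumptions, not the paper's setup).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def embed(texts: list[str]) -> list[list[float]]:
    """Embed passages with an OpenAI embedding model (model choice is an assumption)."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)


def retrieve_top_passage(query: str, corpus: list[str]) -> str:
    """Return the domain-documentation passage most similar to the query."""
    vectors = embed(corpus + [query])
    doc_vecs, query_vec = vectors[:-1], vectors[-1]
    best = max(range(len(corpus)), key=lambda i: cosine(doc_vecs[i], query_vec))
    return corpus[best]


def generate_test_scenario(requirement: str, scenario_description: str,
                           corpus: list[str], example: str | None = None) -> str:
    """Build a prompt from the requirement, an optional example scenario,
    the scenario description, and the top retrieved context passage."""
    context = retrieve_top_passage(requirement, corpus)
    prompt = (
        f"Requirement:\n{requirement}\n\n"
        + (f"Example test scenario:\n{example}\n\n" if example else "")
        + f"Relevant domain context:\n{context}\n\n"
        + f"Write a test scenario for: {scenario_description}"
    )
    chat = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the study also evaluates GPT-4
        messages=[{"role": "user", "content": prompt}],
    )
    return chat.choices[0].message.content
```

In practice the documentation corpus would be chunked into passages and indexed once, rather than embedded on every call as in this sketch.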

Stats
"Test scenarios are crucial for systematically testing an application under various conditions, including edge cases, to identify potential issues and guarantee overall performance and reliability." "Manually specifying test scenarios is tedious and requires a deep understanding of software functionality and the underlying domain." "The generated scenarios are easily understandable to experts and feasible for testing in the project environment. The overall correctness is deemed satisfactory; however, gaps in capturing exact action sequences and domain nuances remain, underscoring the need for domain expertise when applying LLMs."
Quotes
"It seems like a useful tool, and much better than writing test scenarios manually. They require minor adjustments but that is much easier than writing it from scratch." "As we noted, some of the scenarios were completely off. Such cases are inevitable but are very easy to write off by just looking at them. Hence, the technology is worth it." "Can we use this for our actual project next week?"

Deeper Inquiries

How can the RAGTAG approach be further enhanced to better capture domain-specific terminology and nuances in the generated test scenarios?

To better capture domain-specific terminology and nuances in the generated test scenarios, the RAGTAG approach can be enhanced in several ways:

  1. Custom glossary integration: Allow users to input a custom glossary of domain-specific terms and their definitions. This ensures that the large language models (LLMs) have access to the specific vocabulary used in the domain, improving the accuracy of the generated test scenarios.

  2. Fine-tuning with domain data: Fine-tune the LLMs on domain-specific data related to the project. By training the models on relevant documents, requirements, and test scenarios from the specific domain, the LLMs can better understand and generate contextually appropriate test scenarios.

  3. Feedback loop mechanism: Implement a feedback loop where experts review and provide feedback on the generated test scenarios. This feedback can be used to iteratively improve the LLMs' understanding of the domain and refine the generated scenarios over time.

  4. Hybrid approach: Combine the strengths of LLMs with rule-based systems or domain-specific algorithms. Integrating rule-based components that understand domain-specific structures and constraints allows the generated test scenarios to be tailored more accurately to the project requirements.

  5. Contextual prompting: Develop more sophisticated prompting techniques that provide additional context to the LLMs, so they can better understand the domain-specific requirements and generate more precise test scenarios.
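
As a rough illustration of the custom-glossary idea, the sketch below prepends domain terms to a prompt before it is sent to the LLM. The glossary entries, prompt wording, and function name are invented for the example and are not part of the RAGTAG approach.

```python
# Illustrative sketch: prepend domain-specific terminology to a prompt so the
# model uses project vocabulary consistently. All content here is hypothetical.
GLOSSARY = {
    "consignment": "A shipment consisting of one or more parcels handled as a unit.",
    "sorting centre": "A facility where parcels are routed to their delivery regions.",
}


def with_glossary(prompt: str, glossary: dict[str, str]) -> str:
    """Prefix a prompt with domain-specific terms and their definitions."""
    terms = "\n".join(f"- {term}: {definition}" for term, definition in glossary.items())
    return f"Domain glossary (use these terms exactly):\n{terms}\n\n{prompt}"


augmented_prompt = with_glossary("Write a test scenario for parcel redirection.", GLOSSARY)
```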

How can the RAGTAG approach be integrated with broader system architecture information to improve the relevance and accuracy of the generated test scenarios?

Integrating the RAGTAG approach with broader system architecture information can significantly enhance the relevance and accuracy of the generated test scenarios. Several strategies can support this integration:

  1. Access to system documentation: Provide the LLMs with detailed system documentation, including architecture diagrams, data flow charts, and component interactions. This information helps the LLMs understand the underlying system architecture and generate test scenarios that align with the system's structure.

  2. Semantic understanding: Enhance the LLMs' semantic understanding by training them on system architecture-related texts and documents. Exposure to information about system components, interfaces, and dependencies lets them generate test scenarios that accurately reflect the architecture.

  3. Contextual embeddings: Use contextual embeddings to represent system architecture concepts in the prompts given to the LLMs. Embedding system-specific terms and relationships in the prompts allows the LLMs to generate test scenarios that consider the broader architectural context.

  4. Collaboration with system architects: Involve system architects and domain experts in the test scenario generation process, so the generated scenarios can be validated for accuracy and relevance to the system design.

  5. Iterative refinement: Have system architects review the generated test scenarios and refine them based on their feedback, ensuring the scenarios align closely with the system architecture and requirements.

By incorporating system architecture information into the RAGTAG approach, the generated test scenarios can better reflect the intricacies of the system design and contribute to more effective testing processes.
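
A minimal sketch of the "access to system documentation" and "contextual embeddings" ideas is shown below, using a simple TF-IDF retriever over architecture passages. The document contents, names, and retriever choice are illustrative assumptions rather than part of the RAGTAG approach, which could equally use a neural embedding model.

```python
# Sketch: attach the architecture passage most relevant to a requirement to the
# generation prompt. Passages and names below are hypothetical examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

architecture_docs = [
    "The label service calls the routing component over a REST interface ...",
    "Parcel events are published to a message queue consumed by the billing service ...",
]


def architecture_context(requirement: str, docs: list[str]) -> str:
    """Return the architecture passage most similar to the requirement."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(docs + [requirement])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return docs[scores.argmax()]


context = architecture_context("Redirect a parcel to a pickup station.", architecture_docs)
```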