Core Concepts
ChatGPT, a prominent generative large language model, shows promise in generating UML sequence diagrams from natural language requirements, but the generated diagrams suffer from completeness and correctness problems, particularly when domain-specific knowledge is needed to interpret the requirements.
Abstract
The study explores the capability of ChatGPT, a large language model, to generate UML sequence diagrams from natural language requirements. The researchers collected 28 industrial requirements documents covering diverse application domains and requirements formats, including "shall" requirements, user stories, and use case specifications. They also introduced variants of the requirements to simulate realistic scenarios, such as the presence of requirements smells or the evolution of requirements.
The researchers evaluated the generated diagrams according to five criteria: completeness, correctness, adherence to the standard, degree of understandability, and terminological alignment. The results indicate that ChatGPT generally performs well in terms of understandability, standard compliance, and terminological alignment, but exhibits significant issues related to completeness and correctness. These issues become more pronounced in the presence of low-quality requirements that include ambiguities or inconsistencies, and when technical/contextual knowledge is needed to correctly interpret the requirements.
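As a rough illustration of what terminological alignment means, the sketch below measures how many words in a diagram's message labels also appear in the source requirement. This is not the study's procedure; the summary does not describe an automated metric. The function names `words` and `term_overlap` and the example labels are assumptions made for illustration only.

```python
import re


def words(text):
    """Return the set of lowercase alphabetic words in a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def term_overlap(requirement, diagram_labels):
    """Fraction of distinct label words that also occur in the requirement.

    A naive proxy only: it does exact word matching, so "indicate" and
    "indicated" count as different terms.
    """
    req_words = words(requirement)
    label_words = set()
    for label in diagram_labels:
        label_words |= words(label)
    if not label_words:
        return 0.0
    return len(label_words & req_words) / len(label_words)


# Requirement text taken from the Quotes section; labels are hypothetical.
requirement = (
    "The current operational status shall be indicated to the driver on the DMI."
)
labels = ["indicate operational status", "acknowledge status"]
score = term_overlap(requirement, labels)
print(f"term overlap: {score:.2f}")
```

Exact word matching is deliberately simplistic here. A low score can flag additional terms introduced by the model, but it can also penalize harmless inflections, so a human check is still needed.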
The study also identified a range of specific issues, including:
- Summarization issues, where relevant information is missing from the generated diagrams
- Incorrect interactions, components, or structure in the diagrams
- Semantic errors and the introduction of additional, irrelevant terms
- Traceability challenges and incoherence that hinder the understandability of the diagrams
- Memory-induced hallucinations and a lack of contextual understanding, both of which can degrade the quality of the generated output
The insights from this study can inform the practical utilization of large language models in the requirements engineering process and inspire the development of RE-specific prompting strategies to address the identified challenges and improve the effectiveness of model generation.
Stats
The Elevator System shall prioritize servicing the requested floor, moving upwards or downwards as needed, and open its doors upon arrival.
When the overload sensor detects excessive weight in the elevator cabin, the Elevator System shall prevent further entry, emit an audible alarm, and display an overload warning.
If track data at least to the location where the relevant movement authority ends are not available on-board, the movement authority shall be rejected.
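To make the generation task concrete, the following sketch shows one way the elevator overload requirement above could be expressed as a sequence diagram in PlantUML text. It is a hand-written illustration, not output from ChatGPT or from the study. The participant names (`OverloadSensor`, `ElevatorSystem`, `Door`, `Alarm`, `Display`) and the `Message` and `to_plantuml` helpers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One interaction in a sequence diagram (hypothetical helper type)."""
    sender: str
    receiver: str
    label: str


def to_plantuml(messages):
    """Render messages, in order, as PlantUML sequence diagram text."""
    lines = ["@startuml"]
    for m in messages:
        lines.append(f"{m.sender} -> {m.receiver}: {m.label}")
    lines.append("@enduml")
    return "\n".join(lines)


# One possible reading of the overload requirement; the decomposition
# into participants is an assumption, not taken from the study.
overload_messages = [
    Message("OverloadSensor", "ElevatorSystem", "excessive weight detected"),
    Message("ElevatorSystem", "Door", "prevent further entry"),
    Message("ElevatorSystem", "Alarm", "emit audible alarm"),
    Message("ElevatorSystem", "Display", "show overload warning"),
]

diagram = to_plantuml(overload_messages)
print(diagram)
```

Even this small example involves interpretation: the requirement names no door, alarm, or display component, so a diagram author, human or model, has to decide which participants to introduce. That is the kind of contextual judgment the study associates with correctness problems.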
Quotes
"If an ETCS equipped train passes a level transition to one or more levels for which it is not equipped, ETCS shall initiate a brake application."
"The current operational status shall be indicated to the driver on the DMI."