The paper explores using in-context learning (ICL) with large language models to evaluate whether mobile app designs satisfy consent requirements from the EU General Data Protection Regulation (GDPR).
The key highlights and insights are:
- The authors manually extracted knowledge about 8 consent requirements from GDPR guidance documents and formalized them as logical propositions.
- They generated 400 mobile app scenarios and associated design practices, with half designed to satisfy and half to dissatisfy the consent requirements.
- They evaluated the ability of GPT-3.5 and GPT-4 language models to verify requirement satisfaction using different prompting strategies, including a requirements-specific template, a generic template, and chain-of-thought prompting.
- The results show that GPT-4 can verify requirement satisfaction with 95.6% accuracy and dissatisfaction with 93.2% accuracy. Chain-of-thought prompting improved GPT-3.5 performance by 9.0%.
- The authors discuss trade-offs among the prompting strategies, model performance variations across different requirements, and the impact of scenario and requirement polarity on accuracy.
- They also identify issues in the generated design practices, including logical inconsistencies, definition biases, and conclusory statements, which can impact the reliability of the verification process.
Overall, the study demonstrates the potential of large language models to assist in requirements engineering tasks, while highlighting the need for careful prompt engineering and dataset curation to ensure robust and reliable performance.
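To make the prompting setup concrete, here is a minimal sketch of how a requirements-specific, chain-of-thought verification prompt could be assembled. The function name, field labels, and the example requirement, scenario, and design practice are all illustrative assumptions, not material from the paper's dataset or its actual templates.

```python
# Hypothetical sketch of a requirements-specific prompt with a
# chain-of-thought instruction; wording and structure are assumed,
# not taken from the paper's templates.

def build_verification_prompt(requirement: str, scenario: str, practice: str) -> str:
    """Assemble a prompt asking an LLM whether a design practice
    satisfies a given GDPR consent requirement, reasoning step by step."""
    return (
        "You are assessing GDPR consent compliance.\n"
        f"Requirement: {requirement}\n"
        f"App scenario: {scenario}\n"
        f"Design practice: {practice}\n"
        "Think step by step, then answer 'Satisfied' or 'Not satisfied'."
    )

# Illustrative inputs (invented for this sketch):
prompt = build_verification_prompt(
    requirement="Consent must be freely given, specific, informed, and unambiguous.",
    scenario="A fitness app collects location data for route tracking.",
    practice="The app pre-checks the consent checkbox during onboarding.",
)
print(prompt)
```

In this framing, the requirements-specific template fixes the requirement text per prompt, while the chain-of-thought instruction ("Think step by step") is the addition the summary credits with the 9.0% GPT-3.5 improvement.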
Source: arxiv.org