Key Idea
Combining machine learning with retrieval-augmented large language models (LLMs) yields cost-effective UI automation tests for industrial mobile apps, as demonstrated in a case study on the WeChat app.
Abstract
The paper presents CAT, a novel approach that integrates machine learning and large language models (LLMs) to generate cost-effective UI automation tests for industrial mobile apps. The key highlights are:
- Task Description Decomposition Phase:
  - Uses Retrieval Augmented Generation (RAG) to retrieve relevant examples from previous app testing datasets. These examples serve as few-shot learning context that helps LLMs understand app usage and generate candidate action steps.
  - LLM-based action generation is guided by the retrieved examples, which helps bridge the LLMs' knowledge gap about industrial app specifics.
- UI Automation Execution Phase:
  - Employs machine learning techniques as the primary method for mapping the target UI elements to the dynamic UI screen.
  - Uses LLMs as a complementary optimizer to address occasional mismatches in UI element mapping, improving the overall robustness.
- Evaluation and Real-World Integration:
  - Extensive experiments on the WeChat dataset (39k tasks) demonstrate the effectiveness of CAT, achieving a 90% completion rate at an average cost of $0.34 per test, outperforming state-of-the-art approaches.
  - The integration of CAT into the real-world WeChat testing platform has led to the automatic detection of 141 bugs, reducing the developers' burden in bug detection and fixing.
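The retrieval step of the task description decomposition phase can be illustrated with a small sketch. This is not CAT's implementation: the summary does not say which retriever or prompt format the paper uses, so the bag-of-words cosine similarity below is a stand-in for a real embedding model, and the names `retrieve_examples`, `build_prompt`, and `HISTORY`, along with the example tasks and steps, are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_examples(task, dataset, k=2):
    """Return the k past records whose task text is most similar to `task`."""
    query = Counter(tokenize(task))
    ranked = sorted(
        dataset,
        key=lambda ex: cosine(query, Counter(tokenize(ex["task"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(task, examples):
    """Assemble a few-shot prompt: retrieved examples first, new task last."""
    lines = ["Decompose the task into UI action steps.", ""]
    for ex in examples:
        lines.append(f"Task: {ex['task']}")
        lines.append("Steps: " + " -> ".join(ex["steps"]))
        lines.append("")
    lines.append(f"Task: {task}")
    lines.append("Steps:")
    return "\n".join(lines)


# Made-up records standing in for a historical app testing dataset.
HISTORY = [
    {"task": "send a text message to a friend",
     "steps": ["open chat list", "open friend chat", "type message", "tap send"]},
    {"task": "post a photo to moments",
     "steps": ["open moments", "tap camera", "choose photo", "tap post"]},
    {"task": "send a red packet to a friend",
     "steps": ["open friend chat", "tap plus", "tap red packet", "confirm"]},
]

if __name__ == "__main__":
    new_task = "send a voice message to a friend"
    shots = retrieve_examples(new_task, HISTORY, k=2)
    print(build_prompt(new_task, shots))
```

The resulting prompt would then be sent to an LLM of choice. The point of the design is that the model sees app-specific action sequences it could not know from pretraining alone.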
The key innovation of CAT lies in its hybrid approach that combines the strengths of machine learning and LLMs to address the practical challenges of cost optimization and knowledge integration for industrial-level app testing, as demonstrated through the WeChat case study.
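The hybrid idea can be sketched as a cheap matcher that runs first, with the LLM consulted only when the matcher is not confident. The sketch below makes several assumptions that are not in the source: `difflib.SequenceMatcher` stands in for the paper's machine learning matcher, the threshold of 0.6 is arbitrary, and `map_element` and `stub_llm` are hypothetical names (the stub replaces a real LLM call).

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a learned matcher score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_element(target, screen, llm_fallback, threshold=0.6):
    """Map a target element description to an element on the current screen.

    Returns (element, source), where source is "matcher", "llm", or "none".
    The fallback is called only when the best matcher score is below the
    threshold, so the expensive path is taken only on likely mismatches.
    """
    if not screen:
        return None, "none"
    best = max(screen, key=lambda el: similarity(target, el))
    if similarity(target, best) >= threshold:
        return best, "matcher"
    choice = llm_fallback(target, screen)
    # Accept the fallback's answer only if it names an element on screen.
    if choice in screen:
        return choice, "llm"
    return None, "none"


def stub_llm(target, screen):
    """Hypothetical placeholder for an LLM call; uses a fixed lookup table."""
    known = {"confirm payment": "Pay now"}
    return known.get(target.lower())


if __name__ == "__main__":
    print(map_element("Send", ["Send", "Cancel"], stub_llm))
    print(map_element("Confirm payment", ["Pay now", "Back"], stub_llm))
```

Under these assumptions, the share of steps resolved by the cheap path drives the per-test cost, which is the trade-off the summary attributes to CAT.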
Statistics
"UI automation tests play a crucial role in ensuring the quality of mobile applications."
"Despite the growing popularity of machine learning techniques to generate these tests, they still face several challenges, such as the mismatch of UI elements."
"CAT employs Retrieval Augmented Generation (RAG) to source examples of industrial app usage as the few-shot learning context, assisting LLMs in generating the specific sequence of actions."
"Our evaluations on the WeChat testing dataset demonstrate the CAT's performance and cost-effectiveness, achieving 90% UI automation with $0.34 cost, outperforming the state-of-the-art."
"During the testing period from December 2023 to June 2024, CAT automatically executes 6k of UI automation tests, detecting 141 bugs."