The GUIDE (Graphical User Interface Data for Execution) dataset is designed to advance the capabilities of Robotic Process Automation (RPA) models by providing a rich and diverse dataset that combines visual, textual, and spatial information. It spans multiple websites, including Apollo, Gmail, Calendar, and Canva, covering a wide range of user interactions and tasks.
The key highlights of the GUIDE dataset are:
Comprehensive data collection: Each data entry includes an image, a task description, the last action taken, a chain of thought (CoT), and the next action to be performed, along with grounding information on where the action needs to be executed.
Diverse website coverage: The dataset encompasses data from multiple websites, representing a realistic scope of web-based applications and services.
Hierarchical task categorization: Tasks are categorized into three levels of complexity (basic, intermediate, and complex) to facilitate targeted model training and evaluation.
Advanced annotation tool: The NEXTAG (Next Action Grounding and Annotation Tool) is an in-house tool developed to streamline the data annotation process, ensuring efficient and accurate capture of user interactions and spatial grounding.
Data augmentation: The dataset undergoes various augmentation techniques, such as simulating different browsers, operating systems, and visual themes, to enhance the model's robustness and adaptability to diverse GUI environments.
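To make the per-entry structure concrete, a single GUIDE-style record can be sketched as below. The field names, bounding-box format, and example values are illustrative assumptions for this sketch, not the dataset's published schema:

```python
# A minimal sketch of one GUIDE-style data entry.
# All field names and values here are hypothetical, chosen to mirror the
# components listed above: image, task, last action, CoT, next action,
# grounding, and complexity level.
entry = {
    "image": "screenshots/canva_0042.png",         # GUI screenshot
    "task": "Create a new presentation in Canva",  # task description
    "previous_action": "CLICK 'Create a design'",  # last action taken
    "chain_of_thought": (
        "The design menu is open; to start a presentation "
        "I should select the 'Presentation' option."
    ),
    "next_action": "CLICK 'Presentation'",         # next action to perform
    "grounding": {"x": 412, "y": 163, "width": 180, "height": 48},
    "complexity": "basic",                         # basic / intermediate / complex
}

def grounding_center(e):
    """Return the pixel coordinates at the center of the grounded region."""
    g = e["grounding"]
    return (g["x"] + g["width"] // 2, g["y"] + g["height"] // 2)

print(grounding_center(entry))
```

A model trained on such records would consume the image, task, previous action, and chain of thought, and emit the next action together with its grounding coordinates.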
The GUIDE dataset aims to enable the development of multi-platform Large Language Models (LLMs) that can seamlessly predict and execute tasks within a GUI context, adding a layer of semantic understanding that surpasses the capabilities of traditional RPA tools. By leveraging this dataset, researchers and developers can advance the field of RPA, improving the efficiency and intelligence of automated systems.
Source: arxiv.org