The paper introduces a new dataset and framework for computer user interface (UI) understanding. It addresses the lack of attention to complete computer UIs in prior work, emphasizing the complexity and variability of desktop interfaces compared to web and mobile applications. The dataset consists of videos capturing user actions on a computer screen, with the goal of automating workflow processes by inferring the state of the computer from screen images. The proposed framework, UI Multi-task Contrastive Learning (UIMTCon), combines synthetic sample generation with contrastive learning to classify the images in these videos accurately. Experimental results show improved performance over baseline methods on fine-grained UI classification.
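The summary names contrastive learning as one component of the framework, but the paper's exact loss and sampling scheme are not described here. The following is only a minimal sketch of a generic supervised contrastive objective over frame embeddings (PyTorch, with an assumed temperature and toy labels), illustrating the general technique rather than the authors' UIMTCon implementation.

```python
# Minimal sketch of a supervised contrastive objective over UI frame embeddings.
# The temperature value and batch construction are illustrative assumptions,
# not taken from the paper.
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Pull together embeddings of frames sharing a UI-state label,
    push apart embeddings of frames with different labels."""
    z = F.normalize(embeddings, dim=1)                  # unit-norm features
    sim = z @ z.T / temperature                         # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # log-softmax over all other samples in the batch (exclude self-similarity)
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # average log-probability of positives, for anchors that have positives
    sum_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    loss = -sum_pos[valid] / pos_counts[valid]
    return loss.mean()


if __name__ == "__main__":
    # Toy usage: random "frame embeddings" with hypothetical UI-state labels.
    feats = torch.randn(8, 128)
    ui_states = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    print(supervised_contrastive_loss(feats, ui_states).item())
```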
Source: https://arxiv.org/pdf/2403.10170.pdf