Describing UI Screenshots in Natural Language
Access rights
openAccess
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
Date
2022-11-09
Language
en
Pages
28
Series
ACM Transactions on Intelligent Systems and Technology, Volume 14, issue 1
Abstract
Being able to describe any user interface (UI) screenshot in natural language can promote understanding of the main purpose of the UI, yet currently it cannot be accomplished with state-of-the-art captioning systems. We introduce XUI, a novel method inspired by the global precedence effect to create informative descriptions of UIs, starting with an overview and then providing fine-grained descriptions about the most salient elements. XUI builds upon computational models for topic classification, visual saliency prediction, and natural language generation (NLG). XUI provides descriptions with up to three different granularity levels that, together, describe what is in the interface and what the user can do with it. We found that XUI descriptions are highly readable, are perceived to accurately describe the UI, and score similarly to human-generated UI descriptions. XUI is available as open-source software.
Description
Funding Information: We acknowledge the computational resources provided by the Aalto Science-IT project. We thank Homayun Afrabandpey, Daniel Buschek, Jussi Jokinen, and Jörg Tiedemann for reviewing an earlier draft of this article. This work has been supported by the Horizon 2020 FET program of the European Union through the ERA-NET Cofund funding (grant CHIST-ERA-20-BCI-001), the European Innovation Council Pathfinder program (SYMBIOTIK project), and the Academy of Finland (grants 291556, 318559, 310947).
Publisher Copyright: © 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Keywords
Captioning, deep learning, natural language processing, visual saliency
Citation
Leiva, L A, Hota, A & Oulasvirta, A 2022, 'Describing UI Screenshots in Natural Language', ACM Transactions on Intelligent Systems and Technology, vol. 14, no. 1, 19. https://doi.org/10.1145/3564702