Describing UI Screenshots in Natural Language

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Leiva, Luis A. [en_US]
dc.contributor.author: Hota, Asutosh [en_US]
dc.contributor.author: Oulasvirta, Antti [en_US]
dc.contributor.department: Department of Communications and Networking [en]
dc.contributor.department: Department of Information and Communications Engineering [en]
dc.contributor.groupauthor: Helsinki Institute for Information Technology (HIIT) [en]
dc.contributor.groupauthor: User Interfaces [en]
dc.contributor.organization: Department of Communications and Networking [en_US]
dc.date.accessioned: 2023-04-05T06:18:13Z
dc.date.available: 2023-04-05T06:18:13Z
dc.date.issued: 2022-11-09 [en_US]
dc.description: Funding Information: We acknowledge the computational resources provided by the Aalto Science-IT project. We thank Homayun Afrabandpey, Daniel Buschek, Jussi Jokinen, and Jörg Tiedemann for reviewing an earlier draft of this article. This work has been supported by the Horizon 2020 FET program of the European Union through the ERA-NET Cofund funding (grant CHIST-ERA-20-BCI-001), the European Innovation Council Pathfinder program (SYMBIOTIK project), and the Academy of Finland (grants 291556, 318559, 310947). Publisher Copyright: © 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
dc.description.abstract: Being able to describe any user interface (UI) screenshot in natural language can promote understanding of the main purpose of the UI, yet currently it cannot be accomplished with state-of-the-art captioning systems. We introduce XUI, a novel method inspired by the global precedence effect to create informative descriptions of UIs, starting with an overview and then providing fine-grained descriptions about the most salient elements. XUI builds upon computational models for topic classification, visual saliency prediction, and natural language generation (NLG). XUI provides descriptions with up to three different granularity levels that, together, describe what is in the interface and what the user can do with it. We found that XUI descriptions are highly readable, are perceived to accurately describe the UI, and score similarly to human-generated UI descriptions. XUI is available as open-source software. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 28
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Leiva, L A, Hota, A & Oulasvirta, A 2022, 'Describing UI Screenshots in Natural Language', ACM Transactions on Intelligent Systems and Technology, vol. 14, no. 1, 19. https://doi.org/10.1145/3564702 [en]
dc.identifier.doi: 10.1145/3564702 [en_US]
dc.identifier.issn: 2157-6904
dc.identifier.issn: 2157-6912
dc.identifier.other: PURE UUID: 092efd3a-e7c7-4029-9aff-850168c9d67d [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/092efd3a-e7c7-4029-9aff-850168c9d67d [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/104922183/XUI.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/120347
dc.identifier.urn: URN:NBN:fi:aalto-202304052665
dc.language.iso: en [en]
dc.publisher: ACM
dc.relation.fundinginfo: We acknowledge the computational resources provided by the Aalto Science-IT project. We thank Homayun Afrabandpey, Daniel Buschek, Jussi Jokinen, and Jörg Tiedemann for reviewing an earlier draft of this article. This work has been supported by the Horizon 2020 FET program of the European Union through the ERA-NET Cofund funding (grant CHIST-ERA-20-BCI-001), the European Innovation Council Pathfinder program (SYMBIOTIK project), and the Academy of Finland (grants 291556, 318559, 310947).
dc.relation.ispartofseries: ACM Transactions on Intelligent Systems and Technology [en]
dc.relation.ispartofseries: Volume 14, issue 1 [en]
dc.rights: openAccess [en]
dc.subject.keyword: Captioning [en_US]
dc.subject.keyword: deep learning [en_US]
dc.subject.keyword: natural language processing [en_US]
dc.subject.keyword: visual saliency [en_US]
dc.title: Describing UI Screenshots in Natural Language [en]
dc.type: A1 Original article in a scientific journal [fi]
dc.type.version: acceptedVersion

Files