ILuvUI: Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations
Access rights
openAccess
CC BY
publishedVersion
A4 Article in a conference publication
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
17
Series
IUI 2025 - Proceedings of the 2025 International Conference on Intelligent User Interfaces, pp. 861-877, International Conference on Intelligent User Interfaces, Proceedings IUI
Abstract
Multimodal Vision-Language Models (VLMs) enable powerful applications from their fused understanding of images and language, but many perform poorly on UI tasks due to the lack of UI training data. In this paper, we adapt a recipe for generating paired text-image training data for VLMs to the UI domain by combining existing pixel-based methods with a Large Language Model (LLM). Unlike prior art, our method requires no human-provided annotations, and it can be applied to any dataset of UI screenshots. We generate a dataset of 353K conversational examples paired with UIs that cover Q&A, UI descriptions, and planning, and use it to fine-tune a conversational VLM for UI tasks. To assess the performance of our model, we benchmark it on UI element detection tasks, evaluate response quality, and showcase its applicability to UI verification.
Description
Publisher Copyright: © 2025 Copyright held by the owner/author(s).
Citation
Jiang, Y, Schoop, E, Swearngin, A & Nichols, J 2025, ILuvUI: Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations. in IUI 2025 - Proceedings of the 2025 International Conference on Intelligent User Interfaces. International Conference on Intelligent User Interfaces, Proceedings IUI, ACM, pp. 861-877, International Conference on Intelligent User Interfaces, Cagliari, Italy, 24/03/2025. https://doi.org/10.1145/3708359.3712129