Interactive Reward Tuning: Interactive Visualization for Preference Elicitation
Access rights
openAccess
acceptedVersion
URL
Journal Title
Journal ISSN
Volume Title
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
Date
2024
Major/Subject
Mcode
Degree programme
Language
en
Pages
8
Series
2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Proceedings of the International Conference on Intelligent Robots and Systems
Abstract
In reinforcement learning, the weights of the reward function must be tuned to align behavior with user preferences. However, current approaches, which rely on pairwise comparisons for preference elicitation, are inefficient because they overlook much of the human ability to explore and judge groups of candidate solutions. This paper presents a novel visualization-based approach that better exploits the user's ability to quickly recognize promising directions for reward tuning. It decomposes the tuning problem according to the visual information-seeking principle: overview first, zoom and filter, then details on demand. Following this principle, we built a visualization system comprising two interactively linked views: 1) an embedding view showing a contextual overview of all sampled behaviors and 2) a sample view displaying selected behaviors along with visualizations of their detailed time-series data. A user can efficiently explore large sets of samples by iterating between these two views. The paper demonstrates that the proposed approach can tune rewards for challenging behaviors. A simulation-based evaluation shows that the system reaches optimal solutions with fewer queries than the baselines.
Description
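To make the setting concrete, the sketch below is a minimal illustration (not the authors' system) of the two ingredients the abstract refers to: a reward function controlled by tunable weights over trajectory features, and a 2-D embedding that gives an "overview first" of many sampled behaviors, from which a user could select a region and inspect individual samples in detail. All feature definitions, weight values, and the synthetic trajectories are illustrative assumptions.

```python
# Minimal sketch, assuming a linearly weighted reward over hand-picked trajectory
# features and a PCA-based embedding view. Names and values are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def behavior_features(trajectory: np.ndarray) -> np.ndarray:
    """Summarize a (T, 3) trajectory with illustrative features:
    mean speed, mean deviation from a target height, total control effort."""
    speed = np.abs(np.diff(trajectory[:, 0])).mean()
    height_err = np.abs(trajectory[:, 1] - 1.0).mean()
    effort = np.square(trajectory[:, 2]).sum()
    return np.array([speed, -height_err, -effort])

def reward(trajectory: np.ndarray, weights: np.ndarray) -> float:
    """Weighted reward; 'reward tuning' means adjusting `weights` to match preferences."""
    return float(weights @ behavior_features(trajectory))

# Sample a set of candidate behaviors (random trajectories stand in for policy rollouts).
trajectories = [rng.normal(size=(50, 3)) for _ in range(200)]
features = np.stack([behavior_features(t) for t in trajectories])

# "Overview first": project all behaviors to 2-D (PCA via SVD) so qualitatively
# similar behaviors cluster together in the embedding view.
centered = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
embedding = centered @ vt[:2].T  # (200, 2) points for the embedding view

# "Zoom and filter, then details on demand": score behaviors in a user-selected
# region under the current weights and surface the top ones in the sample view.
weights = np.array([1.0, 0.5, 0.1])          # current reward weights (assumed)
region = embedding[:, 0] > 0.0                # hypothetical user selection
scores = np.array([reward(t, weights) for t in trajectories])
top = np.argsort(scores[region])[::-1][:5]
print("embedding shape:", embedding.shape, "| top scores in region:", scores[region][top])
```

In this sketch the user's feedback would drive updates to `weights`; the paper's contribution is the interactive, linked-view workflow for exploring the sampled behaviors rather than any particular weight-update rule.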
Keywords
Other note
Citation
Shi, D., Zhu, S., Weinkauf, T. & Oulasvirta, A. 2024, 'Interactive Reward Tuning: Interactive Visualization for Preference Elicitation', in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Proceedings of the International Conference on Intelligent Robots and Systems, IEEE, IEEE/RSJ International Conference on Intelligent Robots and Systems, Abu Dhabi, United Arab Emirates, 14/10/2024. https://doi.org/10.1109/IROS58592.2024.10801540