Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback
| dc.contributor | Aalto-yliopisto | fi |
| dc.contributor | Aalto University | en |
| dc.contributor.author | Kompatscher, Jan | |
| dc.contributor.author | Shi, Danqing | |
| dc.contributor.author | Varni, Giovanna | |
| dc.contributor.author | Weinkauf, Tino | |
| dc.contributor.author | Oulasvirta, Antti | |
| dc.contributor.department | Department of Information and Communications Engineering | en |
| dc.contributor.department | ELLIS Institute | en |
| dc.contributor.groupauthor | User Interfaces | en |
| dc.contributor.groupauthor | Helsinki Institute for Information Technology (HIIT) | en |
| dc.contributor.organization | Department of Information and Communications Engineering | |
| dc.contributor.organization | BEC-INFM | |
| dc.contributor.organization | KTH Royal Institute of Technology | |
| dc.date.accessioned | 2025-12-02T07:40:13Z | |
| dc.date.available | 2025-12-02T07:40:13Z | |
| dc.date.issued | 2025 | |
| dc.description | Publisher Copyright: © 2025 The Author(s). Computer Graphics Forum published by Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd. | openaire: EC/HE/101141916/EU//Artificial User | |
| dc.description.abstract | Reinforcement learning from human feedback (RLHF) has emerged as a key enabling technology for aligning AI behaviour with human preferences. The traditional way to collect data in RLHF is via pairwise comparisons: human raters are asked to indicate which one of two samples they prefer. We present an interactive visualisation that better exploits the human visual ability to compare and explore whole groups of samples. The interface comprises two linked views: 1) an exploration view showing a contextual overview of all sampled behaviours organised in a hierarchical clustering structure; and 2) a comparison view displaying two selected groups of behaviours for user queries. Users can efficiently explore large sets of behaviours by iterating between these two views. Additionally, we devised an active learning approach suggesting groups for comparison. As shown by our evaluation in six simulated robotics tasks, our approach increases the final rewards by 69.34%. It leads to lower error rates and better policies. We open-source the code, which can be easily integrated into the RLHF training loop, supporting research on human–AI alignment. | en |
| dc.description.version | Peer reviewed | en |
| dc.format.mimetype | application/pdf | |
| dc.identifier.citation | Kompatscher, J, Shi, D, Varni, G, Weinkauf, T & Oulasvirta, A 2025, 'Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback', Computer Graphics Forum. https://doi.org/10.1111/cgf.70290 | en |
| dc.identifier.doi | 10.1111/cgf.70290 | |
| dc.identifier.issn | 0167-7055 | |
| dc.identifier.issn | 1467-8659 | |
| dc.identifier.other | PURE UUID: bfe5bd78-aa32-49ef-bfc8-7ebedd3943bc | |
| dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/bfe5bd78-aa32-49ef-bfc8-7ebedd3943bc | |
| dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/201642771/Interactive_Groupwise_Comparison_for_Reinforcement_Learning_from_Human.pdf | |
| dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/140813 | |
| dc.identifier.urn | URN:NBN:fi:aalto-202512028958 | |
| dc.language.iso | en | en |
| dc.publisher | Wiley | |
| dc.relation | info:eu-repo/grantAgreement/EC/HE/101141916/EU//Artificial User | |
| dc.relation.fundinginfo | J.K., S.D., and A.O. received support from the Research Council of Finland (FCAI: 328400, 345604, 341763; Subjective Functions: 357578) and the ERC (AdG project Artificial User: 101141916). T.W. was supported by the Swedish e‐Science Research Centre (SeRC). A.O. acknowledges the research environment provided by ELLIS Institute Finland. | |
| dc.relation.ispartofseries | Computer Graphics Forum | en |
| dc.rights | openAccess | en |
| dc.rights | CC BY-NC | |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc/4.0/ | |
| dc.subject.keyword | human–computer interfaces, visualisation | |
| dc.subject.keyword | interaction | |
| dc.subject.keyword | visual analytics | |
| dc.title | Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback | en |
| dc.type | A1 Original article in a scientific journal | en |
| dc.type.version | publishedVersion |
Files
Original bundle
- Name: Interactive_Groupwise_Comparison_for_Reinforcement_Learning_from_Human.pdf
- Size: 1.42 MB
- Format: Adobe Portable Document Format