Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Çelikok, Mustafa Mert [en_US]
dc.contributor.author: Oliehoek, Frans A. [en_US]
dc.contributor.author: Kaski, Samuel [en_US]
dc.contributor.department: Probabilistic Machine Learning [en_US]
dc.contributor.department: Delft University of Technology [en_US]
dc.contributor.department: Computer Science Professors [en_US]
dc.contributor.department: Department of Computer Science [en]
dc.date.accessioned: 2022-08-24T07:46:09Z
dc.date.available: 2022-08-24T07:46:09Z
dc.date.issued: 2022 [en_US]
dc.description: Funding Information: This work was supported by: the Academy of Finland (Flagship programme: Finnish Center for Artificial Intelligence; decision 828400), the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 758824 - INFLUENCE), the UKRI Turing AI World-Leading Researcher Fellowship EP/W002973/1, the ELISE travel grant (GA no 951847), the KAUTE Foundation, and the Aalto Science-IT Project. Publisher Copyright: © 2022 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
dc.description.abstract: Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled using Bayesian best-response models. We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP. In our simulated experiments, we consider an instantiation of our framework for humans who are subjectively optimistic about the AI's future behaviour. Our results show that when equipped with a model of the human, the AI can infer the human's bounds and nudge them towards better decisions. We discuss ways in which the machine can learn to improve upon its own limitations as well with the help of the human. We identify a novel trade-off for centaurs in partially observable tasks: for the AI's actions to be acceptable to the human, the machine must make sure their beliefs are sufficiently aligned, but aligning beliefs might be costly. We present a preliminary theoretical analysis of this trade-off and its dependence on task structure. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 9
dc.format.extent: 235-243
dc.identifier.citation: Çelikok, M. M., Oliehoek, F. A. & Kaski, S. 2022, 'Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs', in International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, vol. 1, International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), pp. 235-243, International Conference on Autonomous Agents and Multiagent Systems, Auckland, New Zealand, 09/05/2022. <https://arxiv.org/abs/2204.01160> [en]
dc.identifier.isbn: 9781713854333
dc.identifier.issn: 1548-8403
dc.identifier.issn: 1558-2914
dc.identifier.other: PURE UUID: 65630886-48f8-4299-80ff-4c781102b800 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/65630886-48f8-4299-80ff-4c781102b800 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85134303565&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE LINK: https://www.ifaamas.org/Proceedings/aamas2022/forms/index.htm [en_US]
dc.identifier.other: PURE LINK: https://arxiv.org/abs/2204.01160 [en_US]
dc.identifier.other: PURE LINK: https://dl.acm.org/doi/abs/10.5555/3535850.3535878 [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/116176
dc.identifier.urn: URN:NBN:fi:aalto-202208244991
dc.language.iso: en [en]
dc.publisher: IFAAMAS
dc.relation.ispartof: International Conference on Autonomous Agents and Multiagent Systems [en]
dc.relation.ispartofseries: International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 [en]
dc.relation.ispartofseries: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS [en]
dc.relation.ispartofseries: Volume 1 [en]
dc.rights: openAccess [en]
dc.subject.keyword: Bayesian Reinforcement Learning [en_US]
dc.subject.keyword: Computational Rationality [en_US]
dc.subject.keyword: Hybrid Intelligence [en_US]
dc.subject.keyword: Multiagent Learning [en_US]
dc.title: Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs [en]
dc.type: Conference article in proceedings [fi]