Model-based Multi-agent Reinforcement Learning for AI Assistants

School of Science | Doctoral thesis (article-based) | Defence date: 2023-06-05
Pages: 71 + app. 85
Aalto University publication series DOCTORAL THESES, 51/2023
Interaction between humans and AI systems is becoming ubiquitous. In particular, recent advances in machine learning have allowed AI agents to learn interactively from humans how to perform their tasks. The main focus of this line of research has been to develop AI systems that eventually learn to automate tasks for humans: although humans are involved during training, the end goal is to remove them from the loop. This perspective, however, limits AI systems to applications where full automation is the desired outcome. In this thesis, we focus on settings where an AI agent and a human must collaborate to perform a task, and the end goal of the AI is not to replace human intelligence but to augment it.

AI-assistance for humans involves at least two agents: an AI agent and a human. System designers have no control over the human, and must therefore develop learning agents capable of assisting and augmenting them. To do so, the AI agent must be able to infer the goals, bounds, constraints, and future behaviour of its human partner. In this thesis, we propose a model-based multi-agent reinforcement learning approach in which the AI agent infers a model of its human partner and uses this model to behave in a way that is maximally helpful to the human.

To learn a mathematical model of the human from interaction, the AI agent must first have a model space. Since data scarcity is a key problem in human--AI collaboration, it is important to define a model space that is expressive enough to capture human behaviour, yet constrained enough to allow sample-efficient inference. Determining the minimal and realistic set of prior assumptions on human behaviour needed to define such model spaces is an open problem. To address it, we bring in prior knowledge from cognitive science and behavioural economics, where various mathematical models of human decision-making have been developed.
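To make the idea of inference over a constrained model space concrete, the following is a minimal sketch, not the thesis's own algorithm: we assume a Boltzmann-rational (softmax) choice model from behavioural economics as a hypothetical model space, parameterised by a single rationality parameter beta, and update a Bayesian posterior over beta from observed human choices.

```python
import numpy as np

# Hypothetical sketch: a one-parameter model space of human decision-making.
# The human is assumed Boltzmann-rational with rationality parameter beta;
# the AI infers a posterior over beta from the choices it observes.

def choice_probs(utilities, beta):
    """P(choice | beta) under a softmax decision rule."""
    z = beta * utilities
    z = z - z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

def posterior_over_beta(betas, prior, utilities, observed_choices):
    """Bayesian update of beliefs about the human's rationality parameter."""
    log_post = np.log(prior)
    for c in observed_choices:
        for i, b in enumerate(betas):
            log_post[i] += np.log(choice_probs(utilities, b)[c])
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

betas = np.array([0.1, 1.0, 10.0])       # candidate rationality levels
prior = np.ones(3) / 3                   # uniform prior over the model space
utilities = np.array([1.0, 0.5, 0.0])    # utilities of three options
observations = [0, 0, 1, 0]              # indices of options the human chose

post = posterior_over_beta(betas, prior, utilities, observations)
```

Because the observed human mostly, but not always, picks the best option, the posterior concentrates on the moderately rational model rather than the near-deterministic one — illustrating how a small, constrained model space supports sample-efficient inference from only a handful of interactions.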
However, incorporating this prior knowledge into multi-agent reinforcement learning is not trivial. We demonstrate that, with the methods developed in this thesis, sufficient statistics of human behaviour can be drawn from these models and incorporated into multi-agent reinforcement learning. We demonstrate the effectiveness of this approach in three types of tasks, where: (I) the AI must learn the human's preferences from their feedback in order to assist them; (II) the AI must teach the human conceptual knowledge in order to assist them; (III) the AI must infer the human's cognitive bounds and biases in order to improve their decisions. In all tasks, our simulation results show that the AI agent can learn to assist the human and improve the human--AI team's performance. A user study for case (I) supports the simulation results. For case (III), we present a theoretical result that determines the limits of AI-assistance when the human user disagrees with the AI.
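How an inferred human model can drive assistive behaviour can be sketched as follows. This is an illustrative toy example, not the thesis's exact method: we assume the posterior over Boltzmann-rational human models has already been inferred, and that an AI action "highlights" one option, raising its perceived utility for the human (an assumption of this sketch). The AI then picks the highlight whose predicted human response maximises expected team reward — a planning-against-the-model step in the spirit of model-based multi-agent RL.

```python
import numpy as np

# Illustrative sketch, not the thesis's algorithm. Assumptions:
#   - the human follows a Boltzmann-rational (softmax) choice model,
#   - a posterior over human models has already been inferred,
#   - an AI action "highlights" an option, boosting its perceived utility.

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def predicted_human_policy(betas, posterior, utilities):
    """Posterior-averaged prediction of the human's choice distribution."""
    return sum(w * softmax(b * utilities) for w, b in zip(posterior, betas))

def best_assistive_action(betas, posterior, perceived_utils, team_reward, boost=2.0):
    """Pick the option to highlight so that the predicted human response
    maximises expected team reward."""
    values = []
    for a in range(len(perceived_utils)):
        shifted = perceived_utils.copy()
        shifted[a] += boost              # assumed effect of highlighting
        p_human = predicted_human_policy(betas, posterior, shifted)
        values.append(float(p_human @ team_reward))
    return int(np.argmax(values)), values

betas = np.array([1.0])                      # inferred human model(s)
posterior = np.array([1.0])                  # posterior weights over models
perceived_utils = np.array([0.0, 0.5, 1.0])  # human's (biased) utilities
team_reward = np.array([2.0, 1.0, 0.0])      # true value to the team

best, values = best_assistive_action(betas, posterior, perceived_utils, team_reward)
```

Here the human's perceived utilities disagree with the team reward, so the AI highlights the option the human undervalues — the same tension between AI recommendations and human beliefs that the theoretical result for case (III) addresses.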
Supervising professor
Kaski, Samuel, Prof., Aalto University, Department of Computer Science, Finland
Thesis advisor
Kaski, Samuel, Prof., Aalto University, Department of Computer Science, Finland
Keywords
reinforcement learning, multi-agent learning, human--AI collaboration, probabilistic methods
Other note
  • [Publication 1]: Tomi Peltola, Mustafa Mert Çelikok, Pedram Daee, Samuel Kaski. Machine Teaching of Active Sequential Learners. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems, p. 11202–11213, December 2019
  • [Publication 2]: Mustafa Mert Çelikok, Pierre-Alexandre Murena, Samuel Kaski. Teaching to Learn: Sequential Teaching of Learners with Internal States. In 37th AAAI Conference on Artificial Intelligence (in press), February 2023
  • [Publication 3]: Mustafa Mert Çelikok, Frans A. Oliehoek, Samuel Kaski. Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs. In 21st International Conference on Autonomous Agents and Multi-agent Systems, p. 235–243, May 2022.
  • [Publication 4]: Mustafa Mert Çelikok, Pierre-Alexandre Murena, Samuel Kaski. Modelling Needs User Modelling. Frontiers in Artificial Intelligence (in press), March 2023.
    DOI: 10.3389/frai.2023.1097891