Robot learning across agents: from imitation to multi-agent cooperation


School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2025-12-18

Language

en

Pages

61 + app. 101

Series

Aalto University publication series Doctoral Theses, 259/2025

Abstract

As robotic systems become increasingly prevalent in real-world applications, developing effective learning algorithms that handle both single-agent skill acquisition and team coordination becomes critical. This thesis addresses fundamental challenges in robot learning across scales, from single-agent imitation learning to multi-agent coordination.

We first study the physically inconsistent motion problem in humanoid robot imitation learning, where human demonstrations cannot be directly executed by robots due to physical constraints. Our bilevel motion imitation framework jointly optimizes robot policies and reference trajectories, enabling humanoid robots to learn complex behaviors such as jumping and kicking from human motion-capture data.

Transitioning to multi-agent systems, we address several challenges that emerge when scaling reinforcement learning to multiple agents. First, we propose OptiMAPPO, which incorporates optimistic updates into multi-agent policy gradient methods to overcome the relative overgeneralization problem, in which agents converge to suboptimal joint policies. Second, we develop Backpropagation Through Agents (BPTA), which enables bidirectional information flow in auto-regressive multi-agent learning and improves coordination through gradient propagation across agent sequences. To address the observation mismatch between centralized training and decentralized execution, we introduce AgentMixer, a framework that enables correlated policy factorization while maintaining fully decentralized execution; its Individual-Global-Consistency mechanism resolves asymmetric learning failures. For sparse-reward environments, we propose a learning-progress-driven curriculum that optimizes for actual policy improvement rather than task performance, outperforming traditional reward-based curricula.
Finally, we bridge single-agent and multi-agent reinforcement learning through an analysis of observability settings and investigate how local observations can be purposefully exploited to enhance the robustness of robot learning. This dissertation not only advances specific algorithmic techniques but also provides new perspectives on designing learning systems that can scale from individual robots to coordinated teams, paving the way for more capable and robust autonomous systems in real-world applications.
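The learning-progress idea behind the curriculum described above can be sketched in a few lines. This is a generic illustration, not the exact method of Publication 5: tasks whose recent episode returns are still changing are sampled more often than tasks that are already mastered or not yet learnable; an exploration floor keeps every task visited.

```python
import random
from collections import deque

class LearningProgressCurriculum:
    """Sample tasks in proportion to recent learning progress, measured
    as the change between the older and newer halves of a sliding
    window of episode returns (illustrative sketch)."""

    def __init__(self, n_tasks, window=10, eps=0.1):
        self.returns = [deque(maxlen=window) for _ in range(n_tasks)]
        self.eps = eps  # exploration floor so no task starves

    def progress(self, task):
        r = list(self.returns[task])
        if len(r) < 2:
            return 1.0  # barely visited tasks get maximal priority
        half = len(r) // 2
        old = sum(r[:half]) / half
        recent = sum(r[half:]) / (len(r) - half)
        return abs(recent - old)

    def sample(self):
        weights = [self.progress(t) + self.eps for t in range(len(self.returns))]
        return random.choices(range(len(self.returns)), weights=weights)[0]

    def update(self, task, episode_return):
        self.returns[task].append(episode_return)

# Demo: task 0 is still improving, task 1 has plateaued.
random.seed(0)
cur = LearningProgressCurriculum(2)
for step in range(20):
    cur.update(0, 0.5 * step)  # steadily improving returns
    cur.update(1, 1.0)         # flat, already-mastered task
draws = [cur.sample() for _ in range(1000)]
```

Sampling by progress rather than by raw return is the key design choice: a reward-based curriculum would keep drilling the plateaued task 1 simply because its return looks good, while yielding no further policy improvement.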

Supervising professor

Pajarinen, Joni, Prof., Aalto University, Department of Electrical Engineering and Automation, Finland

Other note

Parts

  • [Publication 1]: Wenshuai Zhao, Yi Zhao, Joni Pajarinen and Michael Muehlebach. Bi-level motion imitation for humanoid robots. In Proceedings of the 8th Conference on Robot Learning (CoRL), Munich, Germany, November 2024.
  • [Publication 2]: Wenshuai Zhao, Yi Zhao, Zhiyuan Li, Juho Kannala and Joni Pajarinen. Optimistic multi-agent policy gradient. In Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, July 2024.
  • [Publication 3]: Zhiyuan Li, Wenshuai Zhao, Lijun Wu and Joni Pajarinen. Backpropagation through agents. In Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI), Vancouver, Canada, February 2024.
    DOI: 10.1609/aaai.v38i12.29277
  • [Publication 4]: Zhiyuan Li, Wenshuai Zhao, Lijun Wu and Joni Pajarinen. AgentMixer: multi-agent correlated policy factorization. In Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI), Philadelphia, USA, February 2025.
    DOI: 10.1609/aaai.v39i17.34048
  • [Publication 5]: Wenshuai Zhao, Zhiyuan Li and Joni Pajarinen. Learning progress driven multi-agent curriculum. In Proceedings of the 42nd International Conference on Machine Learning (ICML), Vancouver, Canada, July 2025.
  • [Publication 6]: Wenshuai Zhao, Eetu-Aleksi Rantala, Sahar Salimpour, Zhiyuan Li, Joni Pajarinen and Jorge Pena Queralta. Exploiting local observations for robust robot learning. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, November 2025.