Robot learning across agents: from imitation to multi-agent cooperation
School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2025-12-18
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Language
en
Pages
61 + app. 101
Series
Aalto University publication series Doctoral Theses, 259/2025
Abstract
As robotic systems become increasingly prevalent in real-world applications, developing effective learning algorithms that can handle both single-agent skill acquisition and team coordination becomes critical. This thesis addresses fundamental challenges in robot learning across scales, from single-agent imitation learning to multi-agent coordination. We first study the physically inconsistent motion problem in humanoid robot imitation learning, where human demonstrations cannot be directly executed by robots due to physical constraints. Our bilevel motion imitation framework jointly optimizes robot policies and reference trajectories, enabling humanoid robots to learn complex behaviors like jumping and kicking from human motion capture data. Transitioning to multi-agent systems, we address several fundamental challenges that emerge when scaling reinforcement learning to multiple agents. First, we propose OptiMAPPO, which incorporates optimistic updates into multi-agent policy gradient methods to overcome the relative overgeneralization problem where agents converge to suboptimal joint policies. Second, we develop Backpropagation Through Agents (BPTA), enabling bidirectional information flow in auto-regressive multi-agent learning to improve coordination through gradient propagation across agent sequences. To address the observation mismatch between centralized training and decentralized execution, we introduce AgentMixer, a framework that enables correlated policy factorization while maintaining fully decentralized execution capabilities. This approach resolves asymmetric learning failures through our Individual-Global-Consistency mechanism. For sparse-reward environments, we propose a learning progress driven curriculum that optimizes for actual policy improvement rather than task performance, demonstrating superior results compared to traditional reward-based curricula. 
Finally, we bridge single-agent and multi-agent reinforcement learning through an analysis of observability settings and investigate how local observations can be purposefully exploited to enhance the robustness of robot learning. This dissertation not only advances specific algorithmic techniques but also provides new perspectives on designing learning systems that can scale from individual robots to coordinated teams, paving the way for more capable and robust autonomous systems in real-world applications.
Supervising professor
Pajarinen, Joni, Prof., Aalto University, Department of Electrical Engineering and Automation, Finland
Parts
- [Publication 1]: Wenshuai Zhao, Yi Zhao, Joni Pajarinen and Michael Muehlebach. Bi-level motion imitation for humanoid robots. In Proceedings of the 8th Conference on Robot Learning (CoRL), Munich, Germany, November 2024.
  Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202503263036
- [Publication 2]: Wenshuai Zhao, Yi Zhao, Zhiyuan Li, Juho Kannala and Joni Pajarinen. Optimistic multi-agent policy gradient. In Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, July 2024.
  Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202409266521
- [Publication 3]: Zhiyuan Li, Wenshuai Zhao, Lijun Wu and Joni Pajarinen. Backpropagation through agents. In Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI), Vancouver, Canada, February 2024.
  DOI: 10.1609/aaai.v38i12.29277
- [Publication 4]: Zhiyuan Li, Wenshuai Zhao, Lijun Wu and Joni Pajarinen. AgentMixer: multi-agent correlated policy factorization. In Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI), Philadelphia, USA, February 2025.
  DOI: 10.1609/aaai.v39i17.34048
- [Publication 5]: Wenshuai Zhao, Zhiyuan Li and Joni Pajarinen. Learning progress driven multi-agent curriculum. In Proceedings of the 42nd International Conference on Machine Learning (ICML), Vancouver, Canada, July 2025.
  Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202512109016
- [Publication 6]: Wenshuai Zhao, Eetu-Aleksi Rantala, Sahar Salimpour, Zhiyuan Li, Joni Pajarinen and Jorge Pena Queralta. Exploiting local observations for robust robot learning. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, November 2025.