Optimistic Multi-Agent Policy Gradient
Access rights
openAccess
publishedVersion
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
Date
2024
Language
en
Pages
17
Series
Proceedings of Machine Learning Research, Volume 235, pp. 61186-61202
Abstract
Relative overgeneralization (RO) occurs in cooperative multi-agent learning tasks when agents converge towards a suboptimal joint policy due to overfitting to suboptimal behaviors of other agents. No methods have been proposed for addressing RO in multi-agent policy gradient (MAPG) methods, even though these methods produce state-of-the-art results. To address this gap, we propose a general yet simple framework that enables optimistic updates in MAPG methods and alleviates the RO problem. Our approach involves clipping the advantage to eliminate negative values, thereby facilitating optimistic updates in MAPG. The optimism prevents individual agents from quickly converging to a local optimum. Additionally, we provide a formal analysis to show that the proposed method retains optimality at a fixed point. In extensive evaluations on a diverse set of tasks, including the Multi-agent MuJoCo and Overcooked benchmarks, our method outperforms strong baselines on 13 out of 19 tested tasks and matches the performance on the rest.
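A minimal sketch of the advantage-clipping idea described in the abstract, assuming a PyTorch-style policy-gradient surrogate; the function and variable names below are illustrative assumptions, not the authors' implementation.

import torch

def optimistic_pg_loss(log_probs: torch.Tensor, advantages: torch.Tensor) -> torch.Tensor:
    # Clip the advantage at zero so negative estimates do not push an
    # agent's policy away from actions that only looked bad under
    # teammates' current (possibly suboptimal) behavior.
    clipped_adv = torch.clamp(advantages, min=0.0)
    # Standard policy-gradient surrogate weighted by the clipped,
    # detached advantage.
    return -(log_probs * clipped_adv.detach()).mean()

# Example usage with dummy data (one log-probability and one advantage
# estimate per sampled action).
log_probs = torch.randn(64, requires_grad=True)
advantages = torch.randn(64)
loss = optimistic_pg_loss(log_probs, advantages)
loss.backward()

Because only non-negative advantages contribute to the gradient, each agent's update is optimistic with respect to its teammates, which is the mechanism the abstract credits for mitigating relative overgeneralization.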
Description
Publisher Copyright: Copyright 2024 by the author(s)
Citation
Zhao, W, Zhao, Y, Li, Z, Kannala, J & Pajarinen, J 2024, 'Optimistic Multi-Agent Policy Gradient', Proceedings of Machine Learning Research, vol. 235, pp. 61186-61202. <https://proceedings.mlr.press/v235/zhao24v.html>