A probabilistic interpretation of self-paced learning with applications to reinforcement learning

dc.contributorAalto-yliopistofi
dc.contributorAalto Universityen
dc.contributor.authorKlink, Pascal
dc.contributor.authorAbdulsamad, Hany
dc.contributor.authorBelousov, Boris
dc.contributor.authorD'Eramo, Carlo
dc.contributor.authorPeters, Jan
dc.contributor.authorPajarinen, Joni
dc.contributor.departmentTechnische Universität Darmstadt
dc.contributor.departmentDepartment of Electrical Engineering and Automation
dc.date.accessioned2021-09-02T08:45:15Z
dc.date.available2021-09-02T08:45:15Z
dc.date.issued2021-07-01
dc.descriptionFunding Information: This project has received funding from the DFG project PA3179/1-1 (ROBOLEAP) and from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 640554 (SKILLS4ROBOTS). Calculations for this research were conducted on the Lichtenberg high performance computer of the TU Darmstadt. Publisher Copyright: © 2021 Pascal Klink, Hany Abdulsamad, Boris Belousov, Carlo D'Eramo, Jan Peters, Joni Pajarinen.
dc.description.abstractAcross machine learning, the use of curricula has shown strong empirical potential to improve learning from data by avoiding local optima of training objectives. For reinforcement learning (RL), curricula are especially interesting, as the underlying optimization has a strong tendency to get stuck in local optima due to the exploration-exploitation trade-off. Recently, a number of approaches for the automatic generation of curricula for RL have been shown to increase performance while requiring less expert knowledge compared to manually designed curricula. However, these approaches are seldom investigated from a theoretical perspective, preventing a deeper understanding of their mechanics. In this paper, we present an approach for automated curriculum generation in RL with a clear theoretical underpinning. More precisely, we formalize the well-known self-paced learning paradigm as inducing a distribution over training tasks, which trades off task complexity against the objective of matching a desired task distribution. Experiments show that training on this induced distribution helps to avoid poor local optima across RL algorithms in different tasks with uninformative rewards and challenging exploration requirements.en
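The trade-off described in the abstract can be sketched as a KL-regularized objective; the notation below is illustrative and introduced here rather than copied from the paper. A distribution p(c | ν) over training tasks c, parameterized by ν, is chosen to maximize

\max_{\nu} \; \mathbb{E}_{p(c \mid \nu)}\big[ J(\theta, c) \big] \;-\; \alpha \, D_{\mathrm{KL}}\!\big( p(c \mid \nu) \,\|\, \mu(c) \big),

where J(θ, c) is the agent's expected return on task c under policy parameters θ, μ(c) is the desired target task distribution, and the weight α controls how strongly the induced distribution is pulled toward μ(c) as training progresses.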
dc.description.versionPeer revieweden
dc.format.mimetypeapplication/pdf
dc.identifier.citationKlink, P., Abdulsamad, H., Belousov, B., D'Eramo, C., Peters, J. & Pajarinen, J. 2021, 'A probabilistic interpretation of self-paced learning with applications to reinforcement learning', Journal of Machine Learning Research, vol. 22. <https://www.jmlr.org/papers/volume22/21-0112/21-0112.pdf>en
dc.identifier.issn1532-4435
dc.identifier.issn1533-7928
dc.identifier.otherPURE UUID: 31b4968b-1011-4705-b355-e053b33d6656
dc.identifier.otherPURE ITEMURL: https://research.aalto.fi/en/publications/31b4968b-1011-4705-b355-e053b33d6656
dc.identifier.otherPURE LINK: http://www.scopus.com/inward/record.url?scp=85112417553&partnerID=8YFLogxK
dc.identifier.otherPURE LINK: https://www.jmlr.org/papers/volume22/21-0112/21-0112.pdf
dc.identifier.otherPURE FILEURL: https://research.aalto.fi/files/66839298/ELEC_Klink_etal_A_Probabilistic_Interpretation_of_Self_Paced_Learning_JMLR_2021_finalpublishedversiob.pdf
dc.identifier.urihttps://aaltodoc.aalto.fi/handle/123456789/109577
dc.identifier.urnURN:NBN:fi:aalto-202109028809
dc.language.isoenen
dc.publisherMicrotome Publishing
dc.relation.ispartofseriesJournal of Machine Learning Researchen
dc.relation.ispartofseriesVolume 22en
dc.rightsopenAccessen
dc.subject.keywordCurriculum learning
dc.subject.keywordReinforcement learning
dc.subject.keywordRL-as-inference
dc.subject.keywordSelf-paced learning
dc.subject.keywordTempered inference
dc.titleA probabilistic interpretation of self-paced learning with applications to reinforcement learningen
dc.typeA1 Original article in a scientific journal
dc.type.versionpublishedVersion