Simplified Temporal Consistency Reinforcement Learning

dc.contributor: Aalto-yliopisto (fi)
dc.contributor: Aalto University (en)
dc.contributor.author: Zhao, Yi (en_US)
dc.contributor.author: Zhao, Wenshuai (en_US)
dc.contributor.author: Boney, Rinu (en_US)
dc.contributor.author: Kannala, Juho (en_US)
dc.contributor.author: Pajarinen, Joni (en_US)
dc.contributor.department: Robot Learning (en_US)
dc.contributor.department: Department of Computer Science (en_US)
dc.contributor.department: Computer Science Professors (en_US)
dc.contributor.department: Department of Electrical Engineering and Automation (en)
dc.contributor.editor: Krause, Andreas (en_US)
dc.contributor.editor: Brunskill, Emma (en_US)
dc.contributor.editor: Cho, Kyunghyun (en_US)
dc.contributor.editor: Engelhardt, Barbara (en_US)
dc.contributor.editor: Sabato, Sivan (en_US)
dc.contributor.editor: Scarlett, Jonathan (en_US)
dc.date.accessioned: 2023-09-13T06:47:35Z
dc.date.available: 2023-09-13T06:47:35Z
dc.date.issued: 2023-07 (en_US)
dc.description.abstract: Reinforcement learning (RL) is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL, which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies when using pure planning with a dynamics model conditioned on the representation, but also when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1× faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4× faster. (en)
dc.description.version: Peer reviewed (en)
dc.format.extent: 20
dc.format.extent: 42227-42246
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Zhao, Y, Zhao, W, Boney, R, Kannala, J & Pajarinen, J 2023, Simplified Temporal Consistency Reinforcement Learning. in A Krause, E Brunskill, K Cho, B Engelhardt, S Sabato & J Scarlett (eds), Proceedings of the 40th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 202, JMLR, pp. 42227-42246, International Conference on Machine Learning, Honolulu, Hawaii, United States, 23/07/2023. <https://proceedings.mlr.press/v202/zhao23k.html> (en)
dc.identifier.issn: 2640-3498
dc.identifier.other: PURE UUID: 8495be79-be5c-4e18-acee-b646e5c9e8ad (en_US)
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/8495be79-be5c-4e18-acee-b646e5c9e8ad (en_US)
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85174418779&partnerID=8YFLogxK (en_US)
dc.identifier.other: PURE LINK: https://proceedings.mlr.press/v202/zhao23k.html (en_US)
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/120700029/SCI_Zhao_etal_ICML_2023.pdf (en_US)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/123482
dc.identifier.urn: URN:NBN:fi:aalto-202309135842
dc.language.iso: en (en)
dc.publisher: PMLR
dc.relation.ispartof: International Conference on Machine Learning (en)
dc.relation.ispartofseries: Proceedings of the 40th International Conference on Machine Learning (en)
dc.relation.ispartofseries: Proceedings of Machine Learning Research (en)
dc.relation.ispartofseries: Volume 202 (en)
dc.rights: openAccess (en)
dc.title: Simplified Temporal Consistency Reinforcement Learning (en)
dc.type: Conference article in proceedings (fi)
dc.type.version: publishedVersion
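The abstract above describes training a latent dynamics model with a latent temporal-consistency objective and reusing the learned representation for planning or as policy and value function features. The following is a minimal sketch of such an objective, not the authors' implementation: the encoder and dynamics architectures, the cosine-distance loss, the Polyak-averaged target encoder, and all sizes and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of a latent temporal-consistency objective (illustrative, not
# the paper's code): encode the observation, predict the next latent with a
# latent dynamics model, and match it against a slowly updated target
# encoder's embedding of the true next observation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, latent_dim = 24, 6, 128  # assumed sizes

encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ELU(), nn.Linear(256, latent_dim))
dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 256), nn.ELU(), nn.Linear(256, latent_dim))
target_encoder = copy.deepcopy(encoder)  # slowly updated copy, no gradients
for p in target_encoder.parameters():
    p.requires_grad_(False)

optim = torch.optim.Adam(list(encoder.parameters()) + list(dynamics.parameters()), lr=3e-4)

def consistency_loss(obs, act, next_obs):
    """Cosine distance between the predicted and target next-step latents."""
    z = encoder(obs)
    z_pred = dynamics(torch.cat([z, act], dim=-1))
    with torch.no_grad():
        z_next = target_encoder(next_obs)
    return 1.0 - F.cosine_similarity(z_pred, z_next, dim=-1).mean()

def update_target(tau=0.005):
    """Polyak-average the online encoder into the target encoder."""
    for p, p_t in zip(encoder.parameters(), target_encoder.parameters()):
        p_t.data.lerp_(p.data, tau)

# One gradient step on a (hypothetical) batch of transitions.
obs, act, next_obs = torch.randn(32, obs_dim), torch.randn(32, act_dim), torch.randn(32, obs_dim)
loss = consistency_loss(obs, act, next_obs)
optim.zero_grad(); loss.backward(); optim.step(); update_target()
```

Per the abstract, the representation learned this way would then either condition a dynamics model used by an online planner or serve as input features for model-free policy and value learning; those components are omitted from the sketch.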