Simplified Temporal Consistency Reinforcement Learning

dc.contributor: Aalto University [en]
dc.contributor.author: Zhao, Yi [en_US]
dc.contributor.author: Zhao, Wenshuai [en_US]
dc.contributor.author: Boney, Rinu [en_US]
dc.contributor.author: Kannala, Juho [en_US]
dc.contributor.author: Pajarinen, Joni [en_US]
dc.contributor.department: Robot Learning [en_US]
dc.contributor.department: Department of Computer Science [en_US]
dc.contributor.department: Computer Science Professors [en_US]
dc.contributor.department: Department of Electrical Engineering and Automation [en]
dc.contributor.editor: Krause, Andreas [en_US]
dc.contributor.editor: Brunskill, Emma [en_US]
dc.contributor.editor: Cho, Kyunghyun [en_US]
dc.contributor.editor: Engelhardt, Barbara [en_US]
dc.contributor.editor: Sabato, Sivan [en_US]
dc.contributor.editor: Scarlett, Jonathan [en_US]
dc.description.abstract: Reinforcement learning (RL) is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL, which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies when using pure planning with a dynamics model conditioned on the representation, but also when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1× faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4× faster. [en]
dc.description.version: Peer reviewed [en]
dc.identifier.citation: Zhao, Y, Zhao, W, Boney, R, Kannala, J & Pajarinen, J 2023, Simplified Temporal Consistency Reinforcement Learning. in A Krause, E Brunskill, K Cho, B Engelhardt, S Sabato & J Scarlett (eds), Proceedings of the 40th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 202, JMLR, pp. 42227-42246, International Conference on Machine Learning, Honolulu, Hawaii, United States, 23/07/2023. [en]
dc.identifier.other: PURE UUID: 8495be79-be5c-4e18-acee-b646e5c9e8ad [en_US]
dc.relation.ispartof: International Conference on Machine Learning [en]
dc.relation.ispartofseries: Proceedings of the 40th International Conference on Machine Learning [en]
dc.relation.ispartofseries: Proceedings of Machine Learning Research [en]
dc.relation.ispartofseries: Volume 202 [en]
dc.title: Simplified Temporal Consistency Reinforcement Learning [en]
dc.type: Conference article in proceedings [fi]