Citation:
Zhao, Y., Zhao, W., Boney, R., Kannala, J. & Pajarinen, J. 2023, Simplified Temporal Consistency Reinforcement Learning. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato & J. Scarlett (eds), Proceedings of the 40th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 202, JMLR, pp. 42227-42246, International Conference on Machine Learning, Honolulu, Hawaii, United States, 23/07/2023. <https://proceedings.mlr.press/v202/zhao23k.html>
Abstract:
Reinforcement learning (RL) can solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work has focused on model-based RL, which interleaves model learning with planning. Recent methods further use policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This holds both when using pure planning with a dynamics model conditioned on the representation and when using the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model that solves challenging high-dimensional locomotion tasks with online planners while being 4.1× faster to train than ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4× faster.
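The objective named in the abstract, latent temporal consistency, amounts to training an encoder and a latent dynamics model so that the predicted next latent matches the next observation's latent from a slowly updated target encoder. The sketch below is an illustrative PyTorch rendering of that idea, not the authors' implementation: the module names, network sizes, cosine-similarity loss, and EMA rate are all assumptions made for the example.

# Minimal sketch (assumed details, not the paper's code) of a latent
# temporal-consistency objective for representation learning.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, obs_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ELU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, obs):
        return self.net(obs)

class LatentDynamics(nn.Module):
    def __init__(self, latent_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + action_dim, 256), nn.ELU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def temporal_consistency_loss(encoder, target_encoder, dynamics, obs, action, next_obs):
    # Predict the next latent and match it to the target encoder's latent.
    # The target branch gets no gradients, which discourages trivial collapse.
    z = encoder(obs)
    z_next_pred = dynamics(z, action)
    with torch.no_grad():
        z_next_target = target_encoder(next_obs)
    # Cosine-similarity regression is one common choice; MSE also works.
    return -F.cosine_similarity(z_next_pred, z_next_target, dim=-1).mean()

def ema_update(target, online, tau=0.005):
    # Slowly track the online encoder's weights in the target encoder.
    for tp, p in zip(target.parameters(), online.parameters()):
        tp.data.lerp_(p.data, tau)

# Example usage on a dummy batch (dimensions are placeholders).
obs_dim, action_dim, latent_dim = 24, 6, 64
enc = Encoder(obs_dim, latent_dim)
target_enc = copy.deepcopy(enc)
dyn = LatentDynamics(latent_dim, action_dim)
opt = torch.optim.Adam(list(enc.parameters()) + list(dyn.parameters()), lr=3e-4)

obs, act, next_obs = torch.randn(32, obs_dim), torch.randn(32, action_dim), torch.randn(32, obs_dim)
loss = temporal_consistency_loss(enc, target_enc, dyn, obs, act, next_obs)
opt.zero_grad(); loss.backward(); opt.step()
ema_update(target_enc, enc)

The learned encoder and dynamics model can then be used in either of the two modes the abstract mentions: latent-space planning with an online planner, or feeding the latents as features to a model-free policy and value function.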