Sample-Efficient Methods for Real-World Deep Reinforcement Learning

School of Science | Doctoral thesis (article-based) | Defence date: 2022-06-09
92 + app. 76
Aalto University publication series DOCTORAL THESES, 71/2022
Reinforcement learning (RL) is a general framework for learning and evaluating intelligent behaviors in any domain. Deep reinforcement learning combines RL with deep learning to learn expressive nonlinear functions that can interpret rich sensory signals to produce complex behaviors. However, this comes at the cost of increased sample complexity and instability, limiting the practical impact of deep RL algorithms on real-world problems. The thesis presents advances towards improving the sample efficiency and benchmarking of deep RL algorithms on real-world problems. This work develops sample-efficient deep RL algorithms for three different problem settings: multi-agent discrete control, continuous control, and continuous control from image observations. For multi-agent discrete control, the thesis proposes a sample-efficient model-based approach that plans using known dynamics models to learn to play imperfect-information games with large state-action spaces. This is achieved by training a policy network from partial observations to imitate the actions of an oracle planner that has full observability. For continuous control, the thesis demonstrates that trajectory optimization with learned dynamics models can lead to the optimization procedure exploiting the inaccuracies of the model. The thesis proposes two regularization strategies to prevent this, based on uncertainty estimates from a denoising autoencoder or an energy-based model, to achieve rapid initial learning on a set of popular continuous control tasks. For continuous control problems with image observations, the thesis proposes an actor-critic method that learns feature point state representations, without any additional supervision, for improved sample efficiency.
The thesis also introduces two low-cost robot learning benchmarks to ground the research of RL algorithms on real-world problems. The first benchmark adapts an open-source RC car platform called Donkey car to benchmark RL algorithms on continuous control of the car, learning to drive around miniature tracks from image observations. The second benchmark is based on RealAnt, a low-cost quadruped robot developed in this thesis, and benchmarks RL algorithms on continuous control of the robot servos to learn basic tasks like turning and walking. The thesis demonstrates sample-efficient deep RL using existing methods on these benchmarks.
Supervising professor
Kannala, Juho, Prof., Aalto University, Department of Computer Science, Finland; Ilin, Alexander, Prof., Aalto University, Department of Computer Science, Finland
reinforcement learning, deep learning, robot learning, sample-efficient learning
Other note
  • [Publication 1]: Rinu Boney, Alexander Ilin, Juho Kannala, and Jarno Seppanen. Learning to Play Imperfect-Information Games by Imitating an Oracle Planner. Accepted for publication in IEEE Transactions on Games, March 2021.
    DOI: 10.1109/TG.2021.3067723
  • [Publication 2]: Rinu Boney*, Norman Di Palo*, Mathias Berglund, Alexander Ilin, Juho Kannala, Antti Rasmus, and Harri Valpola. Regularizing Trajectory Optimization with Denoising Autoencoders. In Advances in Neural Information Processing Systems 32, pp. 2859-2869, December 2019.
  • [Publication 3]: Rinu Boney, Juho Kannala, and Alexander Ilin. Regularizing Model-Based Planning with Energy-Based Models. In Conference on Robot Learning, pp. 182-191, October 2019.
  • [Publication 4]: Rinu Boney, Alexander Ilin, and Juho Kannala. Learning of feature points without additional supervision improves reinforcement learning from images, 2021. arXiv:2106.07995
  • [Publication 5]: Ari Viitala*, Rinu Boney*, Yi Zhao, Alexander Ilin, and Juho Kannala. Learning to Drive (L2D) as a Low-Cost Benchmark for Real-World Reinforcement Learning. In 20th International Conference on Advanced Robotics, December 2021.
    DOI: 10.1109/ICAR53236.2021.9659342
  • [Publication 6]: Rinu Boney*, Jussi Sainio*, Mikko Kaivola, Arno Solin, and Juho Kannala. RealAnt: An Open-Source Low-Cost Quadruped for Education and Research in Real-World Reinforcement Learning, 2021. arXiv:2011.03085