Sample-Efficient Methods for Real-World Deep Reinforcement Learning

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Boney, Rinu
dc.date.accessioned 2022-05-25T09:00:10Z
dc.date.available 2022-05-25T09:00:10Z
dc.date.issued 2022
dc.identifier.isbn 978-952-64-0809-5 (electronic)
dc.identifier.isbn 978-952-64-0808-8 (printed)
dc.identifier.issn 1799-4942 (electronic)
dc.identifier.issn 1799-4934 (printed)
dc.identifier.issn 1799-4934 (ISSN-L)
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/114584
dc.description.abstract Reinforcement learning (RL) is a general framework for learning and evaluating intelligent behaviors in any domain. Deep reinforcement learning combines RL with deep learning to learn expressive nonlinear functions that can interpret rich sensory signals to produce complex behaviors. However, this comes at the cost of increased sample complexity and instability, limiting the practical impact of deep RL algorithms on real-world problems. The thesis presents advances towards improving the sample efficiency and benchmarking of deep RL algorithms on real-world problems. This work develops sample-efficient deep RL algorithms for three different problem settings: multi-agent discrete control, continuous control, and continuous control from image observations. For multi-agent discrete control, the thesis proposes a sample-efficient model-based approach that plans with known dynamics models to learn to play imperfect-information games with large state-action spaces. This is achieved by training a policy network that maps partial observations to actions to imitate an oracle planner that has full observability. For continuous control, the thesis demonstrates that trajectory optimization with learned dynamics models can lead the optimization procedure to exploit inaccuracies of the model. The thesis proposes two regularization strategies to prevent this, based on uncertainty estimates from a denoising autoencoder or an energy-based model, to achieve rapid initial learning on a set of popular continuous control tasks. For continuous control problems with image observations, the thesis proposes an actor-critic method that learns feature point state representations, without any additional supervision, for improved sample efficiency. The thesis also introduces two low-cost robot learning benchmarks to ground the research of RL algorithms on real-world problems.
The first benchmark adapts Donkey car, an open-source RC car platform, to benchmark RL algorithms on continuous control of the car, where the task is to learn to drive around miniature tracks from image observations. The second benchmark is based on RealAnt, a low-cost quadruped robot developed in this thesis, and benchmarks RL algorithms on continuous control of the robot's servos to learn basic tasks such as turning and walking. The thesis demonstrates sample-efficient deep RL using existing methods on these benchmarks. en
dc.format.extent 92 + app. 76
dc.format.mimetype application/pdf en
dc.language.iso en en
dc.publisher Aalto University en
dc.publisher Aalto-yliopisto fi
dc.relation.ispartofseries Aalto University publication series DOCTORAL THESES en
dc.relation.ispartofseries 71/2022
dc.relation.haspart [Publication 1]: Rinu Boney, Alexander Ilin, Juho Kannala, and Jarno Seppanen. Learning to Play Imperfect-Information Games by Imitating an Oracle Planner. Accepted for publication in IEEE Transactions on Games, March 2021. DOI: 10.1109/TG.2021.3067723
dc.relation.haspart [Publication 2]: Rinu Boney*, Norman Di Palo*, Mathias Berglund, Alexander Ilin, Juho Kannala, Antti Rasmus, and Harri Valpola. Regularizing Trajectory Optimization with Denoising Autoencoders. In Advances in Neural Information Processing Systems 32, pp. 2859-2869, December 2019.
dc.relation.haspart [Publication 3]: Rinu Boney, Juho Kannala, and Alexander Ilin. Regularizing Model-Based Planning with Energy-Based Models. In Conference on Robot Learning, pp. 182-191, October 2019. Full text in ACRIS/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-202102021858. http://proceedings.mlr.press/v100/boney20a.html
dc.relation.haspart [Publication 4]: Rinu Boney, Alexander Ilin and Juho Kannala. Learning of feature points without additional supervision improves reinforcement learning from images, 2021. arXiv:2106.07995
dc.relation.haspart [Publication 5]: Ari Viitala*, Rinu Boney*, Yi Zhao, Alexander Ilin, and Juho Kannala. Learning to Drive (L2D) as a Low-Cost Benchmark for Real-World Reinforcement Learning. In 20th International Conference on Advanced Robotics, December 2021. DOI: 10.1109/ICAR53236.2021.9659342
dc.relation.haspart [Publication 6]: Rinu Boney*, Jussi Sainio*, Mikko Kaivola, Arno Solin, and Juho Kannala. RealAnt: An Open-Source Low-Cost Quadruped for Education and Research in Real-World Reinforcement Learning, 2021. arXiv:2011.03085
dc.subject.other Computer science en
dc.title Sample-Efficient Methods for Real-World Deep Reinforcement Learning en
dc.type G5 Artikkeliväitöskirja fi
dc.contributor.school Perustieteiden korkeakoulu fi
dc.contributor.school School of Science en
dc.contributor.department Tietotekniikan laitos fi
dc.contributor.department Department of Computer Science en
dc.subject.keyword reinforcement learning en
dc.subject.keyword deep learning en
dc.subject.keyword robot learning en
dc.subject.keyword sample-efficient learning en
dc.identifier.urn URN:ISBN:978-952-64-0809-5
dc.type.dcmitype text en
dc.type.ontasot Doctoral dissertation (article-based) en
dc.type.ontasot Väitöskirja (artikkeli) fi
dc.contributor.supervisor Kannala, Juho, Prof., Aalto University, Department of Computer Science, Finland; Ilin, Alexander, Prof., Aalto University, Department of Computer Science, Finland
dc.opn Tassa, Yuval, Dr., DeepMind, United Kingdom
dc.rev Johns, Edward, Dr., Imperial College London, United Kingdom
dc.rev van Hoof, Herke, Prof., University of Amsterdam, Netherlands
dc.date.defence 2022-06-09
local.aalto.acrisexportstatus checked 2022-06-13_0857
local.aalto.infra Science-IT
local.aalto.formfolder 2022_05_25_klo_10_37
local.aalto.archive yes