Sample-Efficient Methods for Real-World Deep Reinforcement Learning

dc.contributorAalto-yliopistofi
dc.contributorAalto Universityen
dc.contributor.authorBoney, Rinu
dc.contributor.departmentTietotekniikan laitosfi
dc.contributor.departmentDepartment of Computer Scienceen
dc.contributor.schoolPerustieteiden korkeakoulufi
dc.contributor.schoolSchool of Scienceen
dc.contributor.supervisorKannala, Juho, Prof., Aalto University, Department of Computer Science, Finland; Ilin, Alexander, Prof., Aalto University, Department of Computer Science, Finland
dc.date.accessioned2022-05-25T09:00:10Z
dc.date.available2022-05-25T09:00:10Z
dc.date.defence2022-06-09
dc.date.issued2022
dc.description.abstractReinforcement learning (RL) is a general framework for learning and evaluating intelligent behaviors in any domain. Deep reinforcement learning combines RL with deep learning to learn expressive nonlinear functions that can interpret rich sensory signals and produce complex behaviors. However, this comes at the cost of increased sample complexity and instability, limiting the practical impact of deep RL algorithms on real-world problems. This thesis presents advances in the sample efficiency and benchmarking of deep RL algorithms on real-world problems. It develops sample-efficient deep RL algorithms for three problem settings: multi-agent discrete control, continuous control, and continuous control from image observations. For multi-agent discrete control, the thesis proposes a sample-efficient model-based approach that plans with known dynamics models to learn to play imperfect-information games with large state-action spaces. This is achieved by training a policy network from partial observations to imitate the actions of an oracle planner that has full observability. For continuous control, the thesis demonstrates that trajectory optimization with learned dynamics models can lead to the optimization procedure exploiting the inaccuracies of the model. The thesis proposes two regularization strategies to prevent this, based on uncertainty estimates from a denoising autoencoder or an energy-based model, and achieves rapid initial learning on a set of popular continuous control tasks. For continuous control from image observations, the thesis proposes an actor-critic method that learns feature point state representations, without any additional supervision, for improved sample efficiency. The thesis also introduces two low-cost robot learning benchmarks to ground the research of RL algorithms in real-world problems. The first benchmark adapts an open-source RC car platform called Donkey car to benchmark RL algorithms on continuous control of the car, learning to drive around miniature tracks from image observations. The second benchmark is based on RealAnt, a low-cost quadruped robot developed in this thesis, and benchmarks RL algorithms on continuous control of the robot's servos to learn basic tasks such as turning and walking. The thesis demonstrates sample-efficient deep RL on these benchmarks using existing methods.en
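
As an illustrative aside to the regularization idea summarized in the abstract (Publications 2 and 3), the sketch below shows one way trajectory optimization can be penalized with a denoising autoencoder's reconstruction error, so the planner avoids state-action regions where the learned model is unreliable. This is a minimal toy sketch, not the thesis code: `dynamics`, `reward`, and `dae` are hypothetical stand-ins for a learned dynamics model, a reward model, and a DAE trained on visited state-action pairs.

```python
# Minimal sketch of DAE-regularized trajectory optimization with the
# cross-entropy method (CEM). All models below are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON, POP, ITERS, ALPHA = 4, 2, 10, 64, 5, 1.0

def dynamics(s, a):          # stand-in learned model: s' = f(s, a)
    return s + 0.1 * np.tanh(np.concatenate([a, a]))

def reward(s, a):            # stand-in reward model
    return -np.sum(s**2) - 0.01 * np.sum(a**2)

def dae(x):                  # stand-in denoising autoencoder g(x) ~ x
    return 0.9 * x           # a real DAE would be trained on visited (s, a)

def penalty(s, a):
    # Reconstruction error of the DAE on a state-action pair; large error
    # signals an unfamiliar region where the dynamics model may be wrong.
    x = np.concatenate([s, a])
    return np.sum((dae(x) - x) ** 2)

def score(s0, plan):
    """Return of a plan under the model, minus the DAE familiarity penalty."""
    s, total = s0, 0.0
    for a in plan:
        total += reward(s, a) - ALPHA * penalty(s, a)
        s = dynamics(s, a)
    return total

def cem_plan(s0):
    """CEM over action sequences, regularized by the DAE penalty."""
    mu = np.zeros((HORIZON, ACTION_DIM))
    sigma = np.ones((HORIZON, ACTION_DIM))
    for _ in range(ITERS):
        plans = mu + sigma * rng.standard_normal((POP, HORIZON, ACTION_DIM))
        scores = np.array([score(s0, p) for p in plans])
        elites = plans[np.argsort(scores)[-POP // 8:]]   # keep top 12.5%
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mu

print(cem_plan(np.zeros(STATE_DIM))[0])  # first action of the optimized plan
```

Without the penalty term, the optimizer is free to drive the rollout into regions the model has never seen, where predicted rewards can be arbitrarily wrong; the DAE term trades a little model-predicted return for staying on the data manifold.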
dc.format.extent92 + app. 76
dc.format.mimetypeapplication/pdfen
dc.identifier.isbn978-952-64-0809-5 (electronic)
dc.identifier.isbn978-952-64-0808-8 (printed)
dc.identifier.issn1799-4942 (electronic)
dc.identifier.issn1799-4934 (printed)
dc.identifier.issn1799-4934 (ISSN-L)
dc.identifier.urihttps://aaltodoc.aalto.fi/handle/123456789/114584
dc.identifier.urnURN:ISBN:978-952-64-0809-5
dc.language.isoenen
dc.opnTassa, Yuval, Dr., DeepMind, United Kingdom
dc.publisherAalto Universityen
dc.publisherAalto-yliopistofi
dc.relation.haspart[Publication 1]: Rinu Boney, Alexander Ilin, Juho Kannala, and Jarno Seppänen. Learning to Play Imperfect-Information Games by Imitating an Oracle Planner. Accepted for publication in IEEE Transactions on Games, March 2021. DOI: 10.1109/TG.2021.3067723
dc.relation.haspart[Publication 2]: Rinu Boney*, Norman Di Palo*, Mathias Berglund, Alexander Ilin, Juho Kannala, Antti Rasmus, and Harri Valpola. Regularizing Trajectory Optimization with Denoising Autoencoders. In Advances in Neural Information Processing Systems 32, pp. 2859-2869, December 2019.
dc.relation.haspart[Publication 3]: Rinu Boney, Juho Kannala, and Alexander Ilin. Regularizing Model-Based Planning with Energy-Based Models. In Conference on Robot Learning, pp. 182-191, October 2019. Full text in ACRIS/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-202102021858. http://proceedings.mlr.press/v100/boney20a.html
dc.relation.haspart[Publication 4]: Rinu Boney, Alexander Ilin, and Juho Kannala. Learning of feature points without additional supervision improves reinforcement learning from images, 2021. arXiv:2106.07995
dc.relation.haspart[Publication 5]: Ari Viitala*, Rinu Boney*, Yi Zhao, Alexander Ilin, and Juho Kannala. Learning to Drive (L2D) as a Low-Cost Benchmark for Real-World Reinforcement Learning. In 20th International Conference on Advanced Robotics, December 2021. DOI: 10.1109/ICAR53236.2021.9659342
dc.relation.haspart[Publication 6]: Rinu Boney*, Jussi Sainio*, Mikko Kaivola, Arno Solin, and Juho Kannala. RealAnt: An Open-Source Low-Cost Quadruped for Education and Research in Real-World Reinforcement Learning, 2021. arXiv:2011.03085
dc.relation.ispartofseriesAalto University publication series DOCTORAL THESESen
dc.relation.ispartofseries71/2022
dc.revJohns, Edward, Dr., Imperial College London, United Kingdom
dc.revvan Hoof, Herke, Prof., University of Amsterdam, Netherlands
dc.subject.keywordreinforcement learningen
dc.subject.keyworddeep learningen
dc.subject.keywordrobot learningen
dc.subject.keywordsample-efficient learningen
dc.subject.otherComputer scienceen
dc.titleSample-Efficient Methods for Real-World Deep Reinforcement Learningen
dc.typeG5 Artikkeliväitöskirjafi
dc.type.dcmitypetexten
dc.type.ontasotDoctoral dissertation (article-based)en
dc.type.ontasotVäitöskirja (artikkeli)fi
local.aalto.acrisexportstatuschecked 2022-06-13_0857
local.aalto.archiveyes
local.aalto.formfolder2022_05_25_klo_10_37
local.aalto.infraScience-IT

Files

Original bundle

Name: isbn9789526408095.pdf
Size: 1.64 MB
Format: Adobe Portable Document Format