Safe and efficient transfer of robot policies from simulation to the real world

School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2023-05-16
100 + app. 110 pages
Aalto University publication series DOCTORAL THESES, 55/2023
The past decade has witnessed enormous progress in reinforcement learning, with intelligent agents learning to perform a variety of tasks, including locomotion, imitating human behavior, and even outperforming human experts in board games and video games of varying complexity, such as Pong, Go, and Dota 2. However, all these tasks share one common characteristic: they are either performed entirely in simulation or are based on simple rules that can be perfectly modeled in software.

Current reinforcement learning approaches that perform well in virtual environments cannot be directly applied to physical agents operating in the real world, such as robots, because of their reliance on massive data collection. The training process not only takes a long time, resulting in hardware wear, but often involves a safety risk associated with active exploration: the agent must evaluate a large number of possible actions in order to decide on the best one, some of which can lead to catastrophic outcomes. One proposed solution to this problem is to train reinforcement learning policies for robots in simulation and to later deploy the trained behavior policy on the real physical system. This approach, however, raises a number of new issues: simulated dynamics and observations do not exactly match the real world, so behaviors learned in simulation often transfer poorly to the real system.

This thesis formulates the sim-to-real transfer of robot policies as an augmented Markov decision process. Within the proposed framework, the problem is divided into individual subproblems, each of which is addressed separately. The thesis begins by discussing the possibility of transferring behavior policies to the real world without any real-world data available to the algorithm. The applicability of such methods to dynamics and visual discrepancies between the source and target domains is analyzed, and the limitations of these methods in both scenarios are discussed. The thesis then evaluates a range of methods for using real-world data to improve domain transfer accuracy in a data-efficient way, with a focus on system parameter estimation, policy and model adaptation through meta-learning, and efficient ways of collecting informative real-world data. Finally, the thesis discusses the safety aspects of the sim-to-real adaptation scenario by extending the augmented MDP framework, and it explores how safe adaptation can be achieved through constraints on the action space and through cautious, safety-aware domain adaptation algorithms. The safety considerations behind finding optimal parameter distributions for sim-to-real policy training are also discussed.

Our experiments show that robot policies can be successfully transferred from simulation to the real world and that each of the issues in sim-to-real domain transfer can be addressed with dedicated algorithms, leading to safe and efficient real-world operation.
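One idea underlying zero-shot sim-to-real transfer, training under randomized simulated dynamics so that a policy works across a whole distribution of possible real-world parameters rather than in a single nominal simulator, can be sketched in a few lines of Python. Everything below (the 1-D point-mass SimEnv, the friction range, the proportional policy, the crude policy search) is an illustrative assumption for exposition, not code from the thesis:

```python
import random

class SimEnv:
    """Toy simulator: a 1-D point mass whose friction is uncertain,
    standing in for the unknown dynamics of the real robot."""
    def __init__(self, friction):
        self.friction = friction
        self.pos = 0.0

    def step(self, action):
        # Simplified dynamics: the action pushes the mass, friction damps it.
        self.pos += action * (1.0 - self.friction)
        reward = -abs(1.0 - self.pos)  # goal: reach pos = 1.0
        return self.pos, reward

def sample_friction(low=0.1, high=0.5):
    # Domain randomization: draw a fresh dynamics parameter every episode,
    # forcing the policy to perform well across the whole distribution.
    return random.uniform(low, high)

def evaluate(gain, episodes=100):
    """Average final reward of a proportional policy over randomized episodes."""
    total = 0.0
    for _ in range(episodes):
        env = SimEnv(friction=sample_friction())
        pos, reward = 0.0, 0.0
        for _ in range(10):
            action = gain * (1.0 - pos)  # simple proportional controller
            pos, reward = env.step(action)
        total += reward
    return total / episodes

# Crude "training": search over the single policy parameter `gain`.
random.seed(0)
best_gain = max([0.5, 1.0, 1.5, 2.0], key=evaluate)
print(best_gain)
```

A gain chosen this way is robust to the randomized friction range, and therefore has a better chance of working on the real system, whose friction is unknown but ideally lies within that range; the real-data-driven methods in the thesis address the case where it does not.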
Supervising professor
Kyrki, Ville, Prof., Aalto University, Department of Electrical Engineering and Automation, Finland
Keywords: robotics, machine learning, reinforcement learning
  • [Publication 1]: Aleksi Hämäläinen, Karol Arndt, Ali Ghadirzadeh and Ville Kyrki. Affordance Learning for End-to-end Visuomotor Control. In International Conference on Intelligent Robots and Systems (IROS), Macau, China. pp. 1781–1788, November 2019.
  • [Publication 2]: Karol Arndt, Murtaza Hazara, Ali Ghadirzadeh and Ville Kyrki. Meta reinforcement learning for sim-to-real domain adaptation. In International Conference on Robotics and Automation (ICRA), Paris, France. pp. 2725–2731, May 2020.
    DOI: 10.1109/ICRA40945.2020.9196540
  • [Publication 3]: Karol Arndt, Ali Ghadirzadeh, Murtaza Hazara and Ville Kyrki. Few-shot model-based adaptation in noisy conditions. Robotics and Automation Letters (RA-L), vol. 6, issue 2, pp. 4193–4200, April 2021.
    DOI: 10.1109/LRA.2021.3068104
  • [Publication 4]: Karol Arndt, Oliver Struckmeier and Ville Kyrki. Domain Curiosity: Learning Efficient Data Collection Strategies for Domain Adaptation. In International Conference on Intelligent Robots and Systems (IROS), Prague, Czechia. pp. 1259–1266, October 2021.
    DOI: 10.1109/IROS51168.2021.9635864
  • [Publication 5]: Rituraj Kaushik, Karol Arndt and Ville Kyrki. SafeAPT: Safe Simulation-to-Real Robot Learning using Diverse Policies Learned in Simulation. Robotics and Automation Letters (RA-L), vol. 7, issue 3, pp. 6838–6845, July 2022.
    DOI: 10.1109/LRA.2022.3177294
  • [Publication 6]: Gabriele Tiboni, Karol Arndt and Ville Kyrki. DROPO: Sim-to-Real Transfer with Offline Domain Randomization. Submitted for publication, June 2022.
  • [Publication 7]: Ali Ghadirzadeh, Petra Poklukar, Karol Arndt, Chelsea Finn, Ville Kyrki, Danica Kragic and Mårten Björkman. Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models. Journal of Machine Learning Research (JMLR), vol. 23 (174), pp. 1–37, June 2022.