SafeAPT: Safe Simulation-to-Real Robot Learning Using Diverse Policies Learned in Simulation
Access rights
openAccess
publishedVersion
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
Authors
Kaushik, R; Arndt, K; Kyrki, V
Date
2022-07-01
Language
en
Pages
8
Series
IEEE Robotics and Automation Letters, Volume 7, issue 3, pp. 6838-6845
Abstract
The framework of sim-to-real learning, i.e., training policies in simulation and transferring them to real-world systems, is one of the most promising approaches towards data-efficient learning in robotics. However, due to the inevitable reality gap between the simulation and the real world, a policy learned in the simulation may not always generate a safe behaviour on the real robot. As a result, during policy adaptation in the real world, the robot may damage itself or cause harm to its surroundings. In this work, we introduce SafeAPT, a multi-goal robot learning algorithm that leverages a diverse repertoire of policies evolved in simulation and transfers the most promising safe policy to the real robot through episodic interaction. To achieve this, SafeAPT iteratively learns probabilistic reward and safety models from real-world observations using simulated experiences as priors. Then, it performs Bayesian optimization to select the best policy from the repertoire with the reward model, while maintaining the specified safety constraint using the safety model. SafeAPT allows a robot to adapt to a wide range of goals safely with the same repertoire of policies evolved in the simulation. We compare SafeAPT with several baselines, both in simulated and real robotic experiments, and show that SafeAPT finds high-performing policies within a few minutes of real-world operation while minimizing safety violations during the interactions.
Description
Publisher Copyright: © 2016 IEEE.
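The selection step described in the abstract (probabilistic reward and safety models with simulated priors, followed by safety-constrained Bayesian optimization over a policy repertoire) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the RBF kernel, the unit prior variance, the upper-confidence-bound acquisition, and the parameters `length_scale`, `noise`, `beta`, and `safety_threshold` are all assumptions chosen for illustration. The abstract states only that Gaussian-style probabilistic models and Bayesian optimization are used, not these specifics.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between the rows of a and b (assumed kernel)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_obs, y_obs, x_query, prior_obs, prior_query, noise=1e-3):
    """GP posterior mean and std, using simulated values as the prior mean."""
    if len(x_obs) == 0:
        # No real-world data yet: fall back to the simulated prior.
        return prior_query.copy(), np.ones(len(x_query))
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    # Correct the simulated prior with the observed sim-to-real residuals.
    mean = prior_query + k_qo @ np.linalg.solve(k_oo, y_obs - prior_obs)
    var = 1.0 - np.einsum("ij,ji->i", k_qo, np.linalg.solve(k_oo, k_qo.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def select_policy(descriptors, sim_reward, sim_safety, tried,
                  real_reward, real_safety, safety_threshold=0.0, beta=1.0):
    """Return the index of the next policy to try, or None if none looks safe."""
    idx = np.array(tried, dtype=int)
    x_obs = descriptors[idx]
    r_mean, r_std = gp_posterior(x_obs, np.asarray(real_reward, dtype=float),
                                 descriptors, sim_reward[idx], sim_reward)
    s_mean, s_std = gp_posterior(x_obs, np.asarray(real_safety, dtype=float),
                                 descriptors, sim_safety[idx], sim_safety)
    # Keep only policies whose pessimistic safety estimate meets the threshold.
    safe = (s_mean - beta * s_std) >= safety_threshold
    if not safe.any():
        return None
    # Among safe policies, pick the one with the highest optimistic reward.
    acquisition = np.where(safe, r_mean + beta * r_std, -np.inf)
    return int(np.argmax(acquisition))

# Hypothetical repertoire: five policies with one-dimensional descriptors.
descriptors = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
sim_reward = np.array([0.1, 0.5, 0.9, 0.7, 0.3])
sim_safety = np.array([2.0, 2.0, -2.0, 2.0, 2.0])  # policy 2 is unsafe in simulation

first = select_policy(descriptors, sim_reward, sim_safety, [], [], [])
# Suppose the chosen policy performs far worse on the real robot than in simulation.
second = select_policy(descriptors, sim_reward, sim_safety, [first], [-5.0], [2.0])
print(first, second)
```

In this sketch the simulated reward and safety values act as the prior mean, so each real-world episode only has to teach the model the sim-to-real residual. Using a pessimistic bound for safety and an optimistic bound for reward is one common design choice for constrained Bayesian optimization, assumed here rather than taken from the paper.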
Keywords
Evolutionary robotics, learning from experience, machine learning for robot control
Citation
Kaushik, R, Arndt, K & Kyrki, V 2022, 'SafeAPT: Safe Simulation-to-Real Robot Learning Using Diverse Policies Learned in Simulation', IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 6838-6845. https://doi.org/10.1109/LRA.2022.3177294