Guided policy search for a lightweight industrial robot arm

School of Electrical Engineering (Sähkötekniikan korkeakoulu) | Master's thesis
Date
2018-12-17
Major/Subject
Space Robotics and Automation 2017-2018
Mcode
ELEC3047
Degree programme
Erasmus Mundus Space Master
Language
en
Pages
58+6
Abstract
General autonomy is at the forefront of robotic research and practice. Earlier research has enabled robots to learn movement and manipulation within the context of a specific instance of a task and to learn from large quantities of empirical data and known dynamics. Reinforcement learning (RL) tackles generalisation, whereby a robot may be relied upon to perform its task with acceptable speed and fidelity in multiple, even arbitrary, task configurations. Recent research has advanced approximate policy search methods of RL, in which a function approximator is used to represent an optimal policy while avoiding calculation across the high-dimensional state and action spaces of real robots. This thesis details the implementation and testing, on a lightweight industrial robot arm, of guided policy search (GPS), an RL algorithm that seeks to avoid the need, typical in machine learning, for large numbers of empirical behavioural samples while maximising learning speed. GPS comprises a locally optimal policy generator, here based on a linear-quadratic regulator, and an approximate general policy representation, here a feedforward neural network. A controller is written to interface an existing back-end implementation of GPS with the robot itself. Experimental results show that the GPS agent can perform basic reaching tasks across its configuration space after approximately 15 minutes of training, but that the local policies generated are not fully optimised within that timescale and that post-training operation suffers from oscillatory actions under perturbed initial joint positions. Further work is discussed and recommended for better training of GPS agents and for making locally optimal policies more robust to disturbance during operation.
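As a rough illustration of the two components named in the abstract, the sketch below computes time-varying linear-quadratic regulator gains for a scalar linear system and then fits a single global feedback gain to the resulting state-action samples. It is not the thesis implementation: the scalar dynamics, the cost weights, the initial states and the function names `lqr_gains`, `rollout` and `fit_global_gain` are all assumptions made for illustration, and the least-squares fit of one gain merely stands in for training the feedforward neural network.

```python
def lqr_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for the scalar system x[t+1] = a*x[t] + b*u[t].

    Returns the time-varying gains k[t] of the controller u[t] = -k[t] * x[t]
    that minimises sum(q*x^2 + r*u^2) plus the terminal cost q_final*x^2.
    """
    p = q_final  # cost-to-go weight at the final time step
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * a - a * p * b * k
        gains.append(k)
    gains.reverse()  # the recursion runs backwards in time
    return gains


def rollout(x0, gains, a, b):
    """Simulate the local controller from x0; return visited states and actions."""
    xs, us = [x0], []
    x = x0
    for k in gains:
        u = -k * x
        x = a * x + b * u
        us.append(u)
        xs.append(x)
    return xs, us


def fit_global_gain(samples):
    """Least-squares fit of u = w * x to (x, u) pairs.

    Stand-in for the supervised training of the global policy.
    """
    num = sum(x * u for x, u in samples)
    den = sum(x * x for x, _ in samples)
    return num / den


# Hypothetical system and cost parameters (assumptions, not from the thesis).
a, b = 1.0, 0.1
gains = lqr_gains(a, b, q=1.0, r=0.01, q_final=10.0, horizon=50)

# Local controllers are run from several initial states, as in the
# "multiple task configurations" setting; the samples are pooled.
samples = []
final_states = []
for x0 in (0.5, -0.3, 1.0):
    xs, us = rollout(x0, gains, a, b)
    samples.extend(zip(xs[:-1], us))
    final_states.append(xs[-1])

w = fit_global_gain(samples)
```

In this toy setting the fitted gain `w` is a weighted average of the negated local gains, so the global policy reproduces the local controllers' behaviour on the states they visited. The full algorithm additionally alternates between the two steps and constrains the local policies to stay close to the global one, which this sketch omits.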
Description
Supervisor
Kyrki, Ville
Thesis advisor
Lundell, Jens
Keywords
guided policy search, reinforcement learning, deep learning, robotics, artificial intelligence, policy search