Hierarchical policy network

School of Electrical Engineering | Master's thesis
Date
2018-06-18
Major/Subject
Control, Robotics and Autonomous Systems
Mcode
ELEC3025
Degree programme
AEE - Master’s Programme in Automation and Electrical Engineering (TS2013)
Language
en
Pages
118
Abstract
The ability to learn effective policies from control examples is an apparent milestone towards expert and general artificial intelligence. Methods for developing this ability have taken a new turn with the incorporation of deep learning into the classic control setting. However, conventional methods face several challenges, the most severe of which is the adversarial behavior problem. To address these challenges, this thesis examines the mechanics of neural networks that lead to the undesired effects and discusses existing and new solutions. In particular, it studies how adversarial effects can be countered with denoising-based methods. For this purpose, a simulation environment was developed, a dataset was collected from it, and world models were approximated from the dataset offline. Then, to demonstrate the adversarial behavior problem and the proposed solution in the control setting, a proxy application based on model predictive control (MPC) was developed. Finally, experiments with an aggregated policy network revealed additional challenges: inaccuracies in the trajectories simulated during training, and insufficient coverage of the training manifold during optimization. These problems were addressed by developing auxiliary techniques. The results of the theoretical and experimental study indicate that the primary task of policy acquisition is prone to adversarial and similar effects, which are regarded as natural phenomena occurring in all neural networks, and that denoising-based methods can be used successfully as a countermeasure. The thesis also assesses the effectiveness and limitations of the methods employed.
Supervisor
Kyrki, Ville
Thesis advisor
Berglund, Mathias
Keywords
adversarial effects, policy network, offline predictive model, policy optimization