Data-Efficient Learning using Modular Meta-Learning
Sähkötekniikan korkeakoulu (School of Electrical Engineering) | Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display, and print this publication for your own personal use. Commercial use is prohibited.
Authors
Date
2022-08-22
Department
Major/Subject
Control, Robotics and Autonomous Systems
Mcode
ELEC3025
Degree programme
AEE - Master’s Programme in Automation and Electrical Engineering (TS2013)
Language
en
Pages
63
Series
Abstract
Meta-learning, or learning to learn, has become well known in artificial intelligence as a technique for improving the performance of learning algorithms. It has been used to uncover learning principles that allow trained models to adapt and generalize effectively to new tasks after deployment. Meta-loss learning is a framework for training loss or reward functions that improve the sample efficiency, learning stability, and convergence speed of the models trained under them. One class of models that can benefit from this framework is Neural Dynamic Policies (NDPs), which combine a deep neural network with a dynamical system and can predict trajectories from high-dimensional inputs such as images. The objective of this thesis is to learn loss functions that speed up and stabilize the training of complex policies. Specifically, this work investigates whether the performance of NDPs can be improved by a meta-learning method that learns parametric loss functions, in both supervised and reinforcement learning settings. To this end, the task is to learn to draw digits from the S-MNIST dataset; the results show that NDPs trained under the learned loss outperform the baseline in learning speed and sample efficiency.
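As a reading aid, the two ideas in the abstract can be sketched in code. First, the NDP structure: a neural network encodes an image into the parameters of a dynamical system (here a standard dynamic movement primitive), which is integrated forward to produce a trajectory. This is an illustrative reconstruction, not the thesis's implementation; the class and parameter names (NDPSketch, n_basis, and so on) are hypothetical, and the gains and basis layout are common textbook defaults.

```python
import torch

class NDPSketch(torch.nn.Module):
    """Illustrative NDP-style model: an encoder maps an image to the
    parameters of a dynamic movement primitive (DMP), and the DMP is
    integrated forward to produce a trajectory."""

    def __init__(self, n_basis=10, dim=2, steps=100):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(28 * 28, 128), torch.nn.ReLU(),
            # outputs: radial-basis weights for the forcing term + goal point
            torch.nn.Linear(128, n_basis * dim + dim),
        )
        self.n_basis, self.dim, self.steps = n_basis, dim, steps
        self.register_buffer("centers", torch.linspace(0.0, 1.0, n_basis))

    def forward(self, image, y0):
        params = self.encoder(image)
        w = params[:, : self.n_basis * self.dim].view(-1, self.n_basis, self.dim)
        g = params[:, self.n_basis * self.dim:]          # attractor goal
        alpha, beta, dt = 25.0, 6.25, 1.0 / self.steps   # textbook DMP gains
        y, yd, x = y0, torch.zeros_like(y0), 1.0         # state + phase
        traj = []
        for _ in range(self.steps):
            psi = torch.exp(-50.0 * (x - self.centers) ** 2)   # RBF features
            f = (psi @ w) * x / psi.sum()                      # forcing term
            ydd = alpha * (beta * (g - y) - yd) + f            # spring-damper
            yd = yd + ydd * dt
            y = y + yd * dt
            x = x - 2.0 * x * dt                               # phase decay
            traj.append(y)
        return torch.stack(traj, dim=1)   # (batch, steps, dim) pen positions

# Example: 28x28 grayscale images in, 100-step 2-D trajectories out.
ndp = NDPSketch()
trajectory = ndp(torch.randn(8, 1, 28, 28), torch.zeros(8, 2))
```

Second, the meta-loss learning scheme: a small network produces the loss used for the inner model's gradient step and is itself trained on the true task loss measured after that step, by differentiating through the inner update. Again a hedged sketch under assumed names (meta_loss_net, inner_lr); a toy linear regression stands in for NDP trajectory fitting on S-MNIST.

```python
import torch

torch.manual_seed(0)

# Toy regression task standing in for trajectory fitting.
x = torch.randn(64, 4)
y = x @ torch.randn(4, 2)

# Parametric loss: a small MLP scoring (prediction, target) pairs.
meta_loss_net = torch.nn.Sequential(
    torch.nn.Linear(4, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
)
meta_opt = torch.optim.Adam(meta_loss_net.parameters(), lr=1e-3)
inner_lr = 0.1

for step in range(500):
    w = torch.zeros(4, 2, requires_grad=True)   # fresh inner model

    # Inner step: update the model under the *learned* loss.
    pred = x @ w
    learned_loss = meta_loss_net(torch.cat([pred, y], dim=-1)).mean()
    (grad_w,) = torch.autograd.grad(learned_loss, w, create_graph=True)
    w_new = w - inner_lr * grad_w

    # Outer step: the true task loss, measured after the inner update,
    # is backpropagated through that update into the loss network.
    task_loss = ((x @ w_new - y) ** 2).mean()
    meta_opt.zero_grad()
    task_loss.backward()
    meta_opt.step()
```

Differentiating through the inner gradient step (create_graph=True) is what lets the downstream task loss shape the learned loss function; this mirrors the meta-loss learning setup the abstract describes.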
Supervisor
Kyrki, Ville
Thesis advisor
Abu-Dakka, Fares
Keywords
meta-learning, neural dynamic policies, data-efficiency, training stability, imitation learning, reinforcement learning