### Browsing by Author "Boney, Rinu"

Now showing 1 - 9 of 9


Item Active one-shot learning with prototypical networks (2019-01-01) Boney, Rinu; Ilin, Alexander; Department of Computer Science

We consider the problem of active one-shot classification, where a classifier must adapt to new tasks by requesting labels for one example per class from (potentially many) unlabeled examples. We propose a clustering approach to the problem: the features extracted with Prototypical Networks [1] are clustered using K-means, and the label for one representative sample from each cluster is requested to label the whole cluster. We demonstrate good performance of this simple active adaptation strategy on image data.

Item Fast Adaptation of Neural Networks (2018-03-19) Boney, Rinu; Ilin, Alexander; Perustieteiden korkeakoulu; Kannala, Juho

The ability to learn quickly from a few samples is a vital element of intelligence. Humans can reuse past knowledge and learn incredibly quickly, and they can interact with others to effectively guide their learning process. Computer vision systems that recognize objects automatically from pixels are becoming commonplace in production. These systems use deep neural networks to learn and recognize objects from data, but they often require a lot of data, take a long time to learn, and forget old things when learning something new. We build upon two previous methods, Prototypical Networks and Model-Agnostic Meta-Learning (MAML), that enable machines to learn to recognize new objects with very little supervision from the user. We extend these methods to the semi-supervised few-shot learning scenario, where the few labeled samples are accompanied by (potentially many) unlabeled samples. Our proposed methods learn better by also making use of the additional unlabeled samples.
We note that in many real-world applications the adaptation performance can be significantly improved by requesting the few labels through user feedback (active adaptation). Further, our proposed methods can also adapt to new tasks without any labeled examples (unsupervised adaptation) when the new task has the same output space as the training tasks.

Item Learning to Play Imperfect-Information Games by Imitating an Oracle Planner (IEEE, 2022) Boney, Rinu; Ilin, Alexander; Kannala, Juho; Seppanen, Jarno; Department of Computer Science; Professorship Kannala Juho; Computer Science Professors; Computer Science - Artificial Intelligence and Machine Learning (AIML); Professor of Practice Ilin Alexander; Computer Science - Visual Computing (VisualComputing); Supercell Oy

We consider learning to play multiplayer imperfect-information games with simultaneous moves and large state-action spaces. Previous attempts to tackle such challenging games have largely focused on model-free learning methods, often requiring hundreds of years of experience to produce competitive agents. Our approach is based on model-based planning. We tackle the problem of partial observability by first building an (oracle) planner that has access to the full state of the environment, and then distilling the knowledge of the oracle to a (follower) agent, which is trained to play the imperfect-information game by imitating the oracle's choices. We experimentally show that planning with naive Monte Carlo tree search performs poorly in large combinatorial action spaces. We therefore propose planning with a fixed-depth tree search and decoupled Thompson sampling for action selection.
We show that the planner is able to discover efficient playing strategies in the games of Clash Royale and Pommerman, and the follower policy successfully learns to implement them by training on a few hundred battles.

Item Model-Based Reinforcement Learning from Pixels (2020-10-19) Zhao, Yi; Boney, Rinu; Sähkötekniikan korkeakoulu; Kannala, Juho

People learn skills by interacting with their surroundings from the time of their birth. Reinforcement learning (RL), which learns a decision-making strategy (policy) to maximize a scalar reward signal by trial and error, offers such a paradigm for learning from one's surroundings. However, most current RL algorithms suffer from sample inefficiency: training an agent typically requires millions of samples. This thesis discusses model-based RL, which can learn a policy to control robots from scratch with significantly fewer samples, focusing on the case where the observations are high-dimensional pixels. To achieve this goal, we first explain the essential components needed to learn a latent dynamics model from high-dimensional observations and to make decisions based on the learned model. We then reproduce an algorithm called Dreamer, which learns behaviors by latent imagination from pixels, and test the reproduced algorithm on four benchmark tasks. Furthermore, we extend the Dreamer algorithm in two ways. The first is decision-time policy refinement, where we refine the predicted policy with a planning algorithm called the cross-entropy method (CEM). The second extends the flexibility of Dreamer by discretizing the continuous action space. We show that, combined with an ordinal architecture, the discrete policy achieves similar performance on most tasks. This allows us to apply a wide array of RL algorithms, previously limited to the discrete domain, to continuous control tasks. Finally, we discuss representation learning in reinforcement learning.
We also explore the possibility of learning the dynamics model behind pixels without reconstruction by partially reproducing the MuZero algorithm. MuZero learns a value-focused model, a fully abstract dynamics model that does not reconstruct the observations, and uses Monte Carlo tree search (MCTS) to make decisions based on the learned model. We further extend the MuZero algorithm to solve a continuous control task, Cartpole-balance, by discretizing the action space.

Item Model-based Reinforcement Learning on a Real-World Hardware Platform RealAnt (2024-01-23) Kaivola, Mikko; Boney, Rinu; Perustieteiden korkeakoulu; Kannala, Juho

Reinforcement learning has seen many advances in performance since deep learning was incorporated into the agents, especially in locomotion tasks with complex dynamics. However, the number of interactions needed to successfully learn behaviors is quite high, which makes real-world reinforcement learning time-consuming. Model-based reinforcement learning was introduced as a possible remedy for this problem: a learned dynamics model is used to construct or learn a policy. This class of methods has been quite successful, efficiently learning useful behavior from a fraction of the experience needed by model-free methods. However, these methods introduce many additional hyperparameters and complex interactions, which tend to require more environment-specific tuning. In this thesis, we consider RealAnt, a recent real-world robotics platform: a quadruped robot based on Ant, a popular and challenging locomotion benchmark from the MuJoCo physics simulator. The platform is low-cost and open-source, making it appealing for budget-conscious real-world reinforcement learning research. The original article introduced various benchmark tasks and demonstrated successful learning using state-of-the-art model-free agents.
Our aim is to investigate whether it is possible to improve upon these results using model-based reinforcement learning. We investigated two families of model-based algorithms, model-based model predictive control and model-based policy optimization, which use simulated experience in different ways. To facilitate fast testing, we used the simulated MuJoCo environment for the RealAnt. For both algorithm families, our experiments did not yield learning comparable to the replicated model-free results, even when a pre-trained model-free agent was used to initialize the learning. We observed that the dynamics model had issues with prediction accuracy, especially when predicting multiple steps ahead. These issues were most pronounced at the beginning of each episode but persisted afterwards. Adding regularization to the planning or giving the dynamics model more structure did not alleviate these issues. Thus, for now, model-free reinforcement learning remains the most effective way of learning in this environment.

Item Regularizing Model-Based Planning with Energy-Based Models (PMLR, 2020) Boney, Rinu; Kannala, Juho; Ilin, Alexander; Department of Computer Science; Professorship Kannala Juho; Professor of Practice Ilin Alexander

Item Sample-Efficient Methods for Real-World Deep Reinforcement Learning (Aalto University, 2022) Boney, Rinu; Tietotekniikan laitos; Department of Computer Science; Perustieteiden korkeakoulu; School of Science; Kannala, Juho, Prof., Aalto University, Department of Computer Science, Finland; Ilin, Alexander, Prof., Aalto University, Department of Computer Science, Finland

Reinforcement learning (RL) is a general framework for learning and evaluating intelligent behaviors in any domain. Deep reinforcement learning combines RL with deep learning to learn expressive nonlinear functions that can interpret rich sensory signals to produce complex behaviors.
However, this comes at the cost of increased sample complexity and instability, limiting the practical impact of deep RL algorithms on real-world problems. This thesis presents advances towards improving the sample efficiency and benchmarking of deep RL algorithms on real-world problems. It develops sample-efficient deep RL algorithms for three problem settings: multi-agent discrete control, continuous control, and continuous control from image observations. For multi-agent discrete control, the thesis proposes a sample-efficient model-based approach that plans using known dynamics models to learn to play imperfect-information games with large state-action spaces. This is achieved by training a policy network from partial observations to imitate the actions of an oracle planner that has full observability. For continuous control, the thesis demonstrates that trajectory optimization with learned dynamics models can lead the optimization procedure to exploit the inaccuracies of the model. The thesis proposes two regularization strategies to prevent this, based on uncertainty estimates from a denoising autoencoder or an energy-based model, achieving rapid initial learning on a set of popular continuous control tasks. For continuous control problems with image observations, the thesis proposes an actor-critic method that learns feature point state representations, without any additional supervision, for improved sample efficiency. The thesis also introduces two low-cost robot learning benchmarks to ground the research of RL algorithms in real-world problems. The first benchmark adapts an open-source RC car platform called Donkey Car to benchmark RL algorithms on continuous control of the car, learning to drive around miniature tracks from image observations.
The second benchmark is based on RealAnt, a low-cost quadruped robot developed in this thesis, to benchmark RL algorithms on continuous control of the robot's servos for basic tasks like turning and walking. The thesis demonstrates sample-efficient deep RL using existing methods on these benchmarks.

Item Scale Model Autonomous Driving Benchmark for Deep Reinforcement Learning Algorithms (2021-08-24) Viitala, Ari; Boney, Rinu; Perustieteiden korkeakoulu; Kannala, Juho

Reinforcement learning has seen major advances in recent years and displayed superhuman performance in many tasks such as board and video games. Thus far, this success has been limited to environments that can be simulated, as reinforcement learning algorithms traditionally require many interactions with the environment, up to millions of steps, which is easier to achieve in simulation. In real-life reinforcement learning, algorithm development is hindered by many common engineering problems, such as reduced control over the environment and expensive scalability, making it challenging to apply reinforcement learning to real-life problems like autonomous driving. In this thesis, we present a low-cost framework for testing reinforcement learning algorithms in a real-world setting on a simple autonomous driving task. The framework is based on a hardware and software package, Donkey Car, modified to support reinforcement learning tasks. We train a soft actor-critic reinforcement learning agent on a lane-following task and evaluate its learning performance and generalization between similar problems, benchmarking it against a supervised learning model and a human. We demonstrate that it is possible to train a reinforcement learning agent on a lane-following task in 10 minutes of real-world time, making training feasible for real-world testing.
Furthermore, the agent is able to learn skills comparable to or even surpassing a human on a high-speed driving task, and the models generalize between different tasks. We also show that results obtained on the simulated platform translate to the real platform.

Item Simplified Temporal Consistency Reinforcement Learning (PMLR, 2023-07) Zhao, Yi; Zhao, Wenshuai; Boney, Rinu; Kannala, Juho; Pajarinen, Joni; Robot Learning; Department of Computer Science; Computer Science Professors; Department of Electrical Engineering and Automation; Krause, Andreas; Brunskill, Emma; Cho, Kyunghyun; Engelhardt, Barbara; Sabato, Sivan; Scarlett, Jonathan

Reinforcement learning (RL) is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL, which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This holds when using pure planning with a dynamics model conditioned on the representation, but also when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1× faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches the sample efficiency of model-based methods while training 2.4× faster.
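As an illustration of the latent temporal-consistency objective described in the last abstract: the loss compares the latent predicted by a learned dynamics model against the encoding of the actual next observation. The toy encoder, dynamics matrices, and environment below are hypothetical simplifications for exposition, not the authors' implementation (which uses neural networks and a stop-gradient target encoder).

```python
# Toy sketch of a latent temporal-consistency loss (hypothetical example).
# A linear "encoder" maps 2-D observations to a 2-D latent, and a linear
# "latent dynamics model" predicts the next latent from the current one.
# The loss asks the predicted next latent to match the encoding of the
# actual next observation (in practice the target encoding is held fixed
# via a stop-gradient / target network).

def encode(obs, W):
    # latent = W @ obs
    return [sum(w * o for w, o in zip(row, obs)) for row in W]

def predict(latent, D):
    # predicted next latent = D @ latent
    return [sum(d * z for d, z in zip(row, latent)) for row in D]

def temporal_consistency_loss(obs_t, obs_t1, W, D):
    z_t = encode(obs_t, W)
    z_t1_hat = predict(z_t, D)          # model's prediction in latent space
    z_t1 = encode(obs_t1, W)            # target encoding of the real next obs
    return sum((a - b) ** 2 for a, b in zip(z_t1_hat, z_t1))

# Toy environment dynamics: the observation doubles at each step.
W = [[1.0, 0.0], [0.0, 1.0]]            # identity encoder
D_good = [[2.0, 0.0], [0.0, 2.0]]       # latent model matching the dynamics
D_bad = [[1.0, 0.0], [0.0, 1.0]]        # latent model ignoring the dynamics

obs_t, obs_t1 = [1.0, -1.0], [2.0, -2.0]
print(temporal_consistency_loss(obs_t, obs_t1, W, D_good))  # 0.0
print(temporal_consistency_loss(obs_t, obs_t1, W, D_bad))   # 2.0
```

A dynamics model consistent with the environment drives the loss to zero, which is the signal that shapes both the representation and the model during training.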