Browsing by Author "Arndt, Karol"
Now showing 1 - 12 of 12
- Domain Curiosity: Learning Efficient Data Collection Strategies for Domain Adaptation
A4 Article in conference proceedings (2021-12-16) Arndt, Karol; Struckmeier, Oliver; Kyrki, Ville
Domain adaptation is a common problem in robotics, with applications such as transferring policies from simulation to the real world and lifelong learning. Performing such adaptation, however, requires informative data about the environment to be available during the adaptation. In this paper, we present domain curiosity, a method of training exploratory policies that are explicitly optimized to provide data that allows a model to learn about the unknown aspects of the environment. In contrast to most curiosity methods, our approach explicitly rewards learning, which makes it robust to environment noise without sacrificing its ability to learn. We evaluate the proposed method by comparing how much a model can learn about environment dynamics given data collected by the proposed approach, compared to standard curious and random policies. The evaluation is performed using a toy environment, two simulated robot setups, and a real-world haptic exploration task. The results show that the proposed method allows data-efficient and accurate estimation of dynamics.
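The learning-progress reward the abstract describes can be illustrated with a short sketch: the exploratory policy is rewarded by how much a newly collected batch reduces the dynamics model's error on held-out data, rather than by the prediction error itself, which keeps the signal from being inflated by irreducible noise. The model architecture, optimizer, and tensor shapes below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, state_dim),
        )

    def forward(self, state, action):
        # Predict the next state from the current state and action
        return self.net(torch.cat([state, action], dim=-1))

def curiosity_reward(model, optimizer, batch, holdout):
    """Learning-progress reward: how much the newly collected batch reduces
    the model's held-out prediction error. Rewarding the *reduction* rather
    than the raw error keeps the signal robust to irreducible noise."""
    s, a, s_next = holdout
    with torch.no_grad():
        loss_before = nn.functional.mse_loss(model(s, a), s_next)
    bs, ba, bs_next = batch
    optimizer.zero_grad()
    nn.functional.mse_loss(model(bs, ba), bs_next).backward()
    optimizer.step()
    with torch.no_grad():
        loss_after = nn.functional.mse_loss(model(s, a), s_next)
    return (loss_before - loss_after).item()  # positive if the batch helped
```

A pure-noise transition leaves the held-out loss unchanged, so this reward goes to zero, whereas a prediction-error ("surprise") reward would keep paying the policy for revisiting noise.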
- DROPO: Sim-to-real transfer with offline domain randomization
A1 Original article in a scientific journal (2023-08) Tiboni, Gabriele; Arndt, Karol; Kyrki, Ville
In recent years, domain randomization over dynamics parameters has gained considerable traction as a method for sim-to-real transfer of reinforcement learning policies in robotic manipulation; however, finding optimal randomization distributions can be difficult. In this paper, we introduce DROPO, a novel method for estimating domain randomization distributions for safe sim-to-real transfer. Unlike prior work, DROPO only requires a limited, pre-collected offline dataset of trajectories, and explicitly models parameter uncertainty to match real data using a likelihood-based approach. We demonstrate that DROPO is capable of recovering dynamics parameter distributions in simulation and of finding a distribution capable of compensating for an unmodeled phenomenon. We also evaluate the method in two zero-shot sim-to-real transfer scenarios, showing successful domain transfer and improved performance over prior methods.
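The likelihood-based idea can be sketched as follows: sample dynamics parameters from a candidate distribution, push each sample one step through the simulator from the real state-action pairs, and score the candidate by the log-likelihood of the observed real next states under the resulting predictions. The simulator interface, Gaussian parameterization, and moment matching below are assumptions made for illustration, not DROPO's actual code.

```python
import numpy as np

def dropo_objective(mean, std, real_transitions, simulate, n_samples=100):
    """Log-likelihood of real next states under the next-state distributions
    induced by dynamics parameters sampled from N(mean, std).
    `simulate(params, s, a)` is a hypothetical one-step simulator call."""
    logp = 0.0
    for s, a, s_next in real_transitions:
        params = np.random.normal(mean, std, size=(n_samples, len(mean)))
        preds = np.array([simulate(p, s, a) for p in params])
        # Moment-match the sampled predictions with a diagonal Gaussian
        mu, var = preds.mean(axis=0), preds.var(axis=0) + 1e-8
        logp += -0.5 * np.sum((s_next - mu) ** 2 / var + np.log(2 * np.pi * var))
    return logp  # maximize over (mean, std), e.g. with CMA-ES or grid search
```

Because the score is a likelihood over a *distribution* of parameters rather than a point estimate, wide variances survive optimization exactly where the data cannot pin the parameters down, which is what makes the resulting randomization safe to train against.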
- The effect of inductive biases on emergent exploration in meta reinforcement learning
School of Electrical Engineering | Bachelor's thesis (2019-12-02) Keurulainen, Oskar
- Few-shot model-based adaptation in noisy conditions
A1 Original article in a scientific journal (2021-04) Arndt, Karol; Ghadirzadeh, Ali; Hazara, Murtaza; Kyrki, Ville
Few-shot adaptation is a challenging problem in the context of simulation-to-real transfer in robotics, requiring safe and informative data collection. In physical systems, an additional challenge may be posed by domain noise, which is present in virtually all real-world applications. In this letter, we propose to perform few-shot adaptation of dynamics models in noisy conditions using an uncertainty-aware, Kalman filter-based neural network architecture. We show that the proposed method, which explicitly addresses domain noise, reduces few-shot adaptation error compared to a black-box LSTM adaptation baseline and to a model-free on-policy reinforcement learning approach that tries to learn an adaptable and informative policy at the same time. The proposed method also enables system analysis by inspecting the hidden states of the model during and after adaptation.
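For readers unfamiliar with the filtering machinery, the standard Kalman measurement update below shows the uncertainty-aware mechanism at the heart of such an architecture: the belief over latent dynamics parameters is corrected by each noisy observation in proportion to the relative confidence of the prior and the measurement. This toy linear version omits the neural network the letter wraps around it.

```python
import numpy as np

def kalman_update(x, P, z, H, R):
    """One measurement update of a linear Kalman filter: correct the belief
    (mean x, covariance P) over latent dynamics parameters with a noisy
    observation z, given measurement model H and noise covariance R."""
    y = z - H @ x                         # innovation: data vs. prediction
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # gain: how much to trust the data
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P  # uncertainty shrinks after update
    return x_new, P_new
```

The explicit covariance P is what the abstract's "system analysis" relies on: one can watch the belief tighten (or fail to tighten) during adaptation.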
- Imitation learning in robotics
School of Electrical Engineering | Bachelor's thesis (2019-05-07) Jääskeläinen, Heikki
- Learning Affordance Representations: An Efficient Learning Approach for End-to-End Visuomotor Control
School of Science | Master's thesis (2019-08-19) Hämäläinen, Aleksi
The development of data-driven approaches, such as deep learning, has led to the emergence of systems that have achieved human-like performance in a wide variety of tasks. For robotic tasks, deep data-driven models are introduced to create adaptive systems without the need to explicitly program them. These adaptive systems are needed in situations where changes in the task and environment cannot be foreseen. Convolutional neural networks (CNNs) have become the standard way to process visual data in robotics. End-to-end neural network models that handle the entire control task can perform various complex tasks with little feature engineering. However, the adaptivity of these systems goes hand in hand with the level of variation in the training data, and training end-to-end deep robotic systems requires a lot of domain-, task-, and hardware-specific data, which is often costly to obtain. In this work, we propose to tackle this issue by employing a deep neural network with a modular architecture, consisting of separate perception, policy, and trajectory parts. Each part of the system is trained fully on synthetic data or in simulation. The data is exchanged between the parts of the system as low-dimensional representations of affordances and trajectories. The performance is then evaluated in a zero-shot transfer scenario using a Franka Panda robotic arm. The results demonstrate that a low-dimensional representation of scene affordances extracted from an RGB image is sufficient to successfully train manipulator policies.
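The modular split described above can be pictured as three separately trainable networks exchanging low-dimensional codes. The layer sizes, affordance dimensionality, and trajectory parameterization in this PyTorch sketch are invented for illustration and do not come from the thesis.

```python
import torch
import torch.nn as nn

class ModularVisuomotorPolicy(nn.Module):
    def __init__(self, affordance_dim=8, traj_latent_dim=5,
                 action_dim=7, horizon=20):
        super().__init__()
        # Perception: RGB image -> low-dimensional affordance representation
        self.perception = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, affordance_dim),
        )
        # Policy: affordance code -> latent trajectory code
        self.policy = nn.Sequential(
            nn.Linear(affordance_dim, 64), nn.ReLU(),
            nn.Linear(64, traj_latent_dim),
        )
        # Trajectory decoder: latent code -> sequence of joint targets
        self.trajectory = nn.Linear(traj_latent_dim, action_dim * horizon)
        self.action_dim, self.horizon = action_dim, horizon

    def forward(self, image):
        z = self.policy(self.perception(image))
        return self.trajectory(z).view(-1, self.horizon, self.action_dim)
```

Because the modules only communicate through the two small codes, each can be trained on synthetic data for its own part of the problem, which is what makes the zero-shot transfer evaluation possible.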
- Policy search using model-based methods
School of Electrical Engineering | Bachelor's thesis (2019-05-17) Nedergård, Benjamin
- Robot simulation for reinforcement learning
School of Electrical Engineering | Bachelor's thesis (2022-05-13) Aho, Pyry
- Safe and efficient transfer of robot policies from simulation to the real world
School of Electrical Engineering | Doctoral dissertation (article-based) (2023) Arndt, Karol
The past decade has witnessed enormous progress in reinforcement learning, with intelligent agents learning to perform a variety of tasks, including locomotion, imitating human behavior, and even outperforming human experts in board games and video games of varying complexity, such as Pong, Go, or Dota 2. However, all these tasks share one common characteristic: they are either performed entirely in simulation or are based on simple rules that can be perfectly modeled in software. Furthermore, current reinforcement learning approaches that perform well in virtual environments cannot be directly applied to physical agents operating in the real world, such as robots, due to their reliance on massive data collection. As such, the training process not only takes a long time, resulting in hardware depreciation, but often involves a safety risk associated with active exploration: the agent must evaluate a large number of possible actions, some of which can lead to catastrophic outcomes, in order to decide on the best one.

One proposed solution to this problem is to train reinforcement learning policies for robots in simulation and to later deploy the trained behavior policy on the real physical system. This approach, however, raises a number of new issues: simulated dynamics and observations do not exactly match the real world, and thus behaviors learned in simulation often do not transfer well to the real system. This thesis formulates the sim-to-real transfer of robot policies as an augmented Markov decision process. Within the proposed framework, the problem is divided into individual subproblems, each of which is addressed separately.

The thesis begins with a discussion of the possibility of transferring behavior policies to the real world without any real-world data available to the algorithm. The applicability of such methods to dynamics and visual discrepancies between the source and target domains is analyzed, and the limitations of these methods in both scenarios are discussed. The thesis then evaluates a range of methods for using real-world data to improve domain transfer accuracy in a data-efficient way, with a focus on system parameter estimation, policy and model adaptation through meta-learning, and efficient ways of collecting informative real-world data. Finally, the thesis discusses the safety aspects of the sim-to-real adaptation scenario by extending the augmented MDP framework, and explores how safe adaptation can be achieved through constraints on the action space and through cautious, safety-aware domain adaptation algorithms. The safety considerations behind finding optimal parameter distributions for sim-to-real policy training are also discussed. Our experiments show that robot policies can be successfully transferred from simulation to the real world, and that each of the issues with sim-to-real domain transfer can be addressed with dedicated algorithms, leading to safe and efficient real-world operation.
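One way to read the augmented-MDP formulation mentioned in the abstract is that the transition function carries unobserved domain parameters, so sim-to-real transfer becomes a question of robustness or inference over those parameters. The sketch below is a hypothetical rendering of that idea; the dissertation's precise definition may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class AugmentedMDP:
    # transition(state, action, xi) -> next_state, where xi are unknown
    # domain parameters (masses, friction coefficients, latencies, ...)
    transition: Callable[[Any, Any, Any], Any]
    reward: Callable[[Any, Any], float]
    sample_domain: Callable[[], Any]  # randomization distribution over xi

def rollout(mdp: AugmentedMDP, policy, s0, horizon=100):
    """Roll out a policy in one sampled domain; training across many sampled
    domains yields behavior that is robust to the sim-to-real gap."""
    xi, s, ret = mdp.sample_domain(), s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        s = mdp.transition(s, a, xi)
        ret += mdp.reward(s, a)
    return ret
```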
- SafeAPT: Safe Simulation-to-Real Robot Learning Using Diverse Policies Learned in Simulation
A1 Original article in a scientific journal (2022-07-01) Kaushik, Rituraj; Arndt, Karol; Kyrki, Ville
The framework of sim-to-real learning, i.e., training policies in simulation and transferring them to real-world systems, is one of the most promising approaches towards data-efficient learning in robotics. However, due to the inevitable reality gap between the simulation and the real world, a policy learned in simulation may not always generate safe behaviour on the real robot. As a result, during policy adaptation in the real world, the robot may damage itself or cause harm to its surroundings. In this work, we introduce SafeAPT, a multi-goal robot learning algorithm that leverages a diverse repertoire of policies evolved in simulation and transfers the most promising safe policy to the real robot through episodic interaction. To achieve this, SafeAPT iteratively learns probabilistic reward and safety models from real-world observations, using simulated experiences as priors. It then performs Bayesian optimization to select the best policy from the repertoire according to the reward model, while maintaining the specified safety constraint using the safety model. SafeAPT allows a robot to adapt safely to a wide range of goals with the same repertoire of policies evolved in simulation. We compare SafeAPT with several baselines, in both simulated and real robotic experiments, and show that SafeAPT finds high-performing policies within a few minutes of real-world operation while minimizing safety violations during the interactions.
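The constrained selection step the abstract outlines, choosing the most promising repertoire member that the safety model still deems safe, might look roughly like the following. The UCB acquisition, confidence factor, and fallback rule are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def select_policy(reward_mu, reward_sigma, safety_mu, safety_sigma,
                  safety_threshold=0.0, kappa=1.0, confidence=1.645):
    """Pick a policy index from the repertoire. All inputs are per-policy
    posterior means/stds from probabilistic reward and safety models fit on
    real-world observations (with simulated experience as a prior)."""
    ucb = reward_mu + kappa * reward_sigma               # optimistic reward
    lcb_safety = safety_mu - confidence * safety_sigma   # pessimistic safety
    feasible = lcb_safety >= safety_threshold
    if not feasible.any():
        return int(np.argmax(lcb_safety))  # fall back to the safest option
    scores = np.where(feasible, ucb, -np.inf)
    return int(np.argmax(scores))
```

Being optimistic about reward but pessimistic about safety is what lets the search stay sample-efficient while keeping safety violations rare during the episodic interactions.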
- Simulation to reality transfer in robot learning
School of Electrical Engineering | Bachelor's thesis (2019-09-12) Vilhunen, Atte
- Training and Evaluation of Deep Policies Using Reinforcement Learning and Generative Models
A1 Original article in a scientific journal (2022-08-04) Ghadirzadeh, Ali; Poklukar, Petra; Arndt, Karol; Finn, Chelsea; Kyrki, Ville; Kragic, Danica; Björkman, Mårten
We present a data-efficient framework for solving sequential decision-making problems that exploits the combination of reinforcement learning (RL) and latent-variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem, as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluating generative models, which allows us to predict the performance of the RL policy training before the actual training on a physical robot. We experimentally determine the characteristics of the generative models that have the most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that GenRL is the only method that can safely and efficiently solve the robotics tasks, compared with two state-of-the-art RL methods.
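The two-part policy the abstract describes can be sketched as a small state-conditioned sub-policy producing a latent Gaussian, followed by a generative decoder (trained separately, for instance as a VAE decoder on valid motion data) that expands a latent sample into a full motor sequence. The shapes, layers, and Gaussian head below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GenRLPolicy(nn.Module):
    def __init__(self, state_dim, latent_dim=4, action_dim=7, horizon=50):
        super().__init__()
        # (i) Sub-policy: state -> parameters of a Gaussian over the latent
        self.sub_policy = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 2 * latent_dim),
        )
        # (ii) Generative decoder: latent -> full motor-action sequence
        #      (pretrained unsupervised on valid motion data, then frozen)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim * horizon),
        )
        self.action_dim, self.horizon = action_dim, horizon

    def forward(self, state):
        mu, log_std = self.sub_policy(state).chunk(2, dim=-1)
        z = mu + log_std.exp() * torch.randn_like(mu)  # reparameterized sample
        actions = self.decoder(z).view(-1, self.horizon, self.action_dim)
        return actions, (mu, log_std)
```

Because the decoder is trained only on valid action sequences, exploration in the low-dimensional latent space stays within the support of feasible motions, which is where the framework's safety and data-efficiency claims come from.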