Browsing by Author "D'Eramo, Carlo"
Item: Convex Regularization in Monte-Carlo Tree Search (JMLR, 2021)
Dam, Tuan; D'Eramo, Carlo; Peters, Jan; Pajarinen, Joni; Department of Electrical Engineering and Automation; Meila, M; Zhang, T; Robot Learning; Technische Universität Darmstadt
Monte-Carlo planning and Reinforcement Learning (RL) are essential to sequential decision making. The recent AlphaGo and AlphaZero algorithms have shown how to successfully combine these two paradigms to solve large-scale sequential decision problems. These methodologies exploit a variant of the well-known UCT algorithm to trade off the exploitation of good actions and the exploration of unvisited states, but their empirical success comes at the cost of poor sample efficiency and high computation time. In this paper, we overcome these limitations by introducing convex regularization in Monte-Carlo Tree Search (MCTS) to drive exploration efficiently and to improve policy updates. First, we introduce a unifying theory on the use of generic convex regularizers in MCTS, deriving the first regret analysis of regularized MCTS and showing that it guarantees an exponential convergence rate. Second, we exploit our theoretical framework to introduce novel regularized backup operators for MCTS, based on the relative entropy of the policy update and, more importantly, on the Tsallis entropy of the policy, for which we prove superior theoretical guarantees. We empirically verify the consequences of our theoretical results on a toy problem. Finally, we show how our framework can easily be incorporated into AlphaGo, and we empirically show the superiority of convex regularization over representative baselines on well-known RL problems across several Atari games.

Item: Curriculum reinforcement learning via constrained optimal transport (PMLR, 2022)
Klink, Pascal; Yang, Haoyi; D'Eramo, Carlo; Pajarinen, Joni; Peters, Jan; Department of Electrical Engineering and Automation; Robot Learning; Technische Universität Darmstadt
Curriculum reinforcement learning (CRL) allows solving complex tasks by generating a tailored sequence of learning tasks, starting from easy ones and subsequently increasing their difficulty. Although the potential of curricula in RL has been clearly shown in a variety of works, it is less clear how to generate them for a given learning environment, resulting in a variety of methods aiming to automate this task. In this work, we focus on the idea of framing curricula as interpolations between task distributions, which has previously been shown to be a viable approach to CRL. Identifying key issues of existing methods, we frame the generation of a curriculum as a constrained optimal transport problem between task distributions. Benchmarks show that this way of curriculum generation can improve upon existing CRL methods, yielding high performance in a variety of tasks with different characteristics.
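As a rough illustration of the convex-regularization idea in the first entry above, the sketch below contrasts two regularized policies over a node's child Q-value estimates: Shannon-entropy regularization gives a softmax policy with a log-sum-exp value backup, while Tsallis-entropy (index 2) regularization gives a sparsemax policy that puts zero mass on clearly suboptimal children. This is a minimal sketch, not the paper's actual regularized backup operators or their integration into MCTS; the function names, temperature `tau`, and toy Q-values are assumptions for illustration.

```python
import numpy as np

def softmax_policy_and_value(q, tau=1.0):
    """Shannon-entropy regularization: softmax policy, log-sum-exp value backup."""
    q = np.asarray(q, dtype=float)
    m = q.max()
    weights = np.exp((q - m) / tau)
    policy = weights / weights.sum()
    value = m + tau * np.log(weights.sum())   # tau * logsumexp(q / tau)
    return policy, value

def sparsemax_policy_and_value(q, tau=1.0):
    """Tsallis-entropy (index 2) regularization: sparsemax policy
    (Martins & Astudillo, 2016), which zeroes out sufficiently bad actions."""
    q = np.asarray(q, dtype=float)
    z = q / tau
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cssv          # prefix of actions kept in the support
    k_max = k[support][-1]
    threshold = (cssv[k_max - 1] - 1.0) / k_max
    policy = np.maximum(z - threshold, 0.0)
    # Simple value estimate under the sparse policy; the paper's backup operator
    # additionally includes the (scaled) regularizer term.
    value = float(policy @ q)
    return policy, value

if __name__ == "__main__":
    q_children = [1.0, 0.9, 0.1, -0.5]          # toy Q estimates of a node's children
    print(softmax_policy_and_value(q_children))
    print(sparsemax_policy_and_value(q_children))
```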
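Likewise, the curriculum RL entry above frames a curriculum as an interpolation between task distributions. The sketch below moves a 1-D Gaussian task distribution along the Wasserstein-2 geodesic from an easy initial distribution towards a target distribution, advancing only while an estimated return stays above a threshold. The Gaussian parameters, the threshold, and the `estimate_return` stub are invented for illustration; the paper's method solves a constrained optimal transport problem rather than following this fixed geodesic schedule.

```python
import numpy as np

def w2_geodesic_gaussian(mu0, sigma0, mu1, sigma1, t):
    """Wasserstein-2 geodesic between two 1-D Gaussians: means and standard
    deviations interpolate linearly (a property of Gaussians in W2)."""
    return (1.0 - t) * mu0 + t * mu1, (1.0 - t) * sigma0 + t * sigma1

def estimate_return(task_mu, task_sigma, n_tasks=32, seed=0):
    """Stub for evaluating the agent on sampled tasks: here tasks far from 0
    are 'harder', standing in for real rollout returns."""
    rng = np.random.default_rng(seed)
    tasks = rng.normal(task_mu, task_sigma, size=n_tasks)
    return float(np.mean(np.exp(-np.abs(tasks))))

# Easy initial and desired target task distributions (assumed toy values).
mu0, sigma0 = 0.0, 0.2
mu1, sigma1 = 3.0, 1.0
threshold = 0.3                # advance the curriculum only above this return

t = 0.0
for iteration in range(20):
    mu, sigma = w2_geodesic_gaussian(mu0, sigma0, mu1, sigma1, t)
    perf = estimate_return(mu, sigma)   # in practice: train on tasks ~ N(mu, sigma), then evaluate
    if perf >= threshold and t < 1.0:
        t = min(1.0, t + 0.1)           # performance constraint met: move towards the target
    print(f"iter {iteration:2d}: t={t:.1f}, mu={mu:.2f}, sigma={sigma:.2f}, perf={perf:.3f}")
```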
Item: Modular Value Function Factorization in Multi-Agent Reinforcement Learning (2022-10-17)
Järnefelt, Oliver; D'Eramo, Carlo; Perustieteiden korkeakoulu; Ilin, Alexander
Real-world problems with multiple actors require them to coordinate while making decisions independently of each other. Typically, the large dimensionality and high unpredictability of the environment hinder the handcrafting or planning of effective behaviors. Multi-agent Reinforcement Learning (MARL) provides a framework for solving such problems by learning a parameterized policy for each agent that depends only on the state. Common approaches either factorize the joint value function into individual agent utilities, enabling the agents to make decisions independently, or learn complex interactions by modeling the utility or payoff functions of the underlying coordination graph. In this thesis, we investigate the benefit of exploiting the connection between these two approaches. We propose to leverage the modularity of the embedded coordination graph by formulating the total utility as a sum of subteam mixings, and we prove that our modular factorization covers the Independent-Global-Max (IGM) class of joint utility functions. We suggest finding the closest disjoint approximation of non-divisible graphs via graph partitioning, the quality of which we evaluate with a novel value-based partitioning distance measure. We derive theoretical and empirical advantages of our method, demonstrating its benefit over baselines in several one-shot games designed to highlight the promise of our modular factorization methods.

Item: A probabilistic interpretation of self-paced learning with applications to reinforcement learning (Microtome Publishing, 2021-07-01)
Klink, Pascal; Abdulsamad, Hany; Belousov, Boris; D'Eramo, Carlo; Peters, Jan; Pajarinen, Joni; Department of Electrical Engineering and Automation; Robot Learning; Technische Universität Darmstadt
Across machine learning, the use of curricula has shown strong empirical potential to improve learning from data by avoiding local optima of training objectives. For reinforcement learning (RL), curricula are especially interesting, as the underlying optimization has a strong tendency to get stuck in local optima due to the exploration-exploitation trade-off. Recently, a number of approaches for the automatic generation of curricula for RL have been shown to increase performance while requiring less expert knowledge compared to manually designed curricula. However, these approaches are seldom investigated from a theoretical perspective, preventing a deeper understanding of their mechanics. In this paper, we present an approach for automated curriculum generation in RL with a clear theoretical underpinning. More precisely, we formalize the well-known self-paced learning paradigm as inducing a distribution over training tasks, which trades off between task complexity and the objective to match a desired task distribution. Experiments show that training on this induced distribution helps to avoid poor local optima across RL algorithms in different tasks with uninformative rewards and challenging exploration requirements.
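The "total utility as a sum of subteam mixings" idea from the thesis entry above can be illustrated numerically: agents are split into disjoint subteams, each subteam's individual utilities are combined by a monotonic mixing (here a nonnegative weighted sum), and the joint value is the sum over subteams, so the greedy joint action can be read off from per-agent maxima, in the spirit of the IGM property. The partition, utility tables, and weights below are made-up toy values; the thesis itself uses learned mixing functions and graph partitioning of the coordination graph rather than this hand-written example.

```python
from itertools import product
import numpy as np

# Per-agent utilities Q_i(a_i) over 3 actions each (toy values).
agent_utilities = {
    "a1": np.array([0.2, 1.0, 0.1]),
    "a2": np.array([0.5, 0.4, 0.9]),
    "a3": np.array([1.2, 0.3, 0.0]),
    "a4": np.array([0.1, 0.8, 0.6]),
}

# Disjoint subteams: an assumed partition of the coordination graph.
subteams = [("a1", "a2"), ("a3", "a4")]

# Monotonic mixing per subteam: nonnegative weights keep the argmax decomposable.
weights = {("a1", "a2"): np.array([1.0, 0.5]), ("a3", "a4"): np.array([0.7, 1.3])}

def q_tot(joint_action):
    """Joint value = sum over subteams of a monotonic mixing of member utilities."""
    total = 0.0
    for team in subteams:
        member_qs = np.array([agent_utilities[a][joint_action[a]] for a in team])
        total += float(weights[team] @ member_qs)
    return total

# Because every mixing is monotonic, the greedy joint action decomposes per agent.
greedy = {a: int(np.argmax(q)) for a, q in agent_utilities.items()}

# Sanity check by brute force over all joint actions.
best = max(
    (dict(zip(agent_utilities, acts)) for acts in product(range(3), repeat=len(agent_utilities))),
    key=q_tot,
)
assert q_tot(best) == q_tot(greedy)
print("greedy joint action:", greedy, "Q_tot:", q_tot(greedy))
```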
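To make the trade-off described in the last entry concrete, here is a small discrete sketch of a self-paced training distribution that balances expected task performance against the KL divergence to a desired target task distribution; for a finite task set, maximizing E_p[J] - alpha * KL(p || mu) over the simplex has the closed form p(c) proportional to mu(c) * exp(J(c) / alpha). The task returns, target distribution, and temperatures are invented toy numbers, and the paper works with parametric (e.g. Gaussian) context distributions and a full algorithmic treatment rather than this one-shot computation.

```python
import numpy as np

# Toy finite set of tasks (contexts) with the agent's current estimated returns J(c).
returns = np.array([0.9, 0.6, 0.2, 0.05])      # easy -> hard (assumed values)

# Desired target distribution mu(c) over tasks, weighted towards the hard ones.
target = np.array([0.05, 0.15, 0.4, 0.4])

def self_paced_distribution(J, mu, alpha):
    """Maximize E_p[J] - alpha * KL(p || mu) over the simplex.
    Closed-form solution: p(c) proportional to mu(c) * exp(J(c) / alpha)."""
    logits = np.log(mu) + J / alpha
    p = np.exp(logits - logits.max())            # numerically stable normalization
    return p / p.sum()

# Small alpha emphasizes currently solvable (high-return) tasks; large alpha
# pulls the training distribution towards the desired target distribution.
for alpha in [0.05, 0.5, 5.0]:
    print(alpha, np.round(self_paced_distribution(returns, target, alpha), 3))
```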