Browsing by Author "Peters, Jan"
Now showing 1 - 12 of 12
Item: Autonomous underwater vehicle link alignment control in unknown environments using reinforcement learning (John Wiley & Sons, 2024-09)
Weng, Yang; Chun, Sehwa; Ohashi, Masaki; Matsuda, Takumi; Sekimori, Yuki; Pajarinen, Joni; Peters, Jan; Maki, Toshihiro; Department of Electrical Engineering and Automation; Robot Learning; University of Tokyo; Meiji University; Technische Universität Darmstadt

High-speed underwater wireless optical communication holds immense promise for ocean monitoring and surveys, providing crucial support for the real-time sharing of observational data collected by autonomous underwater vehicles (AUVs). However, because of inaccurate target information and external interference in unknown environments, link alignment is challenging and needs to be addressed. In response to these challenges, we propose a reinforcement learning-based alignment method that controls the AUV to establish an optical link and maintain alignment. Our alignment control system combines several sensors, including a depth sensor, Doppler velocity log (DVL), gyroscope, ultra-short baseline device, and acoustic modem, with a particle filter to observe the environment and accurately estimate the AUV's state. The soft actor-critic algorithm is used to train a reinforcement learning-based controller in a simulated environment to reduce pointing errors and energy consumption during alignment. After experimental validation in simulation, we deployed the controller on an actual AUV called Tri-TON. In experiments at sea, Tri-TON maintained the link and angular pointing errors within 1 m and (Formula presented.), respectively. The experimental results demonstrate that the proposed alignment control method can establish underwater optical communication between AUV fleets, thus improving the efficiency of marine surveys.

Item: Convex Regularization in Monte-Carlo Tree Search (JMLR, 2021)
Dam, Tuan; D'Eramo, Carlo; Peters, Jan; Pajarinen, Joni; Department of Electrical Engineering and Automation; Meila, M; Zhang, T; Robot Learning; Technische Universität Darmstadt

Monte-Carlo planning and Reinforcement Learning (RL) are essential to sequential decision making. The recent AlphaGo and AlphaZero algorithms have shown how to successfully combine these two paradigms to solve large-scale sequential decision problems. These methodologies exploit a variant of the well-known UCT algorithm to trade off the exploitation of good actions and the exploration of unvisited states, but their empirical success comes at the cost of poor sample efficiency and high computation time. In this paper, we overcome these limitations by introducing convex regularization in Monte-Carlo Tree Search (MCTS) to drive exploration efficiently and to improve policy updates. First, we introduce a unifying theory on the use of generic convex regularizers in MCTS, deriving the first regret analysis of regularized MCTS and showing that it guarantees an exponential convergence rate. Second, we exploit our theoretical framework to introduce novel regularized backup operators for MCTS, based on the relative entropy of the policy update and, more importantly, on the Tsallis entropy of the policy, for which we prove superior theoretical guarantees. We empirically verify the consequences of our theoretical results on a toy problem. Finally, we show how our framework can easily be incorporated in AlphaGo, and we empirically show the superiority of convex regularization, with respect to representative baselines, on well-known RL problems across several Atari games.
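The abstract above describes the regularized backup operators only abstractly. As a hedged illustration, the sketch below shows the Shannon-entropy special case of a convex-regularized MCTS backup, where the node value is the log-partition (convex conjugate) of the child Q-values and the tree policy is the corresponding softmax; the paper's relative-entropy and Tsallis-entropy operators differ, and the temperature `tau` is illustrative, not a value from the paper.

```python
import numpy as np

def entropy_regularized_backup(q_values, tau=1.0):
    """Shannon-entropy instance of a convex-regularized MCTS backup.

    The node value is the convex conjugate of the regularizer at the
    child Q-values, v = tau * log(sum(exp(q / tau))), and the tree
    policy is the maximizing argument, softmax(q / tau).
    """
    q = np.asarray(q_values, dtype=float)
    q_max = q.max()  # shift for numerical stability
    v = q_max + tau * np.log(np.exp((q - q_max) / tau).sum())
    policy = np.exp((q - v) / tau)  # already normalized by construction
    return v, policy

# Example: backing up a node with three child action values.
value, pi = entropy_regularized_backup([1.0, 0.5, -0.2], tau=0.5)
```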
Item: Curriculum reinforcement learning via constrained optimal transport (PMLR, 2022)
Klink, Pascal; Yang, Haoyi; D'Eramo, Carlo; Pajarinen, Joni; Peters, Jan; Department of Electrical Engineering and Automation; Robot Learning; Technische Universität Darmstadt

Curriculum reinforcement learning (CRL) allows solving complex tasks by generating a tailored sequence of learning tasks, starting from easy ones and subsequently increasing their difficulty. Although the potential of curricula in RL has been clearly shown in a variety of works, it is less clear how to generate them for a given learning environment, resulting in a variety of methods aiming to automate this task. In this work, we focus on the idea of framing curricula as interpolations between task distributions, which has previously been shown to be a viable approach to CRL. Identifying key issues of existing methods, we frame the generation of a curriculum as a constrained optimal transport problem between task distributions. Benchmarks show that this way of generating curricula can improve upon existing CRL methods, yielding high performance in a variety of tasks with different characteristics.

Item: Establishment of line-of-sight optical links between autonomous underwater vehicles: Field experiment and performance validation (Elsevier BV, 2022-12)
Weng, Yang; Matsuda, Takumi; Sekimori, Yuki; Pajarinen, Joni; Peters, Jan; Maki, Toshihiro; Department of Electrical Engineering and Automation; Robot Learning; University of Tokyo; Meiji University; Technische Universität Darmstadt

Establishing a line-of-sight link between autonomous underwater vehicles (AUVs) is an unavoidable challenge for realizing high-data-rate optical communication in ocean exploration. We propose a method for link establishment that maintains the relative position and orientation between AUVs. Using a reinforcement learning algorithm, we search for a policy that can suppress external disturbances and optimize link establishment efficiency. To evaluate the performance of the proposed method, we prepared a hovering AUV to conduct link establishment experiments. The reinforcement learning policy trained in a simulation environment was deployed on the AUV in real environments. In field experiments, our approach successfully performed link establishment from the hovering AUV to an autonomous surface vehicle. Based on the experimental results, we evaluate the performance of the AUV in executing the link establishment policy and present comparisons with existing optical-search-based link establishment methods.

Item: Latent Derivative Bayesian Last Layer Networks (2021)
Watson, Joe; Lin, Jihao Andreas; Klink, Pascal; Pajarinen, Joni; Peters, Jan; Department of Electrical Engineering and Automation; Banerjee, A; Fukumizu, K; Robot Learning; Technische Universität Darmstadt

Bayesian neural networks (BNNs) are powerful parametric models for nonlinear regression with uncertainty quantification. However, the approximate inference techniques for weight-space priors suffer from several drawbacks. The 'Bayesian last layer' (BLL) is an alternative BNN approach that learns the feature space for an exact Bayesian linear model with explicit predictive distributions. However, its predictions outside of the data distribution (OOD) are typically overconfident, as the marginal likelihood objective results in a learned feature space that overfits to the data. We overcome this weakness by introducing a functional prior on the model's derivatives with respect to the inputs. Treating these Jacobians as latent variables, we incorporate the prior into the objective to influence the smoothness and diversity of the features, which enables greater predictive uncertainty. For the BLL, the Jacobians can be computed directly using forward-mode automatic differentiation, and the distribution over Jacobians may be obtained in closed form. We demonstrate that this method enhances the BLL to Gaussian-process-like performance on tasks where calibrated uncertainty is critical: OOD regression, Bayesian optimization, and active learning, including high-dimensional real-world datasets.
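To make the "exact Bayesian linear model" part of the abstract above concrete, here is a minimal sketch of a plain Bayesian last layer (without the paper's latent-Jacobian prior), assuming features `phi` produced by an already-trained network; the prior precision `alpha` and noise precision `beta` are illustrative hyperparameters.

```python
import numpy as np

def bll_posterior(phi, y, alpha=1.0, beta=10.0):
    """Exact Bayesian linear regression on learned features phi (N x D).

    Prior w ~ N(0, alpha^-1 I), Gaussian noise with precision beta.
    Returns the posterior mean and covariance of the last-layer weights.
    """
    n, d = phi.shape
    precision = alpha * np.eye(d) + beta * phi.T @ phi
    cov = np.linalg.inv(precision)
    mean = beta * cov @ phi.T @ y
    return mean, cov

def bll_predict(phi_star, mean, cov, beta=10.0):
    """Closed-form predictive mean and variance at test features."""
    mu = phi_star @ mean
    var = 1.0 / beta + np.einsum("nd,de,ne->n", phi_star, cov, phi_star)
    return mu, var
```

The explicit predictive variance is what makes the BLL attractive for the OOD regression, Bayesian optimization, and active learning tasks mentioned above; the paper's contribution is shaping the learned features so this variance stays calibrated away from the data.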
Item: Long-Term Visitation Value for Deep Exploration in Sparse-Reward Reinforcement Learning (MDPI AG, 2022-02-28)
Parisi, Simone; Tateo, Davide; Hensel, Maximilian; D'Eramo, Carlo; Peters, Jan; Pajarinen, Joni; Department of Electrical Engineering and Automation; Robot Learning; Meta AI Research; Technische Universität Darmstadt

Reinforcement learning with sparse rewards is still an open challenge. Classic methods rely on getting feedback via extrinsic rewards to train the agent, and in situations where this occurs very rarely the agent learns slowly or cannot learn at all. Similarly, if the agent also receives rewards that create suboptimal modes of the objective function, it will likely stop exploring prematurely. More recent methods add auxiliary intrinsic rewards to encourage exploration. However, auxiliary rewards lead to a non-stationary target for the Q-function. In this paper, we present a novel approach that (1) plans exploration actions far into the future by using a long-term visitation count, and (2) decouples exploration and exploitation by learning a separate function assessing the exploration value of actions. Contrary to existing methods that use models of reward and dynamics, our approach is off-policy and model-free. We further propose new tabular environments for benchmarking exploration in reinforcement learning. Empirical results on classic and novel benchmarks show that the proposed approach outperforms existing methods in environments with sparse rewards, especially in the presence of rewards that create suboptimal modes of the objective function. The results also suggest that our approach scales gracefully with the size of the environment.

Item: Model-Based Reinforcement Learning via Stochastic Hybrid Models (IEEE, 2023)
Abdulsamad, Hany; Peters, Jan; Department of Electrical Engineering and Automation; Sensor Informatics and Medical Technology; Technische Universität Darmstadt

Optimal control of general nonlinear systems is a central challenge in automation. Enabled by powerful function approximators, data-driven approaches to control have recently tackled challenging applications successfully. However, such methods often obscure the structure of dynamics and control behind black-box over-parameterized representations, limiting our ability to understand closed-loop behavior. This paper adopts a hybrid-system view of nonlinear modeling and control that lends an explicit hierarchical structure to the problem and breaks down complex dynamics into simpler localized units. We consider a sequence modeling paradigm that captures the temporal structure of the data and derive an expectation-maximization (EM) algorithm that automatically decomposes nonlinear dynamics into stochastic piecewise affine models with nonlinear transition boundaries. Furthermore, we show that these time-series models naturally admit a closed-loop extension that we use to extract local polynomial feedback controllers from nonlinear experts via behavioral cloning. Finally, we introduce a novel hybrid relative entropy policy search (Hb-REPS) technique that incorporates the hierarchical nature of hybrid models and optimizes a set of time-invariant piecewise feedback controllers derived from a piecewise polynomial approximation of a global state-value function.
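The abstract above does not spell out the decomposition step. As a simplified stand-in (not the paper's algorithm), the toy EM loop below soft-assigns observed transitions to K affine dynamics modes and refits each mode by weighted least squares; it ignores the temporal structure and the nonlinear transition boundaries that the paper explicitly models.

```python
import numpy as np

def toy_em_piecewise_affine(x, x_next, k=3, iters=50, seed=0):
    """Toy EM for a mixture of K affine models: x_next ~ A_k x + b_k."""
    rng = np.random.default_rng(seed)
    n, _ = x.shape
    xa = np.hstack([x, np.ones((n, 1))])        # affine features [x, 1]
    resp = rng.dirichlet(np.ones(k), size=n)    # soft mode assignments
    for _ in range(iters):
        params, sq_err = [], []
        for j in range(k):                      # M-step: weighted LS per mode
            w = resp[:, j] + 1e-8
            wxa = xa * w[:, None]
            theta = np.linalg.solve(wxa.T @ xa + 1e-6 * np.eye(xa.shape[1]),
                                    wxa.T @ x_next)
            params.append(theta)
            sq_err.append(((x_next - xa @ theta) ** 2).sum(axis=1))
        logp = -0.5 * np.stack(sq_err, axis=1)  # E-step: unit-variance Gaussian
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
    return params, resp
```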
Item: Pointing Error Control of Underwater Wireless Optical Communication on Mobile Platform (IEEE, 2022-07-01)
Weng, Yang; Matsuda, Takumi; Sekimori, Yuki; Pajarinen, Joni; Peters, Jan; Maki, Toshihiro; Department of Electrical Engineering and Automation; Robot Learning; University of Tokyo; Technische Universität Darmstadt

This letter discusses pointing errors in underwater optical communication caused by environmental disturbances and uncertainties that previous optical alignment methods could not adequately measure or control. The bore-sight and jitter effects are identified in the motion model of the mobile platform. We propose to use a sensor suite that includes a pressure sensor, super short baseline (SSBL) acoustic system, Doppler velocity log (DVL), and fiber optic gyro (FOG) to observe and estimate pointing errors during communication. The pointing errors updated by the particle filter can be shared with the pointing, acquisition, and tracking (PAT) system and the thruster system. Sea experiments reveal that the proposed method can measure pointing errors and limit error growth by maneuvering the mobile platform.

Item: A probabilistic interpretation of self-paced learning with applications to reinforcement learning (Microtome Publishing, 2021-07-01)
Klink, Pascal; Abdulsamad, Hany; Belousov, Boris; D'Eramo, Carlo; Peters, Jan; Pajarinen, Joni; Department of Electrical Engineering and Automation; Robot Learning; Technische Universität Darmstadt

Across machine learning, the use of curricula has shown strong empirical potential to improve learning from data by avoiding local optima of training objectives. For reinforcement learning (RL), curricula are especially interesting, as the underlying optimization has a strong tendency to get stuck in local optima due to the exploration-exploitation trade-off. Recently, a number of approaches for the automatic generation of curricula for RL have been shown to increase performance while requiring less expert knowledge than manually designed curricula. However, these approaches are seldom investigated from a theoretical perspective, preventing a deeper understanding of their mechanics. In this paper, we present an approach for automated curriculum generation in RL with a clear theoretical underpinning. More precisely, we formalize the well-known self-paced learning paradigm as inducing a distribution over training tasks that trades off task complexity against the objective of matching a desired task distribution. Experiments show that training on this induced distribution helps to avoid poor local optima across RL algorithms in different tasks with uninformative rewards and challenging exploration requirements.
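To make the "induced distribution over training tasks" above concrete, here is a hedged sketch of the trade-off in its simplest form: candidate tasks are reweighted by the agent's current performance (the self-paced term) against their log-density under the desired target distribution. The exponential weighting form and the `eta`/`alpha` trade-off parameters are illustrative, not the paper's exact objective.

```python
import numpy as np

def induced_task_distribution(perf, target_logpdf, eta=1.0, alpha=0.5):
    """Weight each candidate training task by a trade-off between the
    agent's current performance on it (task 'easiness') and how
    probable it is under the desired target task distribution.
    """
    logw = eta * np.asarray(perf) + alpha * np.asarray(target_logpdf)
    logw -= logw.max()                 # numerical stability
    w = np.exp(logw)
    return w / w.sum()

# Example: three tasks; a curriculum favors easy tasks early (large eta)
# and anneals toward the target distribution by growing alpha over time.
p = induced_task_distribution(perf=[0.9, 0.4, 0.1],
                              target_logpdf=[-3.0, -1.0, -0.1])
```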
Item: Reinforcement Learning Based Underwater Wireless Optical Communication Alignment for Autonomous Underwater Vehicles (IEEE, 2022-10)
Weng, Yang; Pajarinen, Joni; Akrour, Riad; Matsuda, Takumi; Peters, Jan; Maki, Toshihiro; Department of Electrical Engineering and Automation; Robot Learning; University of Tokyo; Technische Universität Darmstadt

With the developments in underwater wireless optical communication (UWOC) technology, UWOC can be used in conjunction with autonomous underwater vehicles (AUVs) for high-speed data sharing within a vehicle formation during underwater exploration. A beam alignment problem arises during communication due to the transmission range, external disturbances and noise, and uncertainties in the AUV dynamic model. In this article, we propose an acoustic navigation method to guide the alignment process without requiring the beam directors, light intensity sensors, or scanning algorithms used in previous research. The AUVs need to stably maintain a specific relative position and orientation to establish an optical link. We model the alignment problem as a partially observable Markov decision process (POMDP) that takes the manipulation, navigation, and energy consumption of underwater vehicles into account. However, finding an efficient policy for the POMDP under high partial observability and environmental variability is challenging. Therefore, for successful policy optimization, we utilize the soft actor-critic reinforcement learning algorithm together with AUV-specific belief updates and reward-shaping-based curriculum learning. Our approach outperformed baseline approaches in a simulation environment and successfully performed the beam alignment process from one AUV to another on the real AUV Tri-TON 2.

Item: Sim-To-Real Transfer for Underwater Wireless Optical Communication Alignment Policy between AUVs (2022-05-19)
Weng, Yang; Matsuda, Takumi; Sekimori, Yuki; Pajarinen, Joni; Peters, Jan; Maki, Toshihiro; Department of Electrical Engineering and Automation; Robot Learning; University of Tokyo; Technische Universität Darmstadt

Underwater wireless optical communication (UWOC) technology provides a potential high-data-rate solution for information sharing between multiple autonomous underwater vehicles (AUVs). To deploy a UWOC system on mobile platforms, we propose to solve the optical beam alignment problem by maintaining the relative position and orientation of two AUVs. A reinforcement learning-based alignment policy is transferred to the real world, since it outperforms other baseline approaches and shows good performance in the simulation environment. We randomize the simulator and introduce disturbances, aiming to cover the real distribution of the underwater environment. The soft actor-critic (SAC) algorithm, reward-shaping-based curriculum learning, and the specifications of the vehicles are utilized to achieve a successful transfer. In the Hiratsuka sea experiments, the alignment policy was deployed on the AUV Tri-TON and successfully aligned with the autonomous surface vehicle BUTTORI, demonstrating a solution for combining UWOC technology with AUV teams in ocean investigation.
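The abstract above mentions randomizing the simulator to cover the real underwater conditions; a minimal sketch of that pattern follows. The parameter names and ranges are hypothetical stand-ins, not values from the paper, where they would come from system identification of the vehicle and the sea environment.

```python
import random

# Hypothetical randomization ranges (illustrative only).
RANDOMIZATION_RANGES = {
    "current_speed_mps": (0.0, 0.5),    # external flow disturbance
    "thruster_gain": (0.8, 1.2),        # actuator model mismatch
    "sensor_noise_std": (0.005, 0.05),  # observation noise level
    "vehicle_mass_kg": (190.0, 210.0),  # inertial uncertainty
}

def sample_episode_params(rng=random):
    """Draw one simulator configuration per training episode, so the
    policy is trained across a distribution of dynamics wide enough to
    contain the (unknown) real-world conditions."""
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}
```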
Item: Time Synchronization Scheme of Underwater Platforms Using Wireless Acoustic and Optical Communication (2022)
Weng, Yang; Matsuda, Takumi; Sekimori, Yuki; Pajarinen, Joni; Peters, Jan; Maki, Toshihiro; Department of Electrical Engineering and Automation; Robot Learning; University of Tokyo; Meiji University; Technische Universität Darmstadt

Time synchronization in autonomous underwater vehicle (AUV) formations is essential for joint underwater survey tasks. Maintaining a common time scale can improve the efficiency of cooperative localization, formation control, and data fusion. Instead of using atomic clocks to limit clock offset and drift, we propose a cooperative acoustic and optical method to synchronize the clocks. Acoustic communication is used to guide the establishment of the optical link and to share the states of the AUVs, while optical communication is used to measure the time difference between the clocks of the two AUVs. Field experiments demonstrated that the proposed method can perform time synchronization in real scenarios.
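The abstract above does not give the measurement equations. As a point of reference only, a standard NTP-style two-way time transfer, of the kind that precise timestamping over an optical link would enable between two AUV clocks, looks like the sketch below (assuming symmetric one-way delays).

```python
def two_way_clock_offset(t1, t2, t3, t4):
    """NTP-style two-way time transfer between clocks A and B.

    t1: A's clock when the message leaves A
    t2: B's clock when the message arrives at B
    t3: B's clock when the reply leaves B
    t4: A's clock when the reply arrives at A
    Returns B's offset relative to A and the round-trip delay,
    assuming equal one-way propagation delays.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    round_trip = (t4 - t1) - (t3 - t2)
    return offset, round_trip

# Example: B's clock runs 0.5 s ahead of A's, one-way delay 0.1 s.
offset, rtt = two_way_clock_offset(0.0, 0.6, 0.7, 0.3)
# offset == 0.5, rtt == 0.2
```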