Mode-constrained Model-based Reinforcement Learning via Gaussian Processes

Access rights

openAccess

A4 Article in conference proceedings

Date

2023

Language

en

Pages

3299-3314 (16 pages)

Series

Proceedings of Machine Learning Research, Volume 206

Abstract

Model-based reinforcement learning (RL) algorithms do not typically consider environments with multiple dynamic modes, where it is beneficial to avoid inoperable or undesirable modes. We present a model-based RL algorithm that constrains training to a single dynamic mode with high probability. This is a difficult problem because the mode constraint is a hidden variable associated with the environment's dynamics. As such, it is 1) unknown a priori and 2) never observed as an output of the environment, so it cannot be learned with supervised learning. We present a nonparametric dynamic model that learns the mode constraint alongside the dynamic modes. Importantly, it learns latent structure that our planning scheme leverages to 1) enforce the mode constraint with high probability and 2) escape local optima induced by the mode constraint. We validate our method by showing that it can solve a simulated quadcopter navigation task whilst providing a level of constraint satisfaction both during and after training.

Description

Funding Information: We thank ST John, Martin Trapp, Arno Solin, and Paul Chang for valuable discussions and feedback. This work was conducted whilst Aidan Scannell was a PhD student at the EPSRC Centre for Doctoral Training in Future Autonomous and Robotic Systems (FARSCOPE) at the Bristol Robotics Laboratory. It was finished whilst funded by the Finnish Center for Artificial Intelligence (FCAI).

Publisher Copyright: Copyright © 2023 by the author(s)

Citation

Scannell, A, Ek, C H & Richards, A 2023, 'Mode-constrained Model-based Reinforcement Learning via Gaussian Processes', Proceedings of Machine Learning Research, vol. 206, pp. 3299-3314. <https://proceedings.mlr.press/v206/scannell23a/scannell23a.pdf>