Generalizing Offline Reinforcement Learning to Unseen Dynamics Parameters with Synthetic Data
School of Electrical Engineering
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Authors
Date
2024-12-25
Department
Major/Subject
Cloud and Network Infrastructures
Mcode
Degree programme
Master's Programme in ICT Innovation
Language
en
Pages
39
Series
Abstract
Reinforcement Learning (RL) has achieved remarkable performance in real-world industrial applications such as robotics and logistics. However, RL often struggles to adapt to diverse and changing contexts due to limited training data and poor generalization. Collecting sufficient real-world data is both costly and time-consuming, which hampers the development of adaptable RL systems. Context-aware RL algorithms address these issues by incorporating contextual information, but their ability to generalize to out-of-distribution (OOD) scenarios remains limited. Diffusion models, known for their strong generative capabilities, offer a promising way to enhance RL. In this thesis, we propose a method that leverages diffusion models to improve the sample efficiency and generalization ability of RL agents. We collect real transition data from online RL agents trained under varying contexts and train diffusion models on this data. We then condition the diffusion model on specific contexts to generate synthetic transitions, and use these transitions to train offline RL agents so that they perform effectively across diverse and unseen environments. Experimental results demonstrate that our method improves RL performance in OOD contexts while maintaining performance in in-distribution scenarios.
Description
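To make the pipeline described in the abstract concrete, below is a minimal, hypothetical sketch of how a context-conditioned diffusion model could generate synthetic transitions for a target dynamics context before they are added to an offline RL dataset. It is not the thesis's actual implementation: the names (ContextDenoiser, sample_transitions), the MLP denoiser, and the DDPM-style noise schedule are illustrative assumptions.

```python
# Hypothetical sketch: context-conditioned diffusion sampling of RL transitions.
# Each sample is a flattened (state, action, reward, next_state) vector, and the
# conditioning "context" is a vector of dynamics parameters (e.g. mass, friction).

import torch
import torch.nn as nn


class ContextDenoiser(nn.Module):
    """MLP that predicts the noise added to a flattened transition,
    conditioned on the diffusion timestep and a dynamics-context vector."""

    def __init__(self, transition_dim: int, context_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim + context_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, transition_dim),
        )

    def forward(self, x_t, t, context):
        # t is the timestep normalized to [0, 1], appended as a conditioning feature.
        return self.net(torch.cat([x_t, t, context], dim=-1))


@torch.no_grad()
def sample_transitions(model, context, transition_dim, n_samples=1024, n_steps=50):
    """DDPM-style ancestral sampling of synthetic transitions for one target context."""
    betas = torch.linspace(1e-4, 2e-2, n_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    ctx = context.expand(n_samples, -1)                  # broadcast context to the batch
    x = torch.randn(n_samples, transition_dim)           # start from pure noise
    for i in reversed(range(n_steps)):
        t = torch.full((n_samples, 1), i / n_steps)
        eps = model(x, t, ctx)                           # predicted noise
        # Reverse-step posterior mean (standard DDPM update).
        x = (x - betas[i] / torch.sqrt(1.0 - alpha_bars[i]) * eps) / torch.sqrt(alphas[i])
        if i > 0:                                        # add noise except at the last step
            x = x + torch.sqrt(betas[i]) * torch.randn_like(x)
    return x  # rows are flattened (state, action, reward, next_state) transitions
```

In a full pipeline, each generated row would be split back into (state, action, reward, next_state) tuples and mixed into the dataset of an offline RL algorithm; sampling with context vectors outside the training range is what targets the OOD generalization evaluated in the thesis.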
Supervisor
Pajarinen, Joni
Thesis advisor
Scannell, Aidan
Shrestha, Jatan
Keywords
reinforcement learning, diffusion models, synthetic experience replay, sample-efficient learning, generalization, dynamic environments