Robust Proximal Policy Optimization for Reinforcement Learning
School of Electrical Engineering |
Master's thesis
Authors
Date
2022-10-17
Department
Major/Subject
Communications Engineering
Mcode
ELEC3029
Degree programme
CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)
Language
en
Pages
55+5
Abstract
Reinforcement learning is a family of machine learning algorithms in which a system learns to make optimal sequential decisions by interacting with its environment. Reinforcement learning problems are modelled as Markov Decision Processes, each identified by its transition probability and reward function. Most reinforcement learning algorithms are designed under the assumption that the transition probability and reward function do not vary over time. However, this is not in line with real-world targets, as the environment is subject to change, which makes it more challenging for the system (agent) to learn the optimal policy and act accordingly. This scenario is known as non-stationary reinforcement learning, where the characteristics of the environment change from design to deployment and over time. This work begins by providing a review of policy gradient methods that exploit function approximation and are suitable for problems with large state and action spaces. Then, a robust algorithm based on the Proximal Policy Optimization (PPO) actor-critic algorithm is proposed to address the non-stationary reinforcement learning problem. The algorithm is tested on various reinforcement learning simulation environments and compared with several baselines, including PPO.
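For context, the clipped surrogate objective that defines standard PPO, on which the proposed robust variant builds, can be sketched for a single state-action sample. This is an illustrative sketch of the published PPO objective, not code from the thesis; the function name and the default clipping range `epsilon=0.2` are choices made here for illustration.

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate for one sample.

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage: advantage estimate A(s, a) under the old policy
    epsilon:   clipping range (0.2 is a common default)
    """
    # Clip the probability ratio into [1 - eps, 1 + eps]
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Take the pessimistic (lower) of the unclipped and clipped terms,
    # which removes the incentive to move the ratio far from 1
    return min(ratio * advantage, clipped_ratio * advantage)
```

In practice this objective is averaged over a batch of samples and maximized by gradient ascent; the clipping keeps each policy update close to the data-collecting policy, which is the trust-region-like property the robust variant in this thesis starts from.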
Supervisor
Pajarinen, Joni
Thesis advisors
Babadi, Amin
Zhao, Yi
Keywords
reinforcement learning, non-stationary environment, proximal policy optimization, trust region policy optimization, robust proximal policy optimization