Robust Proximal Policy Optimization for Reinforcement Learning
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.advisor | Babadi, Amin | |
dc.contributor.advisor | Zhao, Yi | |
dc.contributor.author | Moazzeni Bikani, Pooya | |
dc.contributor.school | Sähkötekniikan korkeakoulu | fi |
dc.contributor.supervisor | Pajarinen, Joni | |
dc.date.accessioned | 2022-10-23T17:06:19Z | |
dc.date.available | 2022-10-23T17:06:19Z | |
dc.date.issued | 2022-10-17 | |
dc.description.abstract | Reinforcement learning is a family of machine learning algorithms in which a system learns to make optimal sequential decisions by interacting with its environment. Reinforcement learning problems are modelled as Markov Decision Processes, each identified by its transition probability and reward function. Most reinforcement learning algorithms are designed under the assumption that the transition probability and reward function do not vary over time. However, this assumption rarely holds in real-world applications, where the environment is subject to change. This imposes additional challenges on the system (agent) in learning the optimal policy and acting accordingly. This scenario is known as non-stationary reinforcement learning, where the characteristics of the environment change between design and deployment and over time. This work begins with a review of policy gradient methods that exploit function approximation and are suitable for problems with large state and action spaces. Then, a robust algorithm based on the Proximal Policy Optimization (PPO) actor-critic algorithm is proposed to address the non-stationary reinforcement learning problem. The algorithm is evaluated on various reinforcement learning simulation environments and compared with several baselines, including PPO. | en |
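For context, the PPO algorithm referenced in the abstract optimizes the standard clipped surrogate objective of Schulman et al. (2017); the robust variant proposed in the thesis is not detailed in this record, so only the standard form is sketched here:

L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},

where \hat{A}_t is an estimate of the advantage function and \epsilon is the clipping parameter.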
dc.format.extent | 55+5 | |
dc.format.mimetype | en | |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/117373 | |
dc.identifier.urn | URN:NBN:fi:aalto-202210236159 | |
dc.language.iso | en | en |
dc.location | P1 | fi |
dc.programme | CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013) | fi |
dc.programme.major | Communications Engineering | fi |
dc.programme.mcode | ELEC3029 | fi |
dc.subject.keyword | reinforcement learning | en |
dc.subject.keyword | non-stationary environment | en |
dc.subject.keyword | proximal policy optimization | en |
dc.subject.keyword | trust region policy optimization | en |
dc.subject.keyword | robust proximal policy optimization | en |
dc.title | Robust Proximal Policy Optimization for Reinforcement Learning | en |
dc.type | G2 Pro gradu, diplomityö | fi |
dc.type.ontasot | Master's thesis | en |
dc.type.ontasot | Diplomityö | fi |
local.aalto.electroniconly | yes | |
local.aalto.openaccess | no |