Robust Proximal Policy Optimization for Reinforcement Learning

Sähkötekniikan korkeakoulu (School of Electrical Engineering) | Master's thesis

Date

2022-10-17

Major/Subject

Communications Engineering

Mcode

ELEC3029

Degree programme

CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)

Language

en

Pages

55+5

Abstract

Reinforcement learning is a family of machine learning algorithms in which a system learns to make optimal sequential decisions by interacting with its environment. Reinforcement learning problems are modelled as Markov Decision Processes, each characterized by a transition probability and a reward function. Most reinforcement learning algorithms are designed under the assumption that the transition probability and reward function do not vary over time. However, this assumption rarely holds in real-world applications, where the environment is subject to change, making it harder for the system (agent) to learn the optimal policy and act accordingly. This setting is known as non-stationary reinforcement learning: the characteristics of the environment change between design and deployment, and over time. This work begins with a review of policy gradient methods that exploit function approximation and are suited to problems with large state and action spaces. A robust algorithm based on the Proximal Policy Optimization (PPO) actor-critic algorithm is then proposed to address the non-stationary reinforcement learning problem. The algorithm is evaluated on several reinforcement learning simulation environments and compared against a number of baselines, including PPO.
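For context, the two objects named in the abstract have compact standard statements. A Markov Decision Process is a tuple $(\mathcal{S}, \mathcal{A}, P, r, \gamma)$: a state space, an action space, a transition probability $P(s' \mid s, a)$, a reward function $r(s, a)$, and a discount factor $\gamma$; non-stationarity means that $P$ and $r$ drift over time. PPO, in turn, maximizes the clipped surrogate objective of Schulman et al. (2017), written here with $\rho_t(\theta)$ for the probability ratio to avoid clashing with the reward symbol:

$$L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\Big[\min\big(\rho_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(\rho_t(\theta),\, 1 - \epsilon,\, 1 + \epsilon\big)\,\hat{A}_t\big)\Big], \qquad \rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)},$$

where $\hat{A}_t$ is an estimate of the advantage at timestep $t$ and $\epsilon$ is a clipping parameter; the clipping discourages policy updates that move far from the previous policy.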

Supervisor

Pajarinen, Joni

Thesis advisor

Babadi, Amin
Zhao, Yi

Keywords

reinforcement learning, non-stationary environment, proximal policy optimization, trust region policy optimization, robust proximal policy optimization
