Risk Estimation Using Offline Reinforcement Learning in the Football Domain

School of Electrical Engineering | Master's thesis

Date

2022-08-22

Major/Subject

Autonomous Systems

Mcode

ELEC3055

Degree programme

Master's Programme in ICT Innovation

Language

en

Pages

94+0

Abstract

Interest in Machine Learning has grown in many sectors, and complex challenges have been solved using Reinforcement Learning (RL). However, due to its exploratory nature, RL suffers from data inefficiency and cannot guarantee safety in many complex tasks. This thesis proposes a novel approach called OfSaCRE, which aims to reduce the number of constraint violations an agent commits in deployment. First, an Offline Safety Critic, which encodes the risk estimate, is obtained from a dataset of past transitions using Offline RL techniques. Then, the Offline Safety Critic is deployed together with an RL agent through a safety control module, which decides the final action to take based on the estimated safety of each action. In addition, an alternative training architecture that enables the use of OfSaCRE during learning is explored; it penalizes the RL agent for use of the safety critic. In football, statistics confirm that teams with more ball possession have a better chance of winning the match, so not losing the ball is of utmost importance. This thesis measures the effects of OfSaCRE in the football domain, where the constraint is defined as losing the ball. Performance is analyzed for different Offline RL algorithms and for datasets with added noise. The results show that Offline DQN and a noisy dataset are the most suitable algorithm and dataset type for this application, since they reduce the number of constraint violations without sacrificing too much reward. For the training architecture, the results indicate that the number of constraint violations is reduced by more than half, but at the cost of not learning any useful behaviour to exploit the reward.
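The pipeline described in the abstract (fit a risk estimate offline, then filter the agent's actions with it) can be illustrated with a minimal tabular sketch. This is not the thesis implementation: the abstract reports using Offline RL algorithms such as Offline DQN, whereas the code below substitutes a simple tabular fitted iteration. The transition format, the function names (fit_offline_safety_critic, safe_action, shaped_reward), the risk threshold, the "safest continuation" backup, and the fixed penalty are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical transition format: (state, action, violated, next_state, done).
# "violated" marks a constraint violation, e.g. losing the ball.
Transition = Tuple[int, int, bool, int, bool]


def fit_offline_safety_critic(
    dataset: List[Transition],
    n_states: int,
    n_actions: int,
    gamma: float = 0.9,
    sweeps: int = 50,
) -> List[List[float]]:
    """Estimate risk[s][a], a discounted likelihood of a future violation,
    from a fixed dataset only (no environment interaction).

    Assumption: the backup uses the safest next action (min over actions).
    Pairs that never appear in the dataset keep the default risk of 0.0,
    which is optimistic; a real system would need to handle them explicitly.
    """
    risk = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        sums: Dict[Tuple[int, int], float] = {}
        counts: Dict[Tuple[int, int], int] = {}
        for s, a, violated, s_next, done in dataset:
            if violated:
                target = 1.0
            elif done:
                target = 0.0
            else:
                target = gamma * min(risk[s_next])
            sums[(s, a)] = sums.get((s, a), 0.0) + target
            counts[(s, a)] = counts.get((s, a), 0) + 1
        # Average the targets observed for each (state, action) pair.
        for (s, a), total in sums.items():
            risk[s][a] = total / counts[(s, a)]
    return risk


def safe_action(
    agent_action: int, risks: List[float], threshold: float = 0.2
) -> Tuple[int, bool]:
    """Safety control module: keep the agent's action if its estimated risk
    is at or below the threshold, otherwise fall back to the least risky
    action. Returns (final_action, overridden)."""
    if risks[agent_action] <= threshold:
        return agent_action, False
    safest = min(range(len(risks)), key=risks.__getitem__)
    return safest, True


def shaped_reward(reward: float, overridden: bool, penalty: float = 0.5) -> float:
    """Training-time variant (assumed form): subtract a fixed penalty
    whenever the safety control module had to override the agent."""
    return reward - penalty if overridden else reward


# Toy dataset with two states and two actions.
dataset: List[Transition] = [
    (0, 0, True, 0, True),    # action 0 in state 0 loses the ball
    (0, 1, False, 1, False),  # action 1 in state 0 moves safely to state 1
    (1, 0, False, 1, True),   # action 0 in state 1 ends the episode safely
    (1, 1, True, 1, True),    # action 1 in state 1 loses the ball
]
risk = fit_offline_safety_critic(dataset, n_states=2, n_actions=2)
final_action, overridden = safe_action(agent_action=0, risks=risk[0])
```

In this toy example the agent proposes action 0 in state 0, which the critic rates as risky, so the control module substitutes action 1 and flags the override. The trade-off reported in the abstract shows up in the threshold and penalty values: stricter settings reduce violations but leave the agent less room to pursue reward.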

Supervisor

Kyrki, Ville

Thesis advisor

Dzibela, Daniel

Keywords

reinforcement learning, safe reinforcement learning, offline reinforcement learning, risk estimation, football simulation, safety critic
