Risk Estimation Using Offline Reinforcement Learning in the Football Domain

dc.contributor [fi]: Aalto-yliopisto
dc.contributor [en]: Aalto University
dc.contributor.advisor: Dzibela, Daniel
dc.contributor.author: Aznar Alvarez, Raul
dc.contributor.school [fi]: Sähkötekniikan korkeakoulu
dc.contributor.supervisor: Kyrki, Ville
dc.date.accessioned: 2022-09-04T17:01:03Z
dc.date.available: 2022-09-04T17:01:03Z
dc.date.issued: 2022-08-22
dc.description.abstract [en]: Interest in Machine Learning has grown across many sectors, and complex challenges have been solved with Reinforcement Learning (RL). However, due to its exploratory nature, RL is data-inefficient and cannot guarantee safety in many complex tasks. This thesis proposes a novel approach, OfSaCRE, which aims to reduce the number of constraint violations an agent commits during deployment. First, an Offline Safety Critic, which encodes the risk estimate, is learned from a dataset of past transitions using Offline RL techniques. The Offline Safety Critic is then deployed alongside an RL agent through a safety control module, which selects the final action based on the estimated safety of each candidate action. In addition, an alternative training architecture that enables the use of OfSaCRE during learning is explored, penalizing the RL agent for relying on the safety critic. In football, statistics show that teams with more ball possession are more likely to win the match, so not losing the ball is of utmost importance. This thesis therefore evaluates OfSaCRE in the football domain, where the constraint is defined as losing the ball. The performance of different Offline RL algorithms and the effect of adding noise to the datasets are analyzed. The results show that Offline DQN combined with a noisy dataset is the most suitable configuration for this application, since it reduces the number of constraint violations without sacrificing too much reward exploitation. With the training architecture, the results indicate that constraint violations are reduced by more than half, but at the cost of not learning any useful reward-exploiting behaviour.
dc.format.extent: 94+0
dc.format.mimetype [en]: application/pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/116485
dc.identifier.urn: URN:NBN:fi:aalto-202209045296
dc.language.iso [en]: en
dc.location [fi]: P1
dc.programme [fi]: Master's Programme in ICT Innovation
dc.programme.major [fi]: Autonomous Systems
dc.programme.mcode [fi]: ELEC3055
dc.subject.keyword [en]: reinforcement learning
dc.subject.keyword [en]: safe reinforcement learning
dc.subject.keyword [en]: offline reinforcement learning
dc.subject.keyword [en]: risk estimation
dc.subject.keyword [en]: football simulation
dc.subject.keyword [en]: safety critic
dc.title [en]: Risk Estimation Using Offline Reinforcement Learning in the Football Domain
dc.type [fi]: G2 Pro gradu, diplomityö
dc.type.ontasot [en]: Master's thesis
dc.type.ontasot [fi]: Diplomityö
local.aalto.electroniconly: yes
local.aalto.openaccess: yes
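
The safety control module described in the abstract above can be illustrated with a minimal sketch. The snippet below is not taken from the thesis: the function name select_action, the risk_threshold parameter, and the fallback order are all hypothetical. It only assumes a discrete action space, value estimates from the RL agent, and per-action risk estimates from the Offline Safety Critic.

```python
# Minimal sketch (assumptions, not the thesis implementation) of a safety
# control module: the RL policy's preferred action is kept only if the
# offline safety critic estimates its risk below a threshold; otherwise the
# module falls back to the best-valued safe action, or the least risky one.

import numpy as np


def select_action(policy_q, safety_q, risk_threshold=0.5):
    """policy_q: estimated return per action (RL agent);
    safety_q: estimated risk of constraint violation per action (safety critic);
    risk_threshold: hypothetical maximum tolerated risk."""
    preferred = int(np.argmax(policy_q))
    if safety_q[preferred] <= risk_threshold:
        return preferred                       # policy action deemed safe enough
    safe_actions = np.where(safety_q <= risk_threshold)[0]
    if len(safe_actions) > 0:
        # among the safe actions, keep the one the policy values most
        return int(safe_actions[np.argmax(policy_q[safe_actions])])
    return int(np.argmin(safety_q))            # no safe action: take the least risky one


# Example with 4 discrete actions (e.g. pass, dribble, shoot, clear)
policy_q = np.array([0.9, 0.4, 0.7, 0.1])
safety_q = np.array([0.8, 0.2, 0.3, 0.1])      # action 0 is valuable but risky
print(select_action(policy_q, safety_q))       # -> 2
```

The fallback order used here (policy choice if safe, otherwise the best-valued safe action, otherwise the least risky action) is one plausible reading of "deciding the final action based on the estimated safety of each action"; the thesis may define the override rule differently.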

Files

Original bundle (1 file):
Name: master_Aznar_Alvarez_Raul_2022.pdf
Size: 7.29 MB
Format: Adobe Portable Document Format