Advancing rail mobility: Generating robust train paths with deep reinforcement learning

School of Engineering | Master's thesis

Date

2024-08-19

Major/Subject

Sustainable Urban Mobility Transitions

Mcode

ENG3085

Degree programme

Master’s programme in Urban Mobility

Language

en

Pages

74+7

Abstract

Methods for generating robust train paths in a timetable have been a focus of academic research and have potential for use in industry. A foremost requirement is that the train paths in a timetable are inherently robust, that is, not affected by minor delays. During the tactical planning stages, it is difficult to predict future delays and thus to incorporate the required supplement times, which are additional times added to provide a buffer when a delay is experienced in operation. In practice, creating a timetable is still a manual process, with minor use of microscopic-level tools for eliminating conflicts. In research, many optimization techniques have been proposed and tested to generate robust train paths and to optimize the overall timetable. In recent years, the use of reinforcement learning (RL) techniques to schedule or reschedule trains in a timetable has increased, owing to their ability to find feasible results through heuristics. This opens a fascinating area of research: testing and evaluating the performance of RL models and paving a path towards implementation in real-world scenarios. In this thesis, a deep reinforcement learning (DRL) model is developed to test and evaluate the performance of different DRL agents. Initially, a reinforcement learning environment is developed based on the railway timetabling problem. The selection of parameters, constraints and functions provides deeper insight into the significance of these elements for the final performance of the agent. In this environment, two DRL agents, Soft Actor-Critic (SAC) and Trust Region Policy Optimization (TRPO), are trained under several constraint conditions and their performance is evaluated. The better-performing agent is then tested in the environment to find feasible and optimal train paths. The evaluation is performed under varying conditions of delay and disturbance.
Two experiments are conducted in this thesis on a case study area, which is part of the Swedish rail network from Sala (Sl) to Västerås Central Station (Vc). The results of the evaluation, based on simulating the agent exploiting its learned policy in the environment, show that the method can generate feasible train paths in a timetable under delay conditions.
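
As an illustration of the supplement-time idea mentioned in the abstract, the following minimal sketch shows how per-segment supplement times could absorb an initial delay along a train path. It is not taken from the thesis: the function names, the segment structure, and the numbers are hypothetical, and the model assumes that each segment's supplement can be used in full to recover delay.

```python
def propagate_delay(initial_delay_s, supplements_s):
    """Return the delay (in seconds) remaining after each segment.

    Assumption for illustration only: every segment's supplement time
    is fully available to recover delay, and no new delay is added.
    """
    remaining = []
    delay = initial_delay_s
    for supplement in supplements_s:
        # A supplement can absorb delay but never makes the train early.
        delay = max(0, delay - supplement)
        remaining.append(delay)
    return remaining


def is_robust(initial_delay_s, supplements_s):
    """True if the delay is fully absorbed by the end of the path."""
    remaining = propagate_delay(initial_delay_s, supplements_s)
    final_delay = remaining[-1] if remaining else initial_delay_s
    return final_delay == 0


# Hypothetical path with three segments and a 120 s departure delay.
print(propagate_delay(120, [30, 60, 60]))
print(is_robust(120, [30, 30]))
```

In this toy model, a path counts as robust to a given delay when its cumulative supplement is at least as large as that delay. A learning agent would face the harder problem of choosing where to place supplements without knowing future delays in advance.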

Supervisor

Roncoli, Claudio

Thesis advisor

Lindbergh, Jakob
Högdahl, Johan

Keywords

railways, reinforcement learning, timetable optimization, train path generation, robust timetabling
