Exploring Safe Reinforcement Learning Using Safety Shields Derived With System-Theoretic Process Analysis: A Case-Study on a Cruise Ship Hotel System
Access rights
openAccess
acceptedVersion
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
4
Series
2025 IEEE 30th International Conference on Emerging Technologies and Factory Automation (ETFA), IEEE International Conference on Emerging Technologies and Factory Automation
Abstract
The cruise ship industry is under increasing pressure to reduce greenhouse gas emissions, as international regulations define ambitious requirements and goals for modern cruise ships. Among the most significant consumers of energy onboard cruise ships are their Heating, Ventilation, and Air Conditioning (HVAC) systems. However, the energy optimization of HVAC systems is challenging, as they are affected by a number of uncontrolled variables, such as changing weather conditions, passenger behavior, and the demands of other significant energy consumers, such as propulsion systems. Reinforcement Learning (RL) is often used to tackle such complex optimization tasks; however, concerns over ensuring the safety of RL-optimized systems hinder its adoption in industry, especially in the context of safety-critical systems. This paper presents the initial findings of applying a novel approach to ensure safety in RL: a safety shield developed using a novel hazard analysis method, System-Theoretic Process Analysis. In this work, the safety shield is used both to train the RL agent and to block unsafe behavior in operation. Preliminary findings suggest that blocking unsafe behavior during training hinders the ability to learn a safe RL policy; however, when used in testing, the approach is capable of significantly reducing the number of safety violations.
Citation
King, A, Shahinas, E, Atmojo, U D & Vyatkin, V 2025, Exploring Safe Reinforcement Learning Using Safety Shields Derived With System-Theoretic Process Analysis: A Case-Study on a Cruise Ship Hotel System. in L Almeida, M Indria, M de Sousa, A Visioli, M Ashjaei & P Santos (eds), 2025 IEEE 30th International Conference on Emerging Technologies and Factory Automation (ETFA). IEEE International Conference on Emerging Technologies and Factory Automation, IEEE, IEEE International Conference on Emerging Technologies and Factory Automation, Porto, Portugal, 09/09/2025. https://doi.org/10.1109/etfa65518.2025.11205793