Cache Policy Design via Reinforcement Learning for Cellular Networks in Non-Stationary Environment
Access rights
openAccess
acceptedVersion
A4 Artikkeli konferenssijulkaisussa
This publication is imported from Aalto University research portal.
Date
2023-10-23
Language
en
Pages
6
Series
2023 IEEE International Conference on Communications Workshops: Sustainable Communications for Renaissance, ICC Workshops 2023, pp. 764-769, IEEE International Conference on Communications workshops
Abstract
We consider wireless caching both at the network edge and at the User Equipment (UE) to alleviate traffic congestion, aiming to find a joint cache placement and delivery policy that maximizes Quality of Service (QoS) while minimizing backhaul load and UE power consumption. We assume unknown, time-variant file popularities that are affected by the UE cache content, leading to a non-stationary Partially Observable Markov Decision Process (POMDP). We address this problem in a deep reinforcement learning framework, employing Feed-Forward Neural Network (FFNN) and Long Short-Term Memory (LSTM) networks in conjunction with the Advantage Actor-Critic (A2C) algorithm. The LSTM exploits the correlation of the file popularity distribution across time slots to learn the dynamics of the environment, while A2C is used for its ability to handle continuous, high-dimensional spaces, making the combination well suited to finding a good policy for the POMDP environment. Simulation results show that an LSTM-based A2C outperforms an FFNN-based A2C in terms of sample efficiency and optimality, giving superior performance under the non-stationary POMDP paradigm.
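To make the actor-critic idea concrete, below is a minimal sketch of advantage-based caching under drifting file popularity. It is not the paper's implementation: the LSTM is replaced by an exponential moving average of observed requests as a hand-rolled memory of the non-stationary popularity, both actor and critic are linear in that memory, the drift model and all hyperparameters are invented for illustration, and the reward is simply a cache hit.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FILES, GAMMA, LR, T = 4, 0.9, 0.1, 2000

def popularity(t):
    """Hidden, slowly drifting file-request distribution (non-stationary demand)."""
    p = np.clip(1.0 + np.sin(t / 20 + np.arange(N_FILES)), 0.05, None)
    return p / p.sum()

Theta = np.zeros((N_FILES, N_FILES))   # actor: logits over "which file to cache"
w = np.zeros(N_FILES)                  # critic: linear state-value weights
memory = np.ones(N_FILES) / N_FILES    # EMA of requests; stand-in for LSTM state

hits = 0
for t in range(T):
    logits = Theta @ memory
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = rng.choice(N_FILES, p=probs)            # cache file a this slot
    req = rng.choice(N_FILES, p=popularity(t))  # user request
    r = 1.0 if req == a else 0.0                # QoS reward: cache hit
    hits += r

    value = w @ memory                          # critic estimate V(s)
    memory_next = 0.95 * memory + 0.05 * np.eye(N_FILES)[req]
    next_value = w @ memory_next                # bootstrapped V(s')

    advantage = r + GAMMA * next_value - value  # one-step TD advantage
    w += LR * advantage * memory                # critic: TD(0) update
    # actor: policy gradient, softmax score is onehot(a) - probs
    Theta += LR * advantage * np.outer(np.eye(N_FILES)[a] - probs, memory)
    memory = memory_next

hit_rate = hits / T
print(round(hit_rate, 3))
```

The recurrent memory is what lets the policy track the drifting popularity; swapping the EMA for an LSTM (and the linear heads for neural networks), as the paper does, follows the same advantage-weighted update structure.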
Citation
Srinivasan, A., Amidzade, M., Zhang, J. & Tirkkonen, O. 2023, 'Cache Policy Design via Reinforcement Learning for Cellular Networks in Non-Stationary Environment', in 2023 IEEE International Conference on Communications Workshops: Sustainable Communications for Renaissance, ICC Workshops 2023, IEEE International Conference on Communications Workshops, IEEE, pp. 764-769, Rome, Italy, 28/05/2023. https://doi.org/10.1109/ICCWorkshops57953.2023.10283680