Machine Learning Assisted Dynamic Scheduling for Energy Efficient Serverless Cloud Workloads
School of Science
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Authors
Date
2024-12-25
Major/Subject
Security and Cloud Computing
Degree programme
Master's Programme in Security and Cloud Computing
Language
en
Pages
98
Abstract
The growing energy demands of cloud data centers have raised concerns about the sustainability of cloud computing. Serverless cloud computing, which is based on dynamic resource allocation, has the potential to reduce the cloud energy footprint by turning off idle resources. However, serverless technologies could still be further optimized for more energy-efficient workload placement and scheduling. By scheduling and consolidating workloads onto fewer nodes until the CPU starts to saturate, unused resources can remain idle or be placed in power-saving modes, yielding additional energy savings. This scheduling challenge resembles a bin-packing problem, where Kubernetes worker nodes act as "bins" and workloads as "items" to be allocated, but it is further complicated because some workloads have unknown resource demands (i.e., the "dimensions" of the items) before deployment. To address this, we propose a reinforcement learning (RL)-based scheduling approach that iteratively optimizes workload placement through trial-and-error learning. Specifically, we develop a deep Q-learning (DQN) model that maximizes CPU utilization on active nodes while minimizing overall cluster power consumption. The proposed approach is evaluated on synthetic Knative-based serverless workloads, assuming unknown resource requirements, and compared against baseline scheduling techniques: Random, Round Robin, Best Fit (CPU < 80%), and the default Kubernetes scheduler. In addition to introducing a novel RL-based scheduler, this work integrates the DQN model into the Kubernetes scheduler and provides a comprehensive performance evaluation. Results demonstrate that the RL-based scheduler can outperform the baseline methods by consolidating workloads onto fewer nodes, thereby reducing energy consumption. We further observe that the RL-based scheduler incurs a tradeoff between energy and performance. Our findings could serve as a foundation for optimizing energy efficiency in hyperscale and edge cloud environments, though real-world validation remains future work.
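To make the formulation in the abstract concrete, below is a minimal sketch of a DQN-style placement loop in Python. It is not the thesis implementation: the linear per-node power model, the reward shape (mean utilization of active nodes minus normalized cluster power), the network size, and all names (QNet, node_power, reward) are illustrative assumptions.

```python
# Hypothetical sketch of a DQN scheduler for bin-packing-style placement.
# State: per-node CPU utilization plus the incoming workload's demand
# (unknown a priori in the thesis; drawn randomly here for illustration).
# Action: the node to place the workload on.
import random
import torch
import torch.nn as nn

N_NODES = 4
IDLE_W, PEAK_W = 100.0, 400.0  # assumed linear power model (watts)

def node_power(util):
    # Idle draw dominates, which is why consolidation saves energy:
    # a node with zero load can be powered down entirely.
    return IDLE_W + (PEAK_W - IDLE_W) * util if util > 0 else 0.0

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_NODES + 1, 64), nn.ReLU(), nn.Linear(64, N_NODES))

    def forward(self, x):
        return self.net(x)

def reward(utils):
    # Reward high utilization on active nodes, penalize total power draw.
    active = [u for u in utils if u > 0]
    mean_util = sum(active) / len(active) if active else 0.0
    power = sum(node_power(u) for u in utils)
    return mean_util - power / (N_NODES * PEAK_W)

qnet = QNet()
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

utils = [0.0] * N_NODES  # toy loop: workloads run indefinitely once placed
for step in range(200):
    demand = random.uniform(0.05, 0.3)      # true demand, hidden from the agent
    state = torch.tensor(utils + [demand])
    if random.random() < 0.1:               # epsilon-greedy exploration
        action = random.randrange(N_NODES)
    else:
        action = int(qnet(state).argmax())
    utils[action] = min(1.0, utils[action] + demand)
    r = reward(utils)
    # One-step Q-learning target (no replay buffer or target network, for brevity).
    next_state = torch.tensor(utils + [0.0])
    with torch.no_grad():
        target = r + 0.9 * qnet(next_state).max()
    loss = (qnet(state)[action] - target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Under this assumed reward, the agent is pushed toward packing new workloads onto already-active nodes until their CPU saturates, mirroring the consolidation behavior the abstract attributes to the RL-based scheduler; the baselines (Random, Round Robin, Best Fit) would replace the action-selection step.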
Supervisor
Jung, Alex
Thesis advisor
Morabito, Roberto
Komu, Miika
Keywords
serverless, scheduler, energy awareness, deep reinforcement learning, deep q-learning, knative