Machine Learning Assisted Dynamic Scheduling for Energy Efficient Serverless Cloud Workloads

School of Science | Master's thesis

Date

2024-12-25

Major/Subject

Security and Cloud Computing

Degree programme

Master's Programme in Security and Cloud Computing

Language

en

Pages

98

Abstract

The growing energy demands of cloud data centers have raised concerns about the sustainability of cloud computing. Serverless cloud computing, which is built on dynamic resource allocation, has the potential to reduce the cloud energy footprint by turning off idle resources. However, serverless technologies could still be further optimized for more energy-efficient workload placement and scheduling. By consolidating workloads onto fewer nodes until their CPUs approach saturation, unused resources can remain idle or be placed in power-saving modes, yielding additional energy savings. This scheduling challenge resembles a bin-packing problem, where Kubernetes worker nodes act as "bins" and workloads as "items" to be allocated, but it is further complicated because some workloads have unknown resource demands (i.e., the "dimensions" of the items) before deployment. To address this, we propose a reinforcement learning (RL)-based scheduling approach that iteratively optimizes workload placement through trial-and-error learning. Specifically, we develop a Deep Q-Network (DQN) model that maximizes CPU utilization on active nodes while minimizing overall cluster power consumption. The proposed approach is evaluated on synthetic Knative-based serverless workloads, assuming unknown resource requirements, and compared against baseline scheduling techniques: Random, Round Robin, Best Fit CPU < 80%, and the default Kubernetes scheduler. In addition to introducing a novel RL-based scheduler, this work integrates the DQN model into the Kubernetes scheduler and provides a comprehensive performance evaluation. Results demonstrate that the RL-based scheduler can outperform the baseline methods by consolidating workloads onto fewer nodes, thereby reducing energy consumption. We further observe that the RL-based scheduler incurs a tradeoff between energy efficiency and performance. Our findings could serve as a foundation for optimizing energy efficiency in hyperscale and edge cloud environments, though real-world validation remains future work.
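The thesis text itself is not reproduced on this page, but the problem framing in the abstract can be illustrated with a small sketch. The hypothetical Python snippet below models the scheduling decision described above: the state as a per-node CPU utilization vector, a reward that favors high utilization on active nodes while penalizing the number of powered-on nodes (a crude proxy for cluster power draw), the "Best Fit CPU < 80%" baseline named in the abstract, and the epsilon-greedy action selection a DQN agent would use during training. All names, node counts, and reward weights are illustrative assumptions, not the thesis's actual formulation.

import random

# Hypothetical sketch of the scheduling problem described in the abstract.
# Node count, capacity, and reward weights are illustrative only.

NUM_NODES = 5
NODE_CPU_CAPACITY = 1.0  # normalized CPU capacity per worker node

def place(utilization, node, demand):
    """Return a new utilization vector with `demand` added to `node`."""
    updated = list(utilization)
    updated[node] += demand
    return updated

def reward(utilization, alpha=1.0, beta=0.5):
    """Reward high utilization on active nodes, penalize powered-on nodes.

    Active nodes are those with non-zero load; their count serves as a
    crude proxy for overall cluster power consumption.
    """
    active = [u for u in utilization if u > 0]
    if not active:
        return 0.0
    mean_util = sum(active) / len(active)
    return alpha * mean_util - beta * (len(active) / len(utilization))

def best_fit_cpu_80(utilization, demand):
    """Baseline: Best Fit with a CPU < 80% cap, as named in the abstract."""
    candidates = [
        (NODE_CPU_CAPACITY - u - demand, i)
        for i, u in enumerate(utilization)
        if u + demand <= 0.8 * NODE_CPU_CAPACITY
    ]
    # Tightest feasible fit; fall back to the least-loaded node if none fits.
    if candidates:
        return min(candidates)[1]
    return min(range(len(utilization)), key=lambda i: utilization[i])

def epsilon_greedy(q_values, epsilon=0.1):
    """Action selection as a DQN agent would use during training."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

if __name__ == "__main__":
    util = [0.0] * NUM_NODES
    for demand in [0.3, 0.2, 0.4, 0.1]:  # synthetic workload CPU demands
        node = best_fit_cpu_80(util, demand)
        util = place(util, node, demand)
        print(f"placed {demand:.1f} on node {node}, reward={reward(util):.3f}")
    # Illustrative action selection with dummy Q-values (a trained DQN
    # would produce these from the utilization state).
    print("epsilon-greedy pick:", epsilon_greedy([0.1, 0.7, 0.3, 0.2, 0.4]))

In a trained DQN scheduler, the Q-values would come from a neural network evaluated on the current utilization state rather than the hand-written heuristic shown here; the baseline and reward functions merely make the abstract's bin-packing analogy concrete.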

Supervisor

Jung, Alex

Thesis advisor

Morabito, Roberto
Komu, Miika

Keywords

serverless, scheduler, energy awareness, deep reinforcement learning, deep q-learning, knative
