Prioritizing Fault Tolerance: A Framework for Building Resilient Distributed Systems in Private Clouds

School of Science | Master's thesis

Date

2023-08-21

Major/Subject

Security and Cloud Computing (SECCLO, Erasmus Mundus)

Mcode

SCI3113

Degree programme

Master’s Programme in Security and Cloud Computing (SECCLO)

Language

en

Pages

74

Abstract

This thesis explores the design, implementation, and evaluation of fault-tolerant distributed systems in private cloud environments. As demand grows for reliable and resilient systems in cloud computing, it becomes crucial to address the unique challenges posed by private clouds. A comprehensive literature review identifies the key challenges and requirements for fault tolerance in distributed systems. Existing fault tolerance mechanisms are analyzed, and an architecture tailored to private cloud environments is proposed. This architecture integrates redundancy and replication to ensure the continuous availability and reliability of critical services. The research also investigates different perspectives for ranking the components of a system, so that fault tolerance can be prioritized according to each component's centrality and influence on the overall architecture. A weighted sum approach is introduced so that the ranking can be adapted to specific requirements and contextual factors. The proposed architecture is implemented, and failure tests are conducted using chaos engineering principles. The results provide insights into the trade-offs between optimizing fault tolerance and not implementing it at all. By selectively applying fault tolerance measures to critical services, optimal fault tolerance levels are achieved while minimizing resource and allocation overheads. The findings emphasize the importance of considering the specific characteristics and requirements of private cloud environments when designing fault-tolerant distributed systems.
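The weighted sum ranking mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual criteria, weights, or formula, so everything below is an assumption made for illustration: the criterion names (`centrality`, `influence`), the service names, the scores in [0, 1], and the helper functions are all hypothetical.

```python
from typing import Dict, List

# Hypothetical per-component scores, each assumed to be normalized to [0, 1].
# Neither the service names nor the values come from the thesis.
EXAMPLE_SCORES: Dict[str, Dict[str, float]] = {
    "gateway": {"centrality": 0.9, "influence": 0.6},
    "auth": {"centrality": 0.7, "influence": 0.9},
    "reports": {"centrality": 0.2, "influence": 0.3},
}


def weighted_sum(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine one component's criterion scores into a single value.

    Weights are rescaled to sum to 1 so that results stay comparable
    across different weightings.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[c] / total * scores[c] for c in weights)


def rank_components(
    components: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[str]:
    """Return component names ordered from highest to lowest weighted score."""
    return sorted(
        components,
        key=lambda name: weighted_sum(components[name], weights),
        reverse=True,
    )


if __name__ == "__main__":
    # Equal weighting of both criteria.
    print(rank_components(EXAMPLE_SCORES, {"centrality": 0.5, "influence": 0.5}))
    # Weighting that emphasizes centrality.
    print(rank_components(EXAMPLE_SCORES, {"centrality": 0.8, "influence": 0.2}))
```

In this sketch, changing the weights changes which component ranks first, which is the kind of flexibility a weighted sum offers: the same measurements can yield a different fault tolerance priority order depending on the requirements at hand.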

Supervisor

Siekkinen, Matti

Thesis advisor

Khettab, Yacine

Keywords

Kubernetes, Microservices, Distributed Systems, Fault Tolerance, Software Architecture, Cloud Computing
