Prioritizing Fault Tolerance: A Framework for Building Resilient Distributed Systems in Private Clouds

School of Science (Perustieteiden korkeakoulu) | Master's thesis
Date
2023-08-21
Major/Subject
Security and Cloud Computing (SECCLO, Erasmus Mundus)
Mcode
SCI3113
Degree programme
Master’s Programme in Security and Cloud Computing (SECCLO)
Language
en
Pages
74
Abstract
This thesis explores the design, implementation, and evaluation of fault-tolerant distributed systems in private cloud environments. As demand grows for reliable and resilient cloud systems, the unique challenges posed by private clouds must be addressed. A comprehensive literature review identifies the key challenges and requirements for fault tolerance in distributed systems. Existing fault tolerance mechanisms are analyzed, and an architecture tailored to private cloud environments is proposed. This architecture integrates redundancy and replication to ensure the continuous availability and reliability of critical services. Furthermore, the research investigates different perspectives for ranking the components of a system, so that fault tolerance can be prioritized according to each component's centrality and influence on the overall architecture. A weighted sum approach is introduced to allow flexibility and adaptability in ranking components, based on specific requirements and contextual factors. The proposed architecture is implemented, and failure tests are conducted using chaos engineering principles. The results provide insights into the trade-offs between optimizing fault tolerance and not implementing it at all. By selectively applying fault tolerance measures to critical services, optimal fault tolerance levels are achieved while minimizing resource and allocation overheads. The findings emphasize the importance of considering the specific characteristics and requirements of private cloud environments during fault-tolerant distributed system design.
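The weighted sum ranking mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the service names, the choice of in-degree and out-degree as centrality metrics, and the weights are all hypothetical, chosen only to show how a weighted sum of normalized metrics yields a priority order.

```python
from typing import Dict, List, Tuple

def degree_centrality(graph: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Return normalized in- and out-degree for every node of a directed call graph.

    graph maps a service to the services it calls (hypothetical input format).
    """
    nodes = set(graph) | {dep for deps in graph.values() for dep in deps}
    scale = 1.0 / (len(nodes) - 1) if len(nodes) > 1 else 0.0
    in_deg = {node: 0 for node in nodes}
    out_deg = {node: 0 for node in nodes}
    for caller, deps in graph.items():
        for dep in set(deps):  # ignore duplicate edges
            out_deg[caller] += 1
            in_deg[dep] += 1
    return {
        node: {"in_degree": in_deg[node] * scale, "out_degree": out_deg[node] * scale}
        for node in nodes
    }

def rank_components(
    metrics: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Score each component as a weighted sum of its metrics, highest first."""
    scores = {
        name: sum(weights.get(metric, 0.0) * value for metric, value in values.items())
        for name, values in metrics.items()
    }
    # Sort by descending score; break ties by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical microservice dependency graph (assumption, not from the thesis).
calls = {
    "gateway": ["auth", "orders"],
    "orders": ["auth", "db"],
    "auth": ["db"],
}
# Assumed weights: services that many others depend on matter more.
weights = {"in_degree": 0.7, "out_degree": 0.3}

ranking = rank_components(degree_centrality(calls), weights)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Changing the weights shifts the ranking toward whichever perspective matters in a given context, which is the flexibility the abstract attributes to the weighted sum approach. Components at the top of the list would be the first candidates for redundancy and replication.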
Supervisor
Siekkinen, Matti
Thesis advisor
Khettab, Yacine
Keywords
Kubernetes, Microservices, Distributed Systems, Fault Tolerance, Software Architecture, Cloud Computing