Prioritizing Fault Tolerance: A Framework for Building Resilient Distributed Systems in Private Clouds

dc.contributor: Aalto University [en]
dc.contributor.advisor: Khettab, Yacine
dc.contributor.author: Alkhafaji, Husamu-Aldeen
dc.contributor.school: Perustieteiden korkeakoulu [fi]
dc.contributor.supervisor: Siekkinen, Matti
dc.description.abstract: This thesis explores the design, implementation, and evaluation of fault-tolerant distributed systems in private cloud environments. With the increasing demand for reliable and resilient systems in cloud computing, it becomes crucial to address the unique challenges posed by private clouds. Through a comprehensive literature review, the key challenges and requirements for fault tolerance in distributed systems are identified. Existing fault tolerance mechanisms are analyzed, and a tailored architecture is proposed specifically for private cloud environments. This architecture integrates redundancy and replication to ensure the continuous availability and reliability of critical services. Furthermore, the research investigates different perspectives to rank components in a system, prioritizing fault tolerance based on their centrality and influence on the overall architecture. A weighted sum approach is introduced to allow flexibility and adaptability in ranking components, based on specific requirements and contextual factors. The proposed architecture is implemented, and failure tests are conducted using chaos engineering principles. The results provide insights into the trade-offs between optimizing fault tolerance and not implementing it at all. By selectively applying fault tolerance measures to critical services, optimal fault tolerance levels are achieved while minimizing resource and allocation overheads. The findings emphasize the importance of considering the specific characteristics and requirements of private cloud environments during fault-tolerant distributed system design. [en]
dc.programme: Master’s Programme in Security and Cloud Computing (SECCLO) [fi]
dc.programme.major: Security and Cloud Computing (SECCLO, Erasmus Mundus) [fi]
dc.subject.keyword: Distributed Systems [en]
dc.subject.keyword: Fault Tolerance [en]
dc.subject.keyword: Software Architecture [en]
dc.subject.keyword: Cloud Computing [en]
dc.title: Prioritizing Fault Tolerance: A Framework for Building Resilient Distributed Systems in Private Clouds [en]
dc.type: G2 Pro gradu, diplomityö [fi]
dc.type.ontasot: Master's thesis [en]