Increasing Fault Tolerance in MPI Stencil-Based Simulations

School of Engineering | Bachelor's thesis

Mcode

ENG3045

Language

en

Pages

20+7

Abstract

This thesis investigates fault-tolerance strategies for high-performance computing environments, focusing on their applicability to stencil-based applications. The strategies are demonstrated with a model application that simulates heat propagation. A literature review showed that checkpointing-based approaches are applicable to stencil-based applications. Mathematical modelling and empirical measurements showed that, in this scenario, diskless checkpointing provides no significant advantage over disk-based checkpointing because the checkpoints are relatively small. Furthermore, the mathematical model developed in the thesis, which predicts the expected running time for a given fault-tolerance technique, can be used to assess the applicability of these techniques in scenarios beyond the scope of this thesis.
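
The checkpoint-and-restart idea evaluated in the abstract can be illustrated with a minimal sketch. This is not the thesis's model application: it assumes a single process, a 1D explicit heat stencil, and disk-based checkpoints written with `pickle`. The names `step`, `save_checkpoint`, `load_checkpoint`, `run`, and the `fail_at` fault hook are all hypothetical. An MPI version would additionally need halo exchange between ranks and coordinated checkpoints, which are omitted here.

```python
import os
import pickle
import tempfile


def step(u, alpha):
    """One explicit update of the 1D heat equation using a 3-point stencil."""
    new = u[:]  # copy; the two boundary values stay fixed
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def save_checkpoint(path, iteration, u):
    """Write to a temporary file, then rename, so a crash never leaves a partial checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump((iteration, u), f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return (completed_iterations, state) from a checkpoint file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def run(u0, steps, alpha, path, interval, fail_at=None):
    """Advance the simulation, resuming from `path` if a checkpoint exists.

    `fail_at` (assumption, for illustration only) raises at the given
    iteration to emulate a process fault.
    """
    if os.path.exists(path):
        start, u = load_checkpoint(path)
    else:
        start, u = 0, list(u0)
    for it in range(start, steps):
        if fail_at is not None and it == fail_at:
            raise RuntimeError("simulated fault")
        u = step(u, alpha)
        if (it + 1) % interval == 0:
            save_checkpoint(path, it + 1, u)
    return u


if __name__ == "__main__":
    u0 = [0.0] * 5 + [100.0] + [0.0] * 5
    ckpt = os.path.join(tempfile.mkdtemp(), "heat.ckpt")
    try:
        run(u0, 50, 0.25, ckpt, interval=10, fail_at=25)
    except RuntimeError:
        pass  # the fault loses only the work done since iteration 20
    final = run(u0, 50, 0.25, ckpt, interval=10)  # resumes from iteration 20
    print(len(final))
```

In this sketch the `interval` argument sets the trade-off: frequent checkpoints cost more I/O time, while infrequent ones lose more work per fault.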

Supervisor

St-Pierre, Luc

Thesis advisor

Puro, Touko
