Towards the evaluation of congested clique compatibility on modern computers

Loading...
Thumbnail Image

URL

Journal Title

Journal ISSN

Volume Title

School of Science | Master's thesis

Department

Major/Subject

Mcode

Language

en

Pages

53

Series

Abstract

This thesis implements and evaluates the performance of Congested Clique algorithms on modern computers. The Congested Clique model is a theoretical framework for distributed computing, where fully connected computers exchange messages in synchronous rounds, assuming unlimited local computation and focusing on minimizing the number of communication rounds. MST_loglogn is a Congested Clique algorithm that solves the Minimum Spanning Tree (MST) problem in 𝑂(loglogn) rounds. An MST of a connected, weighted graph is a subset of edges connecting all vertices without cycles and with minimum total weight. This study aims to evaluate the compatibility of the Congested Clique model with modern computers, focusing on the implementation and practical performance of MST_loglogn using Open MPI for CPUs and CUDA for GPUs. When it comes to implementation complexity, the MPI version is relatively straightforward due to collective communication primitives, while the CUDA implementation is more complex, requiring explicit global memory management and careful synchronization. One of the main contributions is that the round complexity of MST_loglogn is confirmed to be 𝑂(loglogn) in both MPI and CUDA implementations, but actual runtime is dominated by graph size and data movement, rather than the round count. Early phases take significantly longer; optimizing these phases improves total runtime. Going deeper into the time composition of MST_loglogn, contrary to the model’s assumptions, local computation time is significant in both MPI and CUDA. The sequential part of the algorithm tends to be very heavy, especially in CUDA, as the graph grows. However, both implementations outperform Prim’s algorithm given enough parallelism, which shows the practical benefits of the Congested Clique model. Both implementations demonstrate the ability to process large graphs, but both are memory-intensive, with large message buffers limiting the maximum graph size.

Description

Supervisor

Suomela, Jukka

Thesis advisor

Vahidi, Hossein

Other note

Citation