Scalable communication for high-order stencil computations using CUDA-aware MPI
Access rights
openAccess
publishedVersion
A1 Original article in a scientific journal
This publication is imported from the Aalto University research portal.
Date
2022-07
Language
en
Pages
12
Series
PARALLEL COMPUTING, Volume 111, pp. 1-12
Abstract
Modern compute nodes in high-performance computing provide a tremendous level of parallelism and processing power. However, as arithmetic performance has been observed to grow faster than memory and network bandwidth, optimizing data movement has become critical for achieving strong scaling in many communication-heavy applications. This performance gap has been further accentuated by the introduction of graphics processing units, which can deliver several times the throughput of central processing units in data-parallel tasks. In this work, we explore the computational aspects of iterative stencil loops and implement a generic communication scheme using CUDA-aware MPI, which we use to accelerate magnetohydrodynamics simulations based on high-order finite differences and third-order Runge–Kutta integration. We put particular focus on improving the intra-node locality of workloads. Our GPU implementation scales strongly from one to 64 devices at 50%–87% of the efficiency expected from a theoretical performance model. Compared with a multi-core CPU solver, our implementation exhibits a 20–60× speedup and 9–12× better energy efficiency in compute-bound benchmarks on 16 nodes.
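The abstract refers to iterative stencil loops whose subdomains must exchange boundary data every step. As an illustration only, and not the authors' implementation, the single-process Python sketch below shows the halo (ghost-zone) exchange that such communication schemes are built around. It splits a periodic 1-D grid into subdomains, each padded with a halo of radius 3. That is the width a sixth-order central first derivative needs. It then checks that the decomposed derivative matches the analytic one. The function names `split`, `exchange_halos` and `derivative`, the 1-D geometry, and the sixth-order stencil are all assumptions made for this example. The paper's actual decomposition, stencil order and MPI code may differ.

```python
import math

RADIUS = 3  # assumed: a sixth-order central first derivative reaches 3 cells each way
# Sixth-order central-difference weights for offsets 1, 2, 3 (antisymmetric stencil).
COEFFS = (3.0 / 4.0, -3.0 / 20.0, 1.0 / 60.0)


def split(field, nparts):
    """Split a periodic global array into equal subdomains with empty halos."""
    n = len(field) // nparts
    if n * nparts != len(field) or n < RADIUS:
        raise ValueError("grid must divide evenly into subdomains of >= RADIUS cells")
    return [[0.0] * RADIUS + field[i * n:(i + 1) * n] + [0.0] * RADIUS
            for i in range(nparts)]


def exchange_halos(subdomains):
    """Fill every halo from the neighbouring subdomains (periodic boundaries).

    Single-process stand-in for communication: with CUDA-aware MPI, each copy
    below would instead be an MPI_Isend/MPI_Irecv pair on device buffers.
    """
    p = len(subdomains)
    for i, sub in enumerate(subdomains):
        left = subdomains[(i - 1) % p]
        right = subdomains[(i + 1) % p]
        sub[:RADIUS] = left[-2 * RADIUS:-RADIUS]  # left neighbour's last interior cells
        sub[-RADIUS:] = right[RADIUS:2 * RADIUS]  # right neighbour's first interior cells


def derivative(sub, dx):
    """Sixth-order central first derivative at the interior cells of one subdomain."""
    out = []
    for j in range(RADIUS, len(sub) - RADIUS):
        acc = sum(c * (sub[j + k + 1] - sub[j - k - 1])
                  for k, c in enumerate(COEFFS))
        out.append(acc / dx)
    return out


if __name__ == "__main__":
    n, parts = 64, 4
    dx = 2.0 * math.pi / n
    f = [math.sin(i * dx) for i in range(n)]
    subs = split(f, parts)
    exchange_halos(subs)
    df = [v for s in subs for v in derivative(s, dx)]
    err = max(abs(d - math.cos(i * dx)) for i, d in enumerate(df))
    print(f"max |f' - cos(x)| = {err:.2e}")
```

Each subdomain only reads its neighbours' interior cells and only writes its own halos, so the individual exchanges are independent of one another. In a distributed code, this independence is what allows all sends and receives to be posted at once and overlapped with work on the interior cells, which do not depend on the halos.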
openaire: EC/H2020/818665/EU//UniSDyn
Funding Information: This work was supported by the Academy of Finland ReSoLVE Centre of Excellence (grant number 307411); the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Project UniSDyn, grant agreement no. 818665); and CHARMS within ASIAA from Academia Sinica.
Publisher Copyright: © 2022 The Authors
Keywords
High-performance computing, Graphics processing units, Stencil computations, Computational physics, Magnetohydrodynamics
Citation
Pekkilä, J, Väisälä, M S, Käpylä, M J, Rheinhardt, M & Lappi, O 2022, 'Scalable communication for high-order stencil computations using CUDA-aware MPI', PARALLEL COMPUTING, vol. 111, 102904, pp. 1-12. https://doi.org/10.1016/j.parco.2022.102904