Cross-systems multi-level data pipelines optimization for predicting sunspot emergence

School of Science | Master's thesis

Date

2024-06-17

Major/Subject

Computer Science

Mcode

SCI3042

Degree programme

Master’s Programme in Computer, Communication and Information Sciences

Language

en

Pages

87

Abstract

The proliferation of big data pipelines has spurred collaboration across disciplines, notably between Computer Science and the natural sciences. Domain researchers possess valuable insights that can significantly improve the extraction of novel and impactful findings in their respective fields. However, making the best use of these pipelines often requires harnessing the full potential of High Performance Computing (HPC) systems. A significant challenge is that the pipelines are optimized for scientific accuracy and may therefore fail to exploit the available resources to their maximum capacity. To address this issue, this thesis explores approaches that separate the scientific development of the pipeline by the domain scientist from the HPC resource optimization by the computer scientist, and that capture the runtime conditions of processes, identify potential imbalances, and explain their underlying causes. The concept is exemplified by applying it to a pipeline proposed by Korpi-Lagg et al. [1]. We conduct a statistical analysis of this pipeline and investigate its existing imbalances and areas for optimization. Through these efforts, the thesis aims to contribute to the efficiency and effectiveness of big data pipelines across diverse domains.

[1] M. J. Korpi-Lagg, A. Korpi-Lagg, N. Olspert, and H. L. Truong, “Solar-cycle variation of quiet-Sun magnetism and surface gravity oscillation mode,” Astronomy & Astrophysics, vol. 665, p. A141, Sep. 2022.

Description

Supervisor

Truong, Hong-Linh

Thesis advisor

Korpi-Lagg, Andreas

Keywords

big data, pipelines, high performance computing, observability, monitoring, optimization
