Lightweight massively parallel suffix array construction

dc.contributorAalto-yliopistofi
dc.contributorAalto Universityen
dc.contributor.advisorKaski, Petteri
dc.contributor.authorShukhrov, Yury
dc.contributor.schoolPerustieteiden korkeakoulufi
dc.contributor.supervisorKaski, Petteri
dc.date.accessioned2019-06-23T15:14:11Z
dc.date.available2019-06-23T15:14:11Z
dc.date.issued2019-06-17
dc.description.abstractThe suffix array is an array of sorted suffixes in lexicographic order, where each sorted suffix is represented by its starting position in the input string. It is a fundamental data structure that finds various applications in areas such as string processing, text indexing, data compression, computational biology, and many more. Over the last three decades, researchers have proposed a broad spectrum of suffix array construction algorithms (SACAs). However, the majority of SACAs were implemented using sequential and parallel programming models. The maturity of GPU programming opened doors to the development of massively parallel GPU SACAs that outperform the fastest versions of suffix sorting algorithms optimized for the CPU parallel computing. Over the last five years, several GPU SACA approaches were proposed and implemented. They prioritized the running time over lightweight design. In this thesis, we design and implement a lightweight massively parallel SACA on the GPU using the prefix-doubling technique. Our prefix-doubling implementation is memory-efficient and can successfully construct the suffix array for input strings as large as 640 megabytes (MB) on Tesla P100 GPU. On large datasets, our implementation achieves a speedup of 7-16x over the fastest, highly optimized, OpenMP-accelerated suffix array constructor, libdivsufsort, that leverages the CPU shared memory parallelism. The performance of our algorithm relies on several high-performance parallel primitives such as radix sort, conditional filtering, inclusive prefix sum, random memory scattering, and segmented sort. We evaluate the performance of our implementation over a variety of real-world datasets with respect to its runtime, throughput, memory usage, and scalability. We compare our results against libdivsufsort that we run on a Haswell compute node equipped with 24 cores. Our GPU SACA is simple and compact, consisting of less than 300 lines of readable and effective source code. Additionally, we design and implement a fast and lightweight algorithm for checking the correctness of the suffix array.en
dc.format.extent88
dc.format.mimetypeapplication/pdfen
dc.identifier.urihttps://aaltodoc.aalto.fi/handle/123456789/39017
dc.identifier.urnURN:NBN:fi:aalto-201906234083
dc.language.isoenen
dc.programmeMaster's Programme in Computer, Communication and Information Sciencesfi
dc.programme.majorComputer Sciencefi
dc.programme.mcodeSCI3042fi
dc.subject.keywordsuffix arrayen
dc.subject.keywordGPUen
dc.subject.keywordprefix-doublingen
dc.subject.keywordCUDAen
dc.subject.keywordalgorithm engineeringen
dc.subject.keywordBurrows-Wheeler transformen
dc.titleLightweight massively parallel suffix array constructionen
dc.typeG2 Pro gradu, diplomityöfi
dc.type.ontasotMaster's thesisen
dc.type.ontasotDiplomityöfi
local.aalto.electroniconlyyes
local.aalto.openaccessyes

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
master_Shukhrov_Yury_2019.pdf
Size:
1005.07 KB
Format:
Adobe Portable Document Format