Performances and trade-offs between real-time and micro-batch distributed stream processing systems in stateful and stateless processing
School of Science
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
72
Abstract
There are numerous stream processing frameworks available on the market, each suited to a specific set of use cases and constraints. Selecting an inappropriate tool could introduce significant technical debt to the overall system. A thorough understanding of stream processing frameworks and their trade-offs is therefore essential for developing a reliable, scalable, and efficient real-time data processing pipeline. Against this background, this thesis aims to: (1) identify factors that could affect the performance of modern real-time and micro-batch distributed stream processing (DSP) systems in various setups; (2) understand the impact of ingestion load on the latency of modern real-time and micro-batch DSP systems; and (3) examine the scalability of modern real-time and micro-batch DSP systems under ingestion load bursts. The thesis finds that multiple factors affect the performance of stream processing systems, including state management, data partitioning techniques, and ingestion load. The experiment shows that an increase in ingestion load can increase the latency of a stream processing system, or even crash it. Additionally, while Apache Flink (a native real-time stream processing system) achieves lower latency than Apache Spark (a micro-batch stream processing system) for a certain range of ingestion loads, Apache Spark scales more reliably when significant, unpredicted ingestion load bursts are possible.
Supervisor
Zhao, Bo
Thesis advisor
Rajasekharan, Ramya
Shi, Tuo