Performance Analysis of Cloud-Based Stream Processing Pipelines for Real-Time Vehicle Data

School of Science (Perustieteiden korkeakoulu) | Master's thesis

Date

2019-08-19

Major/Subject

Cloud Computing and Services

Mcode

SCI3081

Degree programme

Master's Programme in ICT Innovation

Language

en

Pages

62

Abstract

Recent advances in stream processing systems have enabled applications to exploit fast-changing data and provide real-time services to companies and users. Such applications require high throughput and low latency to deliver the most value. This thesis, carried out in collaboration with Scania, provides fundamental building blocks for the efficient development of latency-optimized, cloud-based, real-time processing pipelines. Through investigation and analysis of Scania's real-time pipeline, the thesis delivers three contributions that can be employed to speed up the development, testing, and optimization of low-latency streaming pipelines in many different contexts. The first contribution is the design and implementation of a generic framework for testing and benchmarking AWS-based streaming pipelines. The framework collects latency statistics from every step of the pipeline, and the insights it produces can be used to quickly identify bottlenecks. Using this framework, the study then analyzes the behaviour of Scania's serverless streaming pipeline, which consists of the AWS Kinesis and AWS Lambda services. The results show the importance of optimizing configuration parameters such as memory size and batch size, and several suggestions for configuration and optimization of the pipeline are discussed. Finally, the thesis surveys the main alternatives to the Scania pipeline, including Apache Spark Streaming and Apache Flink. Based on an analysis of the benefits and drawbacks of each framework, Flink is chosen as an alternative solution, and the Scania pipeline is adapted to Flink with a new design and implementation. The benefits of the Flink pipeline and a performance comparison are discussed in detail. Overall, this work can serve as an extensive guide to the design and implementation of efficient, low-latency pipelines for deployment in the cloud.
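
To illustrate the idea of collecting latency statistics from every step of a pipeline, the following is a minimal sketch and not the framework implemented in the thesis. It assumes each record carries one millisecond timestamp per stage. The stage names (`produced`, `ingested`, `processed`) and the helper functions are hypothetical and chosen only for illustration.

```python
from statistics import mean

def stage_latencies(record, stages):
    """Latency in ms between each pair of consecutive stage timestamps."""
    return {
        f"{a}->{b}": record[b] - record[a]
        for a, b in zip(stages, stages[1:])
    }

def summarize(records, stages):
    """Aggregate per-hop latencies over all records into mean and max."""
    per_hop = {}
    for rec in records:
        for hop, ms in stage_latencies(rec, stages).items():
            per_hop.setdefault(hop, []).append(ms)
    return {
        hop: {"mean": mean(values), "max": max(values)}
        for hop, values in per_hop.items()
    }

def bottleneck(summary):
    """Name of the hop with the highest mean latency."""
    return max(summary, key=lambda hop: summary[hop]["mean"])

# Hypothetical stage order and sample timestamps (ms), for illustration only.
STAGES = ["produced", "ingested", "processed"]
SAMPLE = [
    {"produced": 0, "ingested": 40, "processed": 100},
    {"produced": 10, "ingested": 30, "processed": 130},
]

summary = summarize(SAMPLE, STAGES)
print(bottleneck(summary))  # prints "ingested->processed"
```

Reporting latency per hop rather than end to end is what makes it possible to localize a bottleneck to a single stage. A real measurement would also have to account for clock differences between the machines that produce each timestamp.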

Description

Supervisor

Ylä-Jääski, Antti

Thesis advisor

Xu, Cheng

Keywords

stream processing, AWS, connected vehicles, data preprocessing
