Comparative Analysis of Big Data Stream Processing Systems

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.advisor Latif, Khalid
dc.contributor.author Salem, Farouk
dc.date.accessioned 2016-08-26T09:01:59Z
dc.date.available 2016-08-26T09:01:59Z
dc.date.issued 2016-07-29
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/21577
dc.description.abstract In recent years, Big Data has become a prominent paradigm in the field of distributed systems. These systems distribute data storage and processing power across a cluster of computers, and they need methodologies to store and process Big Data in a distributed manner. There are two models for Big Data processing: batch processing and stream processing. The batch processing model produces accurate results, but with high latency. Many systems, such as billing systems, require Big Data to be processed with low latency because of real-time constraints, so the batch processing model cannot fulfill the requirements of real-time systems. The stream processing model addresses this limitation by producing results with low latency. Unlike the batch processing model, it processes recent data instead of all the data produced so far, in order to meet the time limits of real-time systems. The stream processing model divides a stream of records into data windows. Each data window contains a group of records to be processed together. Records can be grouped by time of arrival, by time of creation, or by user session. However, in some systems, processing recent data depends on data that has already been processed. Many frameworks aim to process Big Data in real time, such as Apache Spark, Apache Flink, and Apache Beam. The main purpose of this research is to give a clear and fair comparison of these frameworks from different perspectives, such as latency, processing guarantees, accuracy of results, fault tolerance, and the functionality each framework offers. en
dc.format.extent 12+77
dc.format.mimetype application/pdf en
dc.language.iso en en
dc.title Comparative Analysis of Big Data Stream Processing Systems en
dc.type G2 Pro gradu, diplomityö fi
dc.contributor.school Perustieteiden korkeakoulu fi
dc.subject.keyword big data en
dc.subject.keyword stream processing frameworks en
dc.subject.keyword Apache Spark en
dc.subject.keyword Apache Flink en
dc.subject.keyword Apache Beam en
dc.subject.keyword lambda architecture en
dc.identifier.urn URN:NBN:fi:aalto-201608263033
dc.programme.major Mobile Computing, Services and Security fi
dc.programme.mcode SCI3045 fi
dc.type.ontasot Master's thesis en
dc.type.ontasot Diplomityö fi
dc.contributor.supervisor Heljanko, Keijo
dc.programme Master's Programme in ICT Innovation fi
local.aalto.openaccess yes
dc.rights.accesslevel openAccess
local.aalto.idinssi 54245
dc.type.publication masterThesis
dc.type.okm G2 Pro gradu, diplomityö
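
The abstract describes grouping stream records into data windows by time of arrival, time of creation, or user session. The following is a minimal sketch of that idea in plain Python, not the thesis's method and not the API of Spark, Flink, or Beam. The record fields `event_time` and `arrival_time`, the function names, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict

def tumbling_windows(records, size, key="event_time"):
    """Group records into fixed, non-overlapping windows of `size` time units.

    `key` selects which timestamp drives the grouping: "event_time"
    (time of creation) or "arrival_time" (time of arrival). Both field
    names are hypothetical.
    """
    windows = defaultdict(list)
    for record in records:
        # The window start is the timestamp rounded down to a multiple of size.
        start = (record[key] // size) * size
        windows[start].append(record)
    return dict(sorted(windows.items()))

def session_windows(records, gap):
    """Group one user's records into sessions.

    A new session starts whenever the time since the previous record
    exceeds `gap`.
    """
    sessions = []
    for record in sorted(records, key=lambda r: r["event_time"]):
        previous = sessions[-1][-1] if sessions else None
        if previous is not None and record["event_time"] - previous["event_time"] <= gap:
            sessions[-1].append(record)
        else:
            sessions.append([record])
    return sessions

# Hypothetical sample data: record "a" was created first but arrived late.
records = [
    {"id": "a", "event_time": 1, "arrival_time": 12},
    {"id": "b", "event_time": 8, "arrival_time": 9},
    {"id": "c", "event_time": 11, "arrival_time": 13},
    {"id": "d", "event_time": 30, "arrival_time": 31},
]

by_creation = tumbling_windows(records, size=10, key="event_time")
by_arrival = tumbling_windows(records, size=10, key="arrival_time")
sessions = session_windows(records, gap=5)
```

In this sample, the late-arriving record "a" falls into a different window depending on which timestamp is used, which is one reason the choice of window criterion can affect the accuracy of results.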

