Resilient Machine Learning on Stream Processing Engines

School of Science | Master's thesis
To ask about the availability of the thesis, send an email to the Aalto University Learning Centre at oppimiskeskus@aalto.fi

Department

Mcode

SCI3021

Language

en

Pages

61+6

Series

Abstract

The continuously increasing volume of data has had a huge impact on information systems and businesses. With the advent of the Internet of Things, the amount of information available will be even greater. Gartner expects a 30-fold increase in devices and sensors by 2020 and foresees the emergence of new business models that take advantage of real-time streaming data from these devices. New technological tools, called Stream Processing Engines (SPEs), have emerged to facilitate the processing of large-scale data streams. The real-time handling of information, however, introduces unique challenges in terms of resiliency and fault tolerance that affect both the implementation and the operation of such solutions. Our main contributions are threefold. First, we present a survey of the impact these new technologies have on operations. Next, we propose three alternative implementations for resilient online machine learning applications. Our focus is on finding a way to handle shared state in an SPE that gains its fault tolerance through linear, deterministic workflows. Finally, we give a complementary analysis of the integration tests that were conducted on a cluster of servers.

Description

Supervisor

Nurminen, Jukka

Thesis advisor

Kunft, Andreas

Other note

Citation