Resilient Machine Learning on Stream Processing Engines

School of Science | Master's thesis
Ask about the availability of the thesis by sending an email to the Aalto University Learning Centre at oppimiskeskus@aalto.fi
Date
2015-02-09
Major/Subject
Distributed Systems and Services
Mcode
SCI3021
Degree programme
Master's Programme in ICT Innovation
Language
English
Pages
61+6
Abstract
The continuously increasing volume of data has had a huge impact on information systems and businesses. With the advent of the Internet of Things, the amount of available information will grow even further. Gartner expects a 30-fold increase in devices and sensors by 2020 and foresees the emergence of new business models that take advantage of real-time streaming data from these devices. A new class of technological tools, called Stream Processing Engines (SPEs), has emerged to facilitate the processing of large-scale data streams. Handling information in real time, however, introduces unique challenges in terms of resilience and fault tolerance that affect both the implementation and the operation of such solutions. Our main contributions are threefold. First, we present a survey of the impact these new technologies have on the operations side. Next, we propose three distinct alternative implementations of resilient online machine learning applications. Our focus is on finding a solution for handling shared state in an SPE that achieves its fault tolerance through linear, deterministic workflows. Finally, we give a complementary analysis of the integration tests that were conducted on a cluster of servers.
Supervisor
Nurminen, Jukka
Thesis advisor
Kunft, Andreas
Keywords
resilience, learning, stream, fault-tolerance