Advanced Clustering Algorithms for System Abnormality Detection

School of Science (Perustieteiden korkeakoulu) | Master's thesis

Date

2019-10-21

Major/Subject

Embedded Systems

Mcode

SCI3024

Degree programme

Master's Programme in ICT Innovation

Language

en

Pages

58

Abstract

Identifying the root cause of an error in software testing is a demanding task. It becomes even harder in continuous integration environments, because errors may be caused by bugs introduced in previous builds. Organizations often deploy an automated testing pipeline in continuous integration software development and dedicate a team of experts to identify the error category and issue tickets to the respective teams for bug fixing. The research problem in the scope of this thesis is to find indigenous solutions for system abnormality detection using natural language processing and machine learning. Underlying patterns in the error messages are observed in order to group the messages into clusters, which helps software testers find the root cause of an error. The thesis follows a model-based research approach in which clustering experiments on text data (error messages) are performed using different clustering algorithms, and the results are analyzed using various evaluation metrics to suggest the best clustering algorithm within the scope of the thesis. The thesis investigates three clustering algorithms and finds that K-Means clustering performs best in the specific use case of text data clustering for system abnormality detection from error messages in system logs.
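
As an illustration of the kind of experiment the abstract describes, the following is a minimal sketch of K-Means clustering applied to error messages. It is not the thesis's implementation: the abstract does not state which text features, distance measure, or initialization were used, so the TF-IDF weighting, the farthest-point initialization, the helper names (`tokenize`, `tfidf_vectors`, `kmeans`), and the sample messages are all assumptions made for this example. It uses only the Python standard library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(message):
    """Lowercase a message and keep alphabetic tokens (numbers and ids are dropped)."""
    return re.findall(r"[a-z]+", message.lower())


def tfidf_vectors(messages):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per message."""
    docs = [Counter(tokenize(m)) for m in messages]
    n = len(docs)
    # Document frequency: in how many messages each term appears.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for k, w in v.items():
            total[k] += w
    return {k: w / len(vectors) for k, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Plain K-Means with a deterministic farthest-point initialisation (an assumption)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the point that is farthest from its nearest existing centroid.
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each message goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = mean(members)
    return labels


# Hypothetical error messages, invented for this example.
messages = [
    "Connection timeout while contacting database server",
    "Database connection timeout after 30 seconds",
    "Timeout: database server connection lost",
    "NullPointerException in module parser at line 42",
    "NullPointerException in module parser at line 7",
    "Unexpected NullPointerException raised in parser module",
]

labels = kmeans(tfidf_vectors(messages), k=2)
for label, message in zip(labels, messages):
    print(label, message)
```

On these six messages the sketch places the three database timeout messages in one cluster and the three parser exceptions in the other, because the two groups share no vocabulary. Real system logs are noisier, so the choice of the number of clusters and of an evaluation metric, which the thesis examines, matters far more than it does here.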

Description

Supervisor

Hirvisalo, Vesa

Thesis advisor

Lu, Zhengwu

Keywords

clustering, machine learning, continuous integration, natural language processing, error messages