Advanced Clustering Algorithms for System Abnormality Detection
Perustieteiden korkeakoulu (School of Science) | Master's thesis
Authors
Date
2019-10-21
Department
Major/Subject
Embedded Systems
Mcode
SCI3024
Degree programme
Master's Programme in ICT Innovation
Language
en
Pages
58
Series
Abstract
Identifying the root cause of an error in software testing is a demanding task. It becomes even harder in continuous integration environments, as errors can occur due to bugs in previous builds. Organizations often deploy an automated testing pipeline in continuous integration software development and dedicate a team of experts to identify the error category and issue tickets to the respective teams for bug fixing. The research problem in the scope of this thesis is to find indigenous solutions for system abnormality detection using natural language processing and machine learning. Underlying patterns in the error messages are observed in order to categorize the messages into different clusters, which helps software testers find the root cause of an error. The thesis follows a model-based research approach in which text data (error message) clustering experiments are performed using different clustering algorithms, and the results are analyzed using various evaluation metrics to suggest the best clustering algorithm within the scope of the thesis. The thesis investigates three different clustering algorithms and finds that K-Means clustering performs best in the specific use case of text data clustering for system abnormality detection from error messages in system logs.
Description
Supervisor
Hirvisalo, Vesa
Thesis advisor
Lu, Zhengwu
Keywords
clustering, machine learning, continuous integration, natural language processing, error messages
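
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not say how error messages were vectorized or which evaluation metrics were used, so the TF-IDF weighting, the farthest-first centroid initialisation, and the sample messages below are all assumptions chosen for illustration. The function names (`tokenize`, `tfidf_vectors`, `kmeans`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(message):
    # Lowercase and keep alphabetic tokens only, so numbers such as
    # line numbers or durations do not influence the clustering.
    return re.findall(r"[a-z]+", message.lower())


def tfidf_vectors(messages):
    # Assumed representation: smoothed TF-IDF, L2-normalised per message.
    docs = [Counter(tokenize(m)) for m in messages]
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    idf = {
        t: math.log((1 + n) / (1 + sum(1 for d in docs if t in d))) + 1.0
        for t in vocab
    }
    vectors = []
    for d in docs:
        vec = [d[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    # Assumed initialisation: deterministic farthest-first seeding.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


# Made-up error messages, for illustration only.
messages = [
    "Connection timeout while contacting database server",
    "Database connection timeout after 30 seconds",
    "Timeout: database server connection lost",
    "NullPointerException in module parser at line 42",
    "Parser module raised NullPointerException",
    "NullPointerException thrown by parser",
]
labels = kmeans(tfidf_vectors(messages), k=2)
for label, message in zip(labels, messages):
    print(label, message)
```

In this toy input the two groups of messages share no vocabulary, so the timeout messages and the parser messages end up in separate clusters. Real system logs are far noisier, and choosing the number of clusters and comparing algorithms would require evaluation metrics, which this sketch leaves out.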