Anomaly detection for Linux system log

School of Science | Master's thesis

Mcode

SCI3020

Language

en

Pages

46+9

Abstract

The goal of this study is to develop effective methods for detecting anomalies in Linux syslog data collected during CI/CD deployment. Automatic detection helps developers debug more efficiently by saving the time otherwise spent manually searching for errors in a sea of logs. For this purpose, two types of anomaly detection methods are evaluated: a workflow-based method and a PCA-based method. In the experiments, different natural language processing (NLP) methods, such as word2vec and TF-IDF, are tested for preprocessing and encoding the log message body. Long short-term memory (LSTM) and principal component analysis (PCA) models are implemented as the representatives of the workflow-based and PCA-based methods, respectively. Both methods outperform the baseline method, Stupid Backoff, which is the current solution used by the thesis sponsor company. LSTM and PCA both achieve a relatively balanced recall and precision. The F1 score, the harmonic mean of precision and recall, reaches 0.9043 for PCA and 0.9124 for LSTM, while the baseline scores 0.6411. The conclusion section discusses the use cases suited to each method. The two methods proposed in this thesis contribute towards detecting syslog anomalies in an unsupervised manner when no labels are provided.

Description

Supervisor

Nieminen, Mika

Thesis advisor

Koivistoinen, Ossi
