Learning-based classification of software logs generated by a test automation framework

School of Electrical Engineering | Master's thesis

Date

2022-03-21

Major/Subject

Control, Robotics and Autonomous Systems

Mcode

ELEC3025

Degree programme

AEE - Master’s Programme in Automation and Electrical Engineering (TS2013)

Language

en

Pages

89+4

Abstract

Managing large software development systems has become increasingly challenging, because the volume of raw data generated by production telemetry is intractable for manual processing. The client of this thesis seeks an effective, scalable approach to this problem: automatically classifying the software logs generated when integration tests fail during software production. This thesis develops two candidate machine learning solutions to demonstrate the feasibility of a learning-based approach to log classification. The first solution is a canonical natural language processing pipeline, which transforms the input data step by step using text preprocessing and numerical representation methods and permits any traditional machine learning model to be used for classification. The second solution employs transfer learning with a deep neural language model from the family of bidirectional transformers, whose encoder for contextual text representation is fine-tuned on a domain-specific corpus to improve classification performance. Both solutions achieved high accuracy scores, confirming the feasibility of a learning-based approach to software log classification. Experiments showed that contextual text representations computed without text preprocessing contributed more to classification accuracy than the other representation schemes attempted in this work. A transformer neural language model pre-trained on the general natural language domain adapted successfully to the domain of software logs with minimal preprocessing effort. At the same time, the experimental results indicated that careful vocabulary management and methodical log preprocessing could increase the similarity between the domains and thus further improve the classification accuracy of the transfer learning solution.
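
The first solution described in the abstract (text preprocessing, then a numerical representation, then a traditional classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the sample log lines, the two class labels, the masking rules in `preprocess`, and the choice of a bag-of-words representation with a multinomial naive Bayes classifier. The abstract does not say which preprocessing steps, representations, or models were actually used.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(line):
    """Lowercase a log line, mask volatile tokens, and split it into tokens.

    The masking rules are hypothetical examples of log preprocessing.
    """
    line = line.lower()
    line = re.sub(r"0x[0-9a-f]+", " <hex> ", line)  # mask hexadecimal values
    line = re.sub(r"\d+", " <num> ", line)          # mask remaining numbers
    return re.findall(r"<\w+>|[a-z_]+", line)


class NaiveBayesLogClassifier:
    """Bag-of-words multinomial naive Bayes with add-one smoothing.

    Stands in for "any traditional machine learning model"; it is an
    assumption for this sketch, not the model used in the thesis.
    """

    def fit(self, lines, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for line, label in zip(lines, labels):
            self.token_counts[label].update(preprocess(line))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, line):
        tokens = preprocess(line)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / total)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: failure logs labelled by an assumed root cause.
TRAIN = [
    ("Connection to 10.0.0.5 timed out after 30 s", "environment"),
    ("Socket timeout while connecting to host 10.0.0.7", "environment"),
    ("AssertionError: expected 200 but got 500", "product"),
    ("Assertion failed: expected status 0 got 3", "product"),
]

model = NaiveBayesLogClassifier().fit(
    [text for text, _ in TRAIN], [label for _, label in TRAIN]
)

if __name__ == "__main__":
    print(model.predict("Timeout connecting to 192.168.1.1"))
```

Masking numbers and addresses before counting tokens keeps run-specific values from inflating the vocabulary, which is one plausible reading of the "careful vocabulary management" the abstract mentions. The second solution, fine-tuning a pre-trained transformer, would replace both the token counting and the classifier and is not sketched here.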

Supervisor

Solin, Arno

Thesis advisor

Pilzer, Andrea
Kujala, Mikko

Keywords

artificial intelligence, machine learning, natural language processing, transfer learning, continuous integration, test automation
