Learning-based classification of software logs generated by a test automation framework
School of Electrical Engineering
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Authors
Date
2022-03-21
Department
Major/Subject
Control, Robotics and Autonomous Systems
Mcode
ELEC3025
Degree programme
AEE - Master’s Programme in Automation and Electrical Engineering (TS2013)
Language
en
Pages
89+4
Series
Abstract
Managing large software development systems has become increasingly challenging, as the large volumes of raw data generated by production telemetry are intractable for manual processing. The client of this thesis seeks an effective, scalable approach to this issue: automatically classifying the software logs generated when integration tests fail during software production. This thesis develops two candidate machine learning solutions to demonstrate the feasibility of a learning-based approach to log classification. The first solution is a canonical natural language processing pipeline, which transforms the input data step by step using text preprocessing and numerical representation methods and permits the use of any traditional machine learning model for classification. The second solution employs transfer learning with a deep neural language model from the family of bidirectional transformers, which incorporates an encoder for contextual text representation that is fine-tuned on a domain-specific corpus to improve classification performance. Both solutions achieved high accuracy scores, confirming the feasibility of a learning-based approach to software log classification. Experiments showed that contextual text representations using no text preprocessing contributed more to classification accuracy than the other representation schemes attempted in this work. A transformer neural language model pre-trained on the general natural language domain successfully adapted to the domain of software logs with minimal preprocessing effort. At the same time, the experimental results indicated that careful vocabulary management and methodical log preprocessing could enhance similarity between the domains and thus further improve the classification accuracy of the transfer learning solution.
Description
Supervisor
Solin, Arno
Thesis advisor
Pilzer, Andrea
Kujala, Mikko
Keywords
artificial intelligence, machine learning, natural language processing, transfer learning, continuous integration, test automation
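Illustrative sketch
The first solution described in the abstract (text preprocessing, then a numerical representation, then any traditional classifier) can be pictured with a minimal standard-library sketch. This is not the thesis implementation: the masking rules, the TF-IDF weighting, the nearest-centroid classifier, the sample log lines, and the class labels "environment" and "product" are all assumptions chosen only to make the three stages concrete.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(line):
    """Lowercase a log line and mask volatile tokens (hex values, numbers)."""
    line = line.lower()
    line = re.sub(r"0x[0-9a-f]+", "<hex>", line)
    line = re.sub(r"\d+", "<num>", line)
    return re.findall(r"<\w+>|[a-z_]+", line)


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every training token."""
    n = len(token_lists)
    df = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def vectorize(tokens, idf):
    """L2-normalised sparse TF-IDF vector; unseen tokens are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def fit_centroids(vectors, labels):
    """Average the vectors of each class (nearest-centroid classifier)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, v in vec.items():
            sums[label][t] += v
    return {
        label: {t: v / counts[label] for t, v in s.items()}
        for label, s in sums.items()
    }


def predict(vec, centroids):
    """Return the label whose centroid has the largest dot product with vec."""
    def dot(a, b):
        return sum(v * b.get(t, 0.0) for t, v in a.items())
    return max(centroids, key=lambda label: dot(vec, centroids[label]))


# Hypothetical training logs and labels, invented for this sketch.
TRAIN = [
    ("Connection to device 10.0.0.5 timed out after 30 s", "environment"),
    ("SSH connection refused by host 10.0.0.7", "environment"),
    ("AssertionError: expected status 200 but got 500", "product"),
    ("AssertionError: expected value 3 but got 4", "product"),
]

token_lists = [preprocess(text) for text, _ in TRAIN]
IDF = fit_idf(token_lists)
CENTROIDS = fit_centroids(
    [vectorize(toks, IDF) for toks in token_lists],
    [label for _, label in TRAIN],
)


def classify(line):
    """Run one raw log line through all three stages."""
    return predict(vectorize(preprocess(line), IDF), CENTROIDS)


if __name__ == "__main__":
    print(classify("Connection timed out after 60 s"))
```

Because each stage is a separate function, the classifier at the end can be swapped for another traditional model without touching the preprocessing or the representation, which is the property the abstract attributes to the first solution. The second, transformer-based solution replaces the first two stages with a learned contextual encoder and is not sketched here.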