AutoML: Comparing performance with human-designed solutions in Kaggle competitions
School of Business
Master's thesis
Author
Date
2024
Department
Major/Subject
Mcode
Degree programme
Information and Service Management (ISM)
Language
en
Pages
45+5
Series
Abstract
The adoption of Machine Learning (ML) has been a key point of interest for organizations globally, but it has been slowed by high costs related to expert personnel and computational power. However, as computational power has become cheaper and more widely available, a solution is emerging that addresses the need for the technical skills of ML experts: AutoML. AutoML tools aim to automate the ML pipeline so that domain experts can also develop their own predictive models, thus further democratizing ML. This paper surveys different techniques used to automate the pipeline and compares the results obtained with a newly released AutoML tool against human-designed solutions in Kaggle competitions. The results are also benchmarked against other frameworks based on the study by Erickson et al. (2020). Furthermore, the paper proposes a theoretical framework that can be used to assess an ML task’s difficulty when testing AutoML tools. The research consisted of taking part in 10 relatively recent competitions that had a large number of submissions and covered binary classification, regression, and multiclass classification tasks. Based on the results, the AutoML tool performed on average better than a third of the human competitors. The results indicated that a larger dataset, relatively more numerical features, and a binary classification task each had a negative impact on the framework’s performance. Compared to the other 6 frameworks, it had below-average results. To summarise, using only AutoML tools to create a model is fast, but it comes at a notable cost to the model’s performance.
Description
Thesis advisor
Malo, Pekka
Keywords
machine learning, automl, benchmark, kaggle, qlik automl