Bayesian Optimization for Partially Overlapping Covariate Data Sources

School of Science (Perustieteiden korkeakoulu) | Master's thesis
Date
2022-06-13
Major/Subject
Machine Learning, Data Science and Artificial Intelligence (Macadamia)
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
57
Abstract
A recurring problem in real-world industrial processes is how to combine information on best practices that is spread across different data sources. The problem becomes harder when those best practices differ from each other, but not entirely. The goal is to find the optimal practice from such diverse, partly differing data. This thesis formulates the problem as finding the optimal parameter settings across diverse data sources with partially overlapping covariates. First, the data from the different sources are stacked row-wise to form a master data set with missing entries. Then, Bayesian Optimization with Missing Inputs is used to find the optimal parameter settings for the experiment. Different methods of modeling the incomplete data set are tested, such as Bayesian Non-negative Matrix Factorization (BNMF) and Bayesian Probabilistic Matrix Factorization (BPMF). Both represent the incomplete data well enough for the Bayesian Optimization algorithm to work. The BPMF-based methods perform significantly better than the BNMF-based methods. However, BNMF-based methods are helpful in some specific cases because of the structure of the missing data. Multi-armed bandit algorithms are used to handle a budget constraint on the parameter settings in each iteration. The $\epsilon$-greedy and UCB1 algorithms are tested. $\epsilon$-greedy can occasionally give better results because of its randomness, whereas UCB1 improves its performance consistently from iteration to iteration. This work proposes a framework that uses the information from partially overlapping data sources to find the parameter settings that yield the maximum return. It is applicable to a wide range of real-world industrial production processes and opens new research directions.
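As a rough illustration of two ideas named in the abstract, the sketch below stacks rows from sources with partially overlapping covariates into one table with missing entries, and shows the textbook selection rules for $\epsilon$-greedy and UCB1. It is not the thesis code: the function names, the covariate names (`temperature`, `pressure`, `ph`, `yield`), the use of `None` for a missing value, and the interpretation of a bandit "arm" as one candidate choice per iteration are all assumptions made for this example.

```python
import math
import random


def stack_sources(sources):
    """Stack rows from several sources over the union of their covariates.

    Each source is a list of dict rows (hypothetical layout). A covariate
    that a source does not record becomes None, i.e. a missing entry.
    """
    columns = sorted({name for rows in sources for row in rows for name in row})
    table = [[row.get(name) for name in columns]
             for rows in sources for row in rows]
    return columns, table


def epsilon_greedy(counts, means, epsilon, rng):
    """With probability epsilon explore a random arm, else exploit the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    return max(range(len(counts)), key=lambda arm: means[arm])


def ucb1(counts, means):
    """Standard UCB1 rule: empirical mean plus sqrt(2 ln(total) / n_arm)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # every arm is played once before the bonus is used
    total = sum(counts)
    return max(
        range(len(counts)),
        key=lambda arm: means[arm] + math.sqrt(2.0 * math.log(total) / counts[arm]),
    )


# Two hypothetical sources that share some covariates but not all of them.
source_a = [{"temperature": 70.0, "pressure": 1.2, "yield": 0.61}]
source_b = [{"temperature": 65.0, "ph": 6.8, "yield": 0.58}]
columns, table = stack_sources([source_a, source_b])

# Arm statistics so far: number of pulls and average observed return per arm.
counts, means = [5, 1, 4], [0.60, 0.55, 0.62]
rng = random.Random(0)
greedy_choice = epsilon_greedy(counts, means, epsilon=0.0, rng=rng)
ucb_choice = ucb1(counts, means)
```

With `epsilon=0.0` the $\epsilon$-greedy rule always exploits, so it picks the arm with the highest mean, while UCB1 favors the arm that has been tried only once because its exploration bonus is largest. The missing entries in `table` are what a matrix factorization model such as BNMF or BPMF would then have to represent.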
Supervisor
Kaski, Samuel
Thesis advisor
Gillberg, Jussi
Nguyen, Linh
Keywords
Bayesian optimization, multi-source, partially overlapping, multi-armed bandits, matrix factorization, acquisition function