Bayesian Optimization for Partially Overlapping Covariate Data Sources

dc.contributor: Aalto-yliopisto (fi)
dc.contributor: Aalto University (en)
dc.contributor.advisor: Gillberg, Jussi
dc.contributor.advisor: Nguyen, Linh
dc.contributor.author: Nguyen, Dan
dc.contributor.school: Perustieteiden korkeakoulu (fi)
dc.contributor.supervisor: Kaski, Samuel
dc.date.accessioned: 2022-06-19T17:00:47Z
dc.date.available: 2022-06-19T17:00:47Z
dc.date.issued: 2022-06-13
dc.description.abstract: A recurring problem in real-world industrial processes is how to combine information on best practices that is spread across different data sources. The problem becomes harder when those best practices differ from each other, but only partially. The goal is to find the optimal practice from such diverse and somewhat different data. This thesis formulates the problem as finding the optimal parameter settings across diverse data sources with partially overlapping covariates. First, the data from the different sources are stacked row-wise to form a master data set with missing values. Then, Bayesian Optimization with Missing Inputs is employed to find the optimal parameter settings for the experiment. Two methods of modeling the incomplete data set are tested: Bayesian Non-negative Matrix Factorization (BNMF) and Bayesian Probabilistic Matrix Factorization (BPMF). Both provide a representation of the missing data that is good enough for the Bayesian Optimization algorithm to work. The BPMF-based methods perform significantly better than the BNMF-based methods, although the BNMF-based methods are helpful in some specific cases because of the structure of the incomplete data set. Multi-armed bandit algorithms are used to handle a budget constraint on the number of parameter settings in each iteration. Two of them, $\epsilon$-greedy and UCB1, have been tested. The $\epsilon$-greedy algorithm can occasionally give better results because of its randomness, whereas UCB1 consistently improves its performance with each iteration. This work proposes a framework that uses the information from partially overlapping data sources to find the parameter settings that yield the maximum return. It is applicable to a wide range of real-world industrial production processes and opens further research directions. (en)
dc.format.extent: 57
dc.format.mimetype: application/pdf (en)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/115151
dc.identifier.urn: URN:NBN:fi:aalto-202206193992
dc.language.iso: en (en)
dc.programme: Master’s Programme in Computer, Communication and Information Sciences (fi)
dc.programme.major: Machine Learning, Data Science and Artificial Intelligence (Macadamia) (fi)
dc.programme.mcode: SCI3044 (fi)
dc.subject.keyword: bayesian optimization (en)
dc.subject.keyword: multi-source (en)
dc.subject.keyword: partially overlapping (en)
dc.subject.keyword: multi-armed bandits (en)
dc.subject.keyword: matrix factorization (en)
dc.subject.keyword: acquisition function (en)
dc.title: Bayesian Optimization for Partially Overlapping Covariate Data Sources (en)
dc.type: G2 Pro gradu, diplomityö (fi)
dc.type.ontasot: Master's thesis (en)
dc.type.ontasot: Diplomityö (fi)
local.aalto.electroniconly: yes
local.aalto.openaccess: yes
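
Illustrative note on the abstract: the row-wise stacking step it describes can be sketched as below. This is a minimal sketch and not the thesis's implementation. The function name `stack_sources`, the covariate names, and the numeric values are all hypothetical, and missing entries are assumed to be represented as NaN.

```python
import math

def stack_sources(sources):
    """Stack rows from several sources over the union of their covariates.

    Each source is a list of dicts mapping covariate name -> value.
    Covariates that a source does not record are filled with NaN.
    """
    # Union of covariate names, kept in first-seen order.
    columns = []
    for rows in sources:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # One output row per input row; absent covariates become NaN.
    table = [[row.get(name, math.nan) for name in columns]
             for rows in sources for row in rows]
    return columns, table

# Hypothetical sources that overlap only on "pressure".
source_a = [{"temperature": 180.0, "pressure": 2.1},
            {"temperature": 175.0, "pressure": 2.4}]
source_b = [{"pressure": 2.0, "speed": 30.0}]

columns, master = stack_sources([source_a, source_b])
for row in master:
    print(dict(zip(columns, row)))
```

The NaN cells in `master` are the entries that, according to the abstract, a matrix factorization model such as BNMF or BPMF is used to represent before the optimization step.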
Files
Original bundle
Name: master_Nguyen_Dan_2022.pdf
Size: 1.79 MB
Format: Adobe Portable Document Format
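
Illustrative note on the abstract: the two bandit rules it names, $\epsilon$-greedy and UCB1, can be sketched in their generic textbook form as below. This record does not say what the arms or rewards are in the thesis, so the Bernoulli arms, the `run` helper, and every parameter value here are assumptions made only for illustration.

```python
import math
import random

def epsilon_greedy(counts, means, epsilon, rng):
    """With probability epsilon pick a random arm, else the best empirical mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    return max(range(len(counts)), key=lambda a: means[a])

def ucb1(counts, means):
    """Try every arm once, then maximize mean + sqrt(2 ln t / n_a)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2.0 * math.log(total) / counts[a]))

def update(counts, means, arm, reward):
    """Update the pulled arm's count and running mean in place."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]

def run(policy, true_means, steps, seed=0):
    """Simulate a policy on hypothetical Bernoulli arms; return pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    means = [0.0] * len(true_means)
    for _ in range(steps):
        arm = policy(counts, means, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        update(counts, means, arm, reward)
    return counts

true_means = [0.2, 0.5, 0.8]  # hypothetical success probabilities
ucb_counts = run(lambda c, m, rng: ucb1(c, m), true_means, steps=2000)
eps_counts = run(lambda c, m, rng: epsilon_greedy(c, m, 0.1, rng), true_means, steps=2000)
print("UCB1 pulls per arm:", ucb_counts)
print("epsilon-greedy pulls per arm:", eps_counts)
```

The contrast drawn in the abstract is visible in the rules themselves: $\epsilon$-greedy explores by chance, so its outcome varies from run to run, while UCB1 explores deterministically through a confidence bonus that shrinks as an arm is pulled more often.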