Bayesian Optimization for Partially Overlapping Covariate Data Sources
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.advisor | Gillberg, Jussi | |
dc.contributor.advisor | Nguyen, Linh | |
dc.contributor.author | Nguyen, Dan | |
dc.contributor.school | Perustieteiden korkeakoulu | fi |
dc.contributor.supervisor | Kaski, Samuel | |
dc.date.accessioned | 2022-06-19T17:00:47Z | |
dc.date.available | 2022-06-19T17:00:47Z | |
dc.date.issued | 2022-06-13 | |
dc.description.abstract | One problem in real-world industrial processes is how to combine information on best practices from different data sources. The problem becomes more complicated when those best practices differ from each other partially, but not entirely. The goal is to find the optimal practice from such diverse and partly differing data. This thesis formulates the problem as finding the optimal parameter settings across diverse data sources with partially overlapping covariates. First, the data from the different sources are stacked row-wise to form a master data set with missing entries. Then, Bayesian Optimization with Missing Inputs is employed to find the optimal parameter settings for the experiment. Different methods of modeling the incomplete data set are tested, such as Bayesian Non-negative Matrix Factorization (BNMF) and Bayesian Probabilistic Matrix Factorization (BPMF). Both provide a quality representation of the missing data, allowing the Bayesian Optimization algorithm to work. The BPMF-based methods perform significantly better than the BNMF-based methods. However, BNMF-based methods are helpful in some specific cases due to the structure of the incomplete data set. Multi-armed bandit algorithms are used to tackle the problem of a budget constraint on parameter settings in each iteration. The $\epsilon$-greedy and UCB1 algorithms have been tested. $\epsilon$-greedy can occasionally give better results because of its randomness. In contrast, UCB1 consistently improves its performance with each iteration. This work proposes a framework that uses the information from partially overlapping data sources to find the parameter settings that yield a maximum return. This work benefits a wide range of real-world industrial production processes and opens exciting research directions. | en |
dc.format.extent | 57 | |
dc.format.mimetype | application/pdf | en |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/115151 | |
dc.identifier.urn | URN:NBN:fi:aalto-202206193992 | |
dc.language.iso | en | en |
dc.programme | Master’s Programme in Computer, Communication and Information Sciences | fi |
dc.programme.major | Machine Learning, Data Science and Artificial Intelligence (Macadamia) | fi |
dc.programme.mcode | SCI3044 | fi |
dc.subject.keyword | bayesian optimization | en |
dc.subject.keyword | multi-source | en |
dc.subject.keyword | partially overlapping | en |
dc.subject.keyword | multi-armed bandits | en |
dc.subject.keyword | matrix factorization | en |
dc.subject.keyword | acquisition function | en |
dc.title | Bayesian Optimization for Partially Overlapping Covariate Data Sources | en |
dc.type | G2 Pro gradu, diplomityö | fi |
dc.type.ontasot | Master's thesis | en |
dc.type.ontasot | Diplomityö | fi |
local.aalto.electroniconly | yes | |
local.aalto.openaccess | yes |
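
The abstract's first step, stacking data sources with partially overlapping covariates row-wise into one master data set with missing entries, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `stack_sources`, the covariate names, and the values are hypothetical, and NaN is assumed as the marker for a covariate that a source does not record.

```python
import math

def stack_sources(sources):
    """Stack rows from several sources into one table over the union of columns."""
    # Collect the union of covariate names, in first-seen order.
    columns = []
    for rows in sources:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # One output row per input row; covariates a source lacks become NaN.
    table = [[row.get(name, math.nan) for name in columns]
             for rows in sources for row in rows]
    return columns, table

# Hypothetical sources that share only the covariate "temp".
source_a = [{"temp": 180.0, "pressure": 2.1}, {"temp": 200.0, "pressure": 1.8}]
source_b = [{"temp": 190.0, "speed": 40.0}]

columns, table = stack_sources([source_a, source_b])
for row in table:
    print(row)
```

The NaN entries in the resulting table are the missing inputs that, according to the abstract, are then modeled with matrix factorization before Bayesian optimization is applied.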
Files
Original bundle
- Name: master_Nguyen_Dan_2022.pdf
- Size: 1.79 MB
- Format: Adobe Portable Document Format