Latent Gaussian processes with composite likelihoods for data-driven disease stratification
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.advisor | Koskinen, Miika | |
dc.contributor.author | Ramchandran, Siddharth | |
dc.contributor.school | Perustieteiden korkeakoulu | fi |
dc.contributor.supervisor | Lähdesmäki, Harri | |
dc.date.accessioned | 2019-08-25T15:13:05Z | |
dc.date.available | 2019-08-25T15:13:05Z | |
dc.date.issued | 2019-08-19 | |
dc.description.abstract | Machine learning has caused a seismic shift in how clinical patient data is being used and interpreted. It can be harnessed for more effective and efficient healthcare that can benefit both patients and medical practitioners through personalised health solutions. Disease stratification is an important task in personalised medicine and has the potential to help medical researchers better understand diseases. In collaboration with the Helsinki Biobank and the Helsinki University Hospital, we aim to better understand clinical patient records comprising multiple likelihoods (with noisy and missing values) by embedding these high-dimensional observations into a low-dimensional space while capturing the similarity between the observations. In this thesis, we propose an unsupervised, generative model that can identify this latent clustering among patients while making use of all available data (i.e., in a heterogeneous data setting). We make use of deep neural networks and Gaussian process latent variable models (GPLVM) to create a form of non-linear dimensionality reduction for heterogeneous data. The key principle in our model is to use the output of latent GPs (sparse GPs) to modulate the parameters of the different likelihoods through link functions. The intractability introduced by the composite likelihoods is overcome by making use of sampling-based variational inference with quadrature. We make use of deep neural networks to parameterise the variational inference to introduce a constraint that balances between locality and dissimilarity preservation in the latent space. We demonstrate the effectiveness of our model on toy datasets and clinical data of Parkinson's disease patients treated at the HUS Helsinki University Hospital. Our approach identifies sub-groups from the heterogeneous patient data, and we evaluate the differences in characteristics among the identified clusters using standard statistical tests. | en |
dc.format.extent | 71 | |
dc.format.mimetype | application/pdf | en |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/39904 | |
dc.identifier.urn | URN:NBN:fi:aalto-201908254965 | |
dc.language.iso | en | en |
dc.programme | Master’s Programme in Computer, Communication and Information Sciences | fi |
dc.programme.major | Machine Learning and Data Mining (Macadamia) | fi |
dc.programme.major | Tietotekniikka | fi |
dc.programme.mcode | SCI3042 | fi |
dc.subject.keyword | GPLVM | en |
dc.subject.keyword | sparse GPs | en |
dc.subject.keyword | neural network | en |
dc.subject.keyword | variational inference | en |
dc.subject.keyword | personalised medicine | en |
dc.title | Latent Gaussian processes with composite likelihoods for data-driven disease stratification | en |
dc.type | G2 Pro gradu, diplomityö | fi |
dc.type.ontasot | Master's thesis | en |
dc.type.ontasot | Diplomityö | fi |
local.aalto.electroniconly | yes | |
local.aalto.openaccess | yes | |
Files
Original bundle
- Name: master_Ramchandran_Siddharth_2019.pdf
- Size: 12.33 MB
- Format: Adobe Portable Document Format