Latent Gaussian processes with composite likelihoods for data-driven disease stratification

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Koskinen, Miika
dc.contributor.author: Ramchandran, Siddharth
dc.contributor.school: Perustieteiden korkeakoulu [fi]
dc.contributor.supervisor: Lähdesmäki, Harri
dc.date.accessioned: 2019-08-25T15:13:05Z
dc.date.available: 2019-08-25T15:13:05Z
dc.date.issued: 2019-08-19
dc.description.abstract: Machine learning has caused a seismic shift in how clinical patient data is used and interpreted. It can be harnessed for more effective and efficient healthcare that benefits both patients and medical practitioners through personalised health solutions. Disease stratification is an important task in personalised medicine and has the potential to help medical researchers better understand diseases. In collaboration with the Helsinki Biobank and the Helsinki University Hospital, we aim to better understand clinical patient records that involve multiple likelihoods (with noisy and missing values) by embedding these high-dimensional observations into a low-dimensional space while capturing the similarity between the observations. In this thesis, we propose an unsupervised, generative model that can identify this latent clustering among patients while making use of all available data (i.e., in a heterogeneous data setting). We use deep neural networks and Gaussian process latent variable models (GPLVM) to create a form of non-linear dimensionality reduction for heterogeneous data. The key principle in our model is to use the output of latent GPs (sparse GPs) to modulate the parameters of the different likelihoods through link functions. The intractability introduced by the composite likelihoods is overcome by using sampling-based variational inference with quadrature. We use deep neural networks to parameterise the variational inference and thereby introduce a constraint that balances locality and dissimilarity preservation in the latent space. We demonstrate the effectiveness of our model on toy datasets and on clinical data of Parkinson's disease patients treated at the HUS Helsinki University Hospital. Our approach identifies sub-groups from the heterogeneous patient data, and we evaluate the differences in characteristics among the identified clusters using standard statistical tests. [en]
dc.format.extent: 71
dc.format.mimetype: application/pdf [en]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/39904
dc.identifier.urn: URN:NBN:fi:aalto-201908254965
dc.language.iso: en [en]
dc.programme: Master’s Programme in Computer, Communication and Information Sciences [fi]
dc.programme.major: Machine Learning and Data Mining (Macadamia) [fi]
dc.programme.major: Tietotekniikka [fi]
dc.programme.mcode: SCI3042 [fi]
dc.subject.keyword: GPLVM [en]
dc.subject.keyword: sparse GPs [en]
dc.subject.keyword: neural network [en]
dc.subject.keyword: variational inference [en]
dc.subject.keyword: personalised medicine [en]
dc.title: Latent Gaussian processes with composite likelihoods for data-driven disease stratification [en]
dc.type: G2 Pro gradu, diplomityö [fi]
dc.type.ontasot: Master's thesis [en]
dc.type.ontasot: Diplomityö [fi]
local.aalto.electroniconly: yes
local.aalto.openaccess: yes

Files

Original bundle

Name: master_Ramchandran_Siddharth_2019.pdf
Size: 12.33 MB
Format: Adobe Portable Document Format