Latent Gaussian processes with composite likelihoods for data-driven disease stratification

School of Science | Master's thesis
Date
2019-08-19
Department
Major/Subject
Machine Learning and Data Mining (Macadamia)
Information Technology
Mcode
SCI3042
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
71
Abstract
Machine learning has caused a seismic shift in how clinical patient data is used and interpreted. It can be harnessed for more effective and efficient healthcare that benefits both patients and medical practitioners through personalised health solutions. Disease stratification is an important task in personalised medicine and has the potential to help medical researchers better understand diseases. In collaboration with the Helsinki Biobank and the Helsinki University Hospital, we aim to better understand clinical patient records whose variables follow multiple likelihoods (with noisy and missing values) by embedding these high-dimensional observations into a low-dimensional space while capturing the similarity between the observations. In this thesis, we propose an unsupervised, generative model that can identify this latent clustering among patients while making use of all available data (i.e., in a heterogeneous data setting). We use deep neural networks and Gaussian process latent variable models (GPLVM) to create a form of non-linear dimensionality reduction for heterogeneous data. The key principle of our model is to use the outputs of latent GPs (sparse GPs) to modulate the parameters of the different likelihoods through link functions. The intractability introduced by the composite likelihoods is overcome by sampling-based variational inference with quadrature. We use deep neural networks to parameterise the variational inference, which introduces a constraint that balances locality and dissimilarity preservation in the latent space. We demonstrate the effectiveness of our model on toy datasets and on clinical data of Parkinson's disease patients treated at the HUS Helsinki University Hospital. Our approach identifies sub-groups in the heterogeneous patient data, and we evaluate the differences in characteristics among the identified clusters using standard statistical tests.
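As a rough illustration of the composite-likelihood idea described in the abstract, the sketch below evaluates the expected log-likelihood of one patient record whose features have different types. It is not the thesis implementation. It assumes that each feature has a single latent function value with a Gaussian variational marginal N(mean, var), that the links are identity (Gaussian), logistic (Bernoulli), and exponential (Poisson), and that the Gaussian noise variance is fixed. All function names and parameters are hypothetical, and the sparse GP and neural-network parts of the model are omitted.

```python
import math
import numpy as np

# Per-feature log-likelihoods log p(y | f). The latent value f is mapped
# to the likelihood parameter through a link function (assumed choices).

def gaussian_loglik(y, f, noise_var=0.1):
    # Identity link: the likelihood mean equals f; noise_var is an assumed constant.
    return -0.5 * (math.log(2 * math.pi * noise_var) + (y - f) ** 2 / noise_var)

def bernoulli_loglik(y, f):
    # Logistic link: p = sigmoid(f), so log p(y | f) = y * f - log(1 + exp(f)).
    return y * f - np.logaddexp(0.0, f)

def poisson_loglik(y, f):
    # Exponential link: rate = exp(f).
    return y * f - np.exp(f) - math.lgamma(y + 1.0)

def expected_loglik(loglik, y, mean, var, n_points=20):
    """Gauss-Hermite estimate of E[log p(y | f)] under f ~ N(mean, var)."""
    x, w = np.polynomial.hermite.hermgauss(n_points)
    f = mean + np.sqrt(2.0 * var) * x  # change of variables for N(mean, var)
    return float(np.sum(w * loglik(y, f)) / np.sqrt(np.pi))

def composite_expected_loglik(ys, logliks, means, variances):
    """Sum the per-feature expectations, skipping missing (NaN) values."""
    total = 0.0
    for y, loglik, m, v in zip(ys, logliks, means, variances):
        if np.isnan(y):
            continue  # a missing observation contributes nothing
        total += expected_loglik(loglik, y, m, v)
    return total

# One hypothetical record: a continuous value, a missing binary value, a count.
record = [1.2, float("nan"), 3.0]
likelihoods = [gaussian_loglik, bernoulli_loglik, poisson_loglik]
value = composite_expected_loglik(record, likelihoods, [0.5, 0.0, 0.2], [0.3, 0.3, 0.3])
```

In a variational objective, a term of this kind would be summed over patients and combined with a KL penalty on the latent GPs; the quadrature step is what makes the non-Gaussian likelihood terms tractable.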
Supervisor
Lähdesmäki, Harri
Thesis advisor
Koskinen, Miika
Keywords
GPLVM, sparse GPs, neural network, variational inference, personalised medicine