Latent Gaussian processes with composite likelihoods for data-driven disease stratification

School of Science | Master's thesis

Mcode

SCI3042

Language

en

Pages

71

Abstract

Machine learning has caused a seismic shift in how clinical patient data are used and interpreted. It can be harnessed for more effective and efficient healthcare that benefits both patients and medical practitioners through personalised health solutions. Disease stratification is an important task in personalised medicine and has the potential to help medical researchers better understand diseases. In collaboration with the Helsinki Biobank and the Helsinki University Hospital, we aim to better understand clinical patient records whose variables require multiple likelihoods (and contain noisy and missing values) by embedding these high-dimensional observations into a low-dimensional space while capturing the similarity between the observations. In this thesis, we propose an unsupervised, generative model that can identify the latent clustering among patients while making use of all available data (i.e., in a heterogeneous data setting). We use deep neural networks and Gaussian process latent variable models (GPLVMs) to create a form of non-linear dimensionality reduction for heterogeneous data. The key principle in our model is to use the outputs of latent GPs (sparse GPs) to modulate the parameters of the different likelihoods through link functions. The intractability introduced by the composite likelihoods is overcome by using sampling-based variational inference with quadrature. We use deep neural networks to parameterise the variational inference and thereby introduce a constraint that balances locality and dissimilarity preservation in the latent space. We demonstrate the effectiveness of our model on toy datasets and on clinical data of Parkinson's disease patients treated at the HUS Helsinki University Hospital. Our approach identifies sub-groups from the heterogeneous patient data, and we evaluate the differences in characteristics among the identified clusters using standard statistical tests.
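
To make the link-function and quadrature ideas in the abstract concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a Gaussian marginal over each latent GP output, a Gaussian likelihood with an identity link for a continuous variable, and a Bernoulli likelihood with a logistic link for a binary variable. All function names, the fixed noise variance, and the choice of 20 quadrature points are hypothetical. The expected log-likelihood of each observed variable is approximated with Gauss-Hermite quadrature, and missing values (encoded here as NaN) are skipped.

```python
import numpy as np


def gauss_hermite_expectation(log_lik, y, mean, var, n_points=20):
    """Approximate E[log_lik(y, f)] for f ~ N(mean, var) by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # Change of variables f = mean + sqrt(2 * var) * x maps the Gaussian
    # expectation onto the weight function exp(-x^2) used by the rule.
    f = mean + np.sqrt(2.0 * var) * nodes
    return np.sum(weights * log_lik(y, f)) / np.sqrt(np.pi)


def gaussian_log_lik(y, f, noise_var=0.1):
    """Gaussian likelihood, identity link: f is the mean (noise_var is assumed)."""
    return -0.5 * np.log(2.0 * np.pi * noise_var) - 0.5 * (y - f) ** 2 / noise_var


def bernoulli_log_lik(y, f):
    """Bernoulli likelihood, logistic link: p = sigmoid(f), y in {0, 1}."""
    # log p(y | f) = y * f - log(1 + exp(f)), computed stably.
    return y * f - np.logaddexp(0.0, f)


def composite_expected_log_lik(y_row, means, variances, log_liks):
    """Sum the expected log-likelihoods over the observed variables of one patient."""
    total = 0.0
    for y, m, v, log_lik in zip(y_row, means, variances, log_liks):
        if np.isnan(y):
            continue  # missing value: contributes nothing to the objective
        total += gauss_hermite_expectation(log_lik, y, m, v)
    return total


# One hypothetical patient: a continuous measurement and a missing binary one.
y_row = np.array([1.3, np.nan])
means = np.array([0.4, 0.0])
variances = np.array([0.25, 1.0])
value = composite_expected_log_lik(
    y_row, means, variances, [gaussian_log_lik, bernoulli_log_lik]
)
```

In a full variational treatment, a term of this kind would be combined with a KL divergence term to form the evidence lower bound; the sketch covers only the quadrature step for factorised likelihoods.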

Supervisor

Lähdesmäki, Harri

Thesis advisor

Koskinen, Miika
