Latent variable model for high-dimensional point process with structured missingness

School of Science | Master's thesis

Date

2024-07-31

Major/Subject

Machine Learning, Data Science and Artificial Intelligence

Mcode

SCI3044

Degree programme

Master’s Programme in Mathematics and Operations Research

Language

en

Pages

50 + 3

Abstract

Longitudinal data are important in numerous fields, such as healthcare, sociology and seismology. However, real-world datasets present notable challenges for practitioners: they can be high-dimensional, they can contain structured missingness patterns, and their measurement time points can be governed by an unknown stochastic process. While various solutions have been suggested, most of them are designed to account for only one of these challenges. In this work, we propose a flexible and efficient latent-variable model that addresses all three challenges. Our approach uses Gaussian processes to capture temporal correlations between samples and their associated missingness masks, as well as to model the underlying point process. To properly account for the longitudinal nature of the data, we adopt a longitudinal Gaussian process kernel that models various interactions between input covariates for both the observations and the missingness masks. We construct our model as a variational autoencoder with encoder and decoder models parameterised by deep neural networks, and develop a scalable amortised variational inference approach for efficient model training. Finally, we demonstrate competitive performance in data imputation and long-term prediction tasks using both simulated and real-world datasets.

Supervisor

Lähdesmäki, Harri

Thesis advisor

Haussmann, Manuel

Keywords

longitudinal data, structured missingness, Gaussian process, variational autoencoder, temporal point process
