Survival Modeling Using Factor Analysis Data Integration

School of Science (Perustieteiden korkeakoulu) | Master's thesis
Ask about the availability of the thesis by sending email to the Aalto University Learning Centre oppimiskeskus@aalto.fi
Date
2015-11-05
Major/Subject
Computational Systems Biology
Mcode
IL3013
Degree programme
Master's Degree Programme in Computational and Systems Biology (euSYSBIO)
Language
en
Pages
66 + 9
Abstract
Complex diseases are known to result from an interplay of genetic and environmental factors. This study aims to combine the two by integrating 'multi-omics' data with clinical data, thereby supporting biological and medical researchers in disease diagnosis, patient stratification, analysis of disease mechanisms and effective treatment decisions. Multi-view biological data from a cohort of the National Institute for Health and Welfare (THL), Finland, have been explored using factor models. Factor models reduce high-dimensional data to a lower-dimensional factor space. Factor analysis (FA) is the simplest factor model: it represents each data feature as a weighted sum of latent factors plus a separate noise term. Bayesian multi-view group-sparse factor analysis (GFA) is the other factor model examined in this study. GFA extends FA by adding sparsity to the model and is applied to high-dimensional data whose features divide naturally into different groups (views). Unlike FA, GFA can capture the activity of each component (latent factor) in each view (group of related features), which makes it well suited to multi-view data sets. Survival models have been used to predict cardiovascular disease (CVD) risk from the dependencies between the multiple views, as represented by the factor models. The Cox proportional hazards model is applied to analyze the data up to the occurrence of a CVD event, with time to event as the output variable. This study provides a stepping stone for exploring GFA, in combination with the Cox survival model, for a better latent factor representation of multi-view data sets.
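The abstract describes FA as writing each feature as a weighted sum of latent factors plus noise, GFA as adding view-wise sparsity so that a component can be active in some views and inactive in others, and the Cox model as linking covariates to the hazard of an event. The NumPy snippet below is a minimal illustrative sketch of these ideas under stated assumptions, not the thesis's implementation. All function names, the two-view toy setup, the constant baseline hazard and the no-tied-times assumption are hypothetical. It does not fit GFA (which requires Bayesian inference); it only simulates data with group-sparse structure and evaluates a Cox partial log-likelihood on latent factors.

```python
import numpy as np


def simulate_two_view_fa(n, dims=(5, 3), k=2, noise_sd=0.5, seed=0):
    """Draw n samples from a toy two-view factor model x = W z + noise.

    Hypothetical illustration: component 0 loads on both views, while
    component 1 is active only in view 0 (its view-1 loadings are zero),
    mimicking the group-sparse structure GFA is designed to detect.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, k))              # latent factors, z_n ~ N(0, I)
    views, loadings = [], []
    for m, d in enumerate(dims):
        w = rng.standard_normal((d, k))          # loading matrix for view m
        if m == 1:
            w[:, 1] = 0.0                        # component 1 switched off in view 1
        x = z @ w.T + noise_sd * rng.standard_normal((n, d))
        views.append(x)
        loadings.append(w)
    return z, views, loadings


def simulate_survival(z, beta, censor_scale=2.0, seed=1):
    """Draw event times with hazard exp(beta^T z) and independent censoring.

    Assumes a constant baseline hazard h0(t) = 1 (hypothetical choice).
    """
    rng = np.random.default_rng(seed)
    hazard = np.exp(z @ beta)
    event_times = rng.exponential(scale=1.0 / hazard)
    censor_times = rng.exponential(scale=censor_scale, size=len(hazard))
    times = np.minimum(event_times, censor_times)
    events = (event_times <= censor_times).astype(float)  # 1 = event observed
    return times, events


def cox_partial_log_likelihood(beta, covariates, times, events):
    """Partial log-likelihood of a Cox proportional hazards model.

    Assumes no tied event times, so no tie correction (e.g. Breslow or
    Efron) is applied. covariates: (n, k); times: (n,); events: (n,) with
    1 = event observed and 0 = censored.
    """
    eta = covariates @ beta                      # linear predictor beta^T z_i
    order = np.argsort(-times)                   # sort by decreasing time
    eta_sorted = eta[order]
    events_sorted = events[order]
    # Risk set of the i-th sorted subject is positions 0..i (all t_j >= t_i);
    # a cumulative log-sum-exp gives log sum_j exp(eta_j) over that set.
    log_risk = np.logaddexp.accumulate(eta_sorted)
    return float(np.sum(events_sorted * (eta_sorted - log_risk)))
```

In this sketch the latent factors `z` play the role of the covariates in the Cox model; in practice they would be estimated from the observed views by a fitted factor model rather than taken from the simulation.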
Supervisor
Kaski, Samuel
Thesis advisor
Marttinen, Pekka
Keywords
Bayesian data analysis, factor models, survival models, risk predictions, cardiovascular disease