Multi-omics representation learning using sparse variational autoencoders

School of Science | Master's thesis

Language

en

Pages

52

Abstract

Multi-omics integration is a key strategy for understanding complex biological systems, especially in cancer research, where data from multiple molecular layers can provide complementary insights into disease mechanisms. However, effective integration is challenged by high dimensionality, data heterogeneity, and the need for interpretability. This thesis addresses these challenges by developing interpretable, scalable, and probabilistic models based on Sparse Variational Autoencoders (Sparse VAEs) for unsupervised representation learning on multi-omics data. We first optimize the Sparse VAE architecture by vectorizing its decoder to improve computational efficiency. We then propose two novel models: MOCSS-SparseVAE, which learns sparse feature-factor mappings for modality-specific representations, and Mo-PoE-SparseVAE, which combines sparse feature-factor mappings with a Product-of-Experts posterior and a gating mechanism for cross-omics interpretability. Experimental results on the TCGA BRCA and KIRC datasets demonstrate that the proposed models achieve competitive or superior performance in subtype clustering and classification tasks while providing biologically interpretable latent features. Our findings show that sparsity and probabilistic modeling can be jointly leveraged to enhance both the performance and the interpretability of deep multi-omics integration methods.
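
The Product-of-Experts posterior named in the abstract can be illustrated with a minimal sketch. This record does not give the thesis formulation, so the code below assumes the common setup in which each omics modality contributes a diagonal Gaussian expert, optionally multiplied with a standard normal prior expert. The function name `product_of_experts` and its arguments are hypothetical, and the gating mechanism is not modelled here.

```python
import numpy as np


def product_of_experts(mus, logvars, include_prior=True):
    """Combine diagonal Gaussian experts into a single diagonal Gaussian.

    Assumed (hypothetical) layout: `mus` and `logvars` have shape
    (n_experts, batch, latent_dim), one expert per omics modality.
    Returns the joint mean and joint log-variance, each (batch, latent_dim).
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)

    if include_prior:
        # Assumption: a standard normal prior expert (mean 0, log-variance 0).
        prior_shape = (1,) + mus.shape[1:]
        mus = np.concatenate([np.zeros(prior_shape), mus], axis=0)
        logvars = np.concatenate([np.zeros(prior_shape), logvars], axis=0)

    # A product of Gaussians is Gaussian: precisions add, and the joint mean
    # is the precision-weighted average of the expert means.
    precisions = np.exp(-logvars)
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mu = joint_var * (mus * precisions).sum(axis=0)
    return joint_mu, np.log(joint_var)
```

Because precisions add, a confident expert (small variance) dominates the joint posterior, while an uninformative one contributes little. This is one reason the construction is often used when some modalities are noisy or missing.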

Supervisor

Marttinen, Pekka

Thesis advisor

Safinianaini, Negar
