Interaction Detection with Probabilistic Deep Learning for Genetics

School of Science | Doctoral thesis (article-based) | Defence date: 2023-05-19
56 + app. 118 pages
Aalto University publication series DOCTORAL THESES, 48/2023
Deep learning is an important machine learning tool in genetics because it can model nonlinear relations between genotypes and phenotypes, such as genetic interactions, without assuming any particular form for those relations. However, the use of current deep learning approaches in genetics is restricted by (i) the lack of well-calibrated uncertainty estimates for the model and (ii) the limited individual-level data accessible for model training. This thesis aims to design principled approaches that tackle these shortcomings of deep learning in two relevant statistical genetics applications: gene-gene interaction detection and genotype-phenotype prediction. First, we focus on interaction detection with deep learning. Using Bayesian principles, we provide calibrated uncertainty estimates for interaction detection in deep learning, which we use to control statistical errors of the detected interactions, e.g., the false positive rate and the false negative rate. For genetic interaction detection, we design a novel neural network architecture that increases the power to detect complex gene-gene interactions by learning gene representations that aggregate information from all SNPs (single-nucleotide polymorphisms) of the genes being analyzed, and by considering complex forms of interaction between genes beyond the multiplicative interactions considered by current methods. Moreover, we propose a new permutation procedure that gives calibrated null distributions of genetic interactions from the neural network. Second, we study deep learning models in the low-data regime. We improve deep learning prediction by incorporating domain knowledge through informative priors. 
Specifically, we design informative Gaussian scale mixture priors that explicitly encode prior beliefs about feature sparsity and the data signal-to-noise ratio into deep learning models, which improve their accuracy on regression tasks such as genotype-phenotype prediction, especially when only a small training set is available. Moreover, we study how to better understand, using representation similarity, the working mechanism of low-data deep learning models that share knowledge across multiple similar domains, as in transfer learning. We find that current representation similarity measures, applied to deep learning models on multiple domains, give counter-intuitive conclusions about the models' functional similarity because of the confounding effect of the input data structure. Therefore, we introduce a deconfounding step that adjusts for this confounder, which improves the consistency of representation similarities with respect to the functional similarities of the models.
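A common generic way to turn a fitted model f into a global pairwise interaction score is to average the magnitude of the mixed second derivative ∂²f/∂x_i∂x_j over the data; with a Bayesian neural network, repeating this for each posterior sample gives a distribution over the score rather than a single number. The sketch below illustrates only that generic idea using central finite differences. The functions `mixed_partial`, `interaction_score` and `posterior_interaction`, and the hand-written stand-ins for posterior samples, are hypothetical; they are not the estimators, the gene-level architecture, or the permutation procedure from the publications.

```python
import numpy as np

def mixed_partial(f, X, i, j, h=1e-3):
    """Central finite-difference estimate of d^2 f / (dx_i dx_j) at every row of X."""
    ei = np.zeros(X.shape[1]); ei[i] = h
    ej = np.zeros(X.shape[1]); ej[j] = h
    return (f(X + ei + ej) - f(X + ei - ej)
            - f(X - ei + ej) + f(X - ei - ej)) / (4.0 * h * h)

def interaction_score(f, X, i, j):
    """Global interaction strength: mean absolute mixed partial over the data."""
    return float(np.mean(np.abs(mixed_partial(f, X, i, j))))

def posterior_interaction(fs, X, i, j):
    """Mean and standard deviation of the score across posterior samples fs."""
    scores = np.array([interaction_score(f, X, i, j) for f in fs])
    return float(scores.mean()), float(scores.std())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Hypothetical stand-ins for posterior samples of a trained network:
# f_w(x) = w * x0 * x1 + x2, so x0 and x1 interact while x2 is purely additive.
fs = [lambda X, w=w: w * X[:, 0] * X[:, 1] + X[:, 2] for w in (0.9, 1.0, 1.1)]
mean01, sd01 = posterior_interaction(fs, X, 0, 1)  # close to 1.0
mean02, sd02 = posterior_interaction(fs, X, 0, 2)  # close to 0.0
```

For the product term the finite-difference stencil is exact up to rounding, so each posterior sample's score equals its w, and the spread across samples (here about 0.08) is the kind of uncertainty that can feed into error control on detected interactions.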
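For intuition about how a prior can encode beliefs about feature sparsity and signal-to-noise ratio, consider the simplified case of a linear predictor f(x) = wᵀx with D standardized, independent features and noise variance σ². Under a spike-and-slab prior (the simplest Gaussian scale mixture), where each weight is exactly zero with probability 1 − p and drawn from N(0, σ²_slab) otherwise, the prior-expected signal variance is D·p·σ²_slab. Setting it to σ²·PVE/(1 − PVE) makes the prior match a target proportion of variance explained (PVE). This sketch assumes that linear setting; the helper names and all numbers are hypothetical, and the priors developed in the thesis for neural networks are more elaborate and are not reproduced here.

```python
import numpy as np

def slab_variance(target_pve, noise_var, n_features, p_nonzero):
    """Slab variance that makes the prior-expected signal variance match target_pve.

    Assumes standardized, independent features (E[x_d^2] = 1), so the signal
    variance of f(x) = w @ x is sum_d w_d^2, with prior mean
    n_features * p_nonzero * slab_var under a spike-and-slab prior.
    """
    signal_var = noise_var * target_pve / (1.0 - target_pve)
    return signal_var / (n_features * p_nonzero)

def sample_weights(rng, n_samples, n_features, p_nonzero, slab_var):
    """Draw weights from a spike (exact zero) and slab N(0, slab_var) mixture."""
    is_slab = rng.random((n_samples, n_features)) < p_nonzero
    slab = rng.normal(0.0, np.sqrt(slab_var), size=(n_samples, n_features))
    return np.where(is_slab, slab, 0.0)

def implied_pve(signal_var, noise_var):
    """Proportion of variance explained for a given signal and noise variance."""
    return signal_var / (signal_var + noise_var)

rng = np.random.default_rng(0)
# Hypothetical setting: 5% of D = 1000 features are relevant, target PVE of 10%.
D, p, noise_var, pve = 1000, 0.05, 1.0, 0.1
s2 = slab_variance(pve, noise_var, D, p)
W = sample_weights(rng, 1000, D, p, s2)
# Monte Carlo estimate of the prior-expected signal variance sum_d w_d^2.
mean_signal_var = float(np.mean(np.sum(W**2, axis=1)))
```

Choosing σ²_slab this way means that stronger assumed sparsity (smaller p) is compensated by larger non-zero weights, so the total expected signal stays at the level implied by the target PVE instead of silently shrinking toward zero.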
Supervising professor
Kaski, Samuel, Prof., Aalto University, Department of Computer Science, Finland
Thesis advisor
Marttinen, Pekka, Prof., Aalto University, Department of Computer Science, Finland
Keywords
probabilistic methods, deep learning, interaction detection, statistical genetics
Other note
  • [Publication 1]: Tianyu Cui, Pekka Marttinen, and Samuel Kaski. Learning Global Pairwise Interactions with Bayesian Neural Networks. In 24th European Conference on Artificial Intelligence (ECAI), August 2020.
    DOI: 10.3233/FAIA200205
  • [Publication 2]: Tianyu Cui, Khaoula El Mekkaoui, Jaakko Reinvall, Aki Havulinna, Pekka Marttinen, and Samuel Kaski. Gene-Gene Interaction Detection with Deep Learning. Communications Biology, November 2022.
    DOI: 10.1038/s42003-022-04186-y
  • [Publication 3]: Tianyu Cui, Aki Havulinna, Pekka Marttinen, and Samuel Kaski. Informative Bayesian Neural Network Priors for Weak Signals. Bayesian Analysis, September 2021.
    DOI: 10.1214/21-BA1291
  • [Publication 4]: Tianyu Cui, Yogesh Kumar, Pekka Marttinen, and Samuel Kaski. Deconfounded Representation Similarity for Comparison of Neural Networks. In 36th Conference on Neural Information Processing Systems (NeurIPS), New Orleans, November 2022.
    DOI: 10.48550/arXiv.2202.00095