Advancing towards personalized medicine: probabilistic machine learning and deep learning for health and genetics

dc.contributor [fi]: Aalto-yliopisto
dc.contributor [en]: Aalto University
dc.contributor.author: Wharrie, Sophie
dc.contributor.department [fi]: Tietotekniikan laitos
dc.contributor.department [en]: Department of Computer Science
dc.contributor.school [fi]: Perustieteiden korkeakoulu
dc.contributor.school [en]: School of Science
dc.contributor.supervisor: Kaski, Samuel, Prof., Aalto University, Department of Computer Science, Finland, and Prof., University of Manchester, United Kingdom
dc.date.accessioned: 2025-04-22T09:00:22Z
dc.date.available: 2025-04-22T09:00:22Z
dc.date.defence: 2025-04-30
dc.date.issued: 2025
dc.description.abstract [en]: This thesis advances probabilistic machine learning and deep learning methods for personalized medicine. Personalized medicine aims to tailor the diagnosis, prevention and treatment of diseases to individual patient characteristics, and machine learning supports this aim with powerful tools for analyzing many kinds of individual-level health and biological data. The core machine learning challenge addressed in this thesis is how to meet the statistical inference needs of individual-level analyses while effectively exploiting large datasets that capture the complex relationships explaining patient outcomes. Addressing this challenge enables more effective models for the generative and predictive tasks relevant to personalized medicine. The articles of the thesis explore these tasks using large-scale health and biological data sources, including genetic biobanks, population-scale health registers, and longitudinal data from electronic health record (EHR) systems. The first research question asks how to create individual-level synthetic datasets of high-dimensional genetic sequences and complex disease phenotypes. Synthetic data is an important tool for researchers developing and evaluating new computational methods for personalized medicine, such as polygenic risk scoring, but it is difficult to generate effectively at scale from high-dimensional reference datasets with limited samples. The first contribution of the thesis is a new probabilistic machine learning approach and software tool that combines statistical models of the underlying generative processes with simulation-based inference techniques to create high-fidelity synthetic data for a large number of individuals, phenotypic traits and genetic variants.
The second and third research questions concern deep learning methods that model longitudinal health data to predict a range of individual-level health-related outcomes. The thesis introduces two techniques for making better use of the informative statistical relationships in large data sources: a geometric deep learning approach that leverages biological relationships between individuals to improve predictive performance and explainability, and a Bayesian meta-learning approach that improves generalizability by pooling information across related supervised learning tasks, based on similarities in the causal relationships underlying the outcomes being predicted. These methods are validated in two case studies: modeling the influence of family history on an individual's disease risk using data from Finland's nationwide health registry system, and early prediction of various stroke outcomes using data from the UK Biobank and FinnGen projects.
dc.format.extent: 65 + app. 103
dc.format.mimetype [en]: application/pdf
dc.identifier.isbn: 978-952-64-2529-0 (electronic)
dc.identifier.isbn: 978-952-64-2530-6 (printed)
dc.identifier.issn: 1799-4942 (electronic)
dc.identifier.issn: 1799-4934 (printed)
dc.identifier.issn: 1799-4934 (ISSN-L)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/135048
dc.identifier.urn: URN:ISBN:978-952-64-2529-0
dc.language.iso [en]: en
dc.opn: Suominen, Hanna, Prof., Australian National University, Australia
dc.publisher [en]: Aalto University
dc.publisher [fi]: Aalto-yliopisto
dc.relation.haspart: [Publication 1]: Sophie Wharrie, Zhiyu Yang, Vishnu Raj, Remo Monti, Rahul Gupta, Ying Wang, Alicia Martin, Luke J O’Connor, Samuel Kaski, Pekka Marttinen, Pier Francesco Palamara, Christoph Lippert, and Andrea Ganna. HAPNEST: efficient, large-scale generation and evaluation of synthetic datasets for genotypes and phenotypes. Bioinformatics, Volume 39, Issue 9, September 2023. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202310046172. DOI: 10.1093/bioinformatics/btad535
dc.relation.haspart: [Publication 2]: Sophie Wharrie, Zhiyu Yang, Andrea Ganna, and Samuel Kaski. Characterizing personalized effects of family information on disease risk using graph representation learning. In Proceedings of the 8th Machine Learning for Healthcare Conference, New York, United States, PMLR, 219:824-845, August 2023. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202401312218.
dc.relation.haspart: [Publication 3]: Sophie Wharrie, Lisa Eick, Lotta Mäkinen, Andrea Ganna, and Samuel Kaski. Bayesian Meta-Learning for Improving Generalizability of Health Prediction Models With Similar Causal Mechanisms. Submitted to a journal, December 2024. arXiv preprint arXiv:2310.12595.
dc.relation.ispartofseries [en]: Aalto University publication series Doctoral Theses
dc.relation.ispartofseries: 84/2025
dc.rev: Roos, Teemu, Prof., University of Helsinki, Finland
dc.rev: Hartford, Jason, Lecturer (Asst. Prof.), University of Manchester, United Kingdom
dc.subject.keyword [en]: probabilistic machine learning
dc.subject.keyword [en]: deep learning
dc.subject.keyword [en]: personalized medicine
dc.subject.keyword [en]: synthetic data
dc.subject.keyword [en]: generation
dc.subject.keyword [en]: geometric deep learning
dc.subject.keyword [en]: Bayesian meta-learning
dc.subject.keyword [en]: causal relationships
dc.subject.keyword [en]: electronic health records
dc.subject.keyword [en]: human genetics
dc.subject.other [en]: Computer science
dc.title [en]: Advancing towards personalized medicine: probabilistic machine learning and deep learning for health and genetics
dc.type [fi]: G5 Artikkeliväitöskirja (G5 article-based doctoral dissertation)
dc.type.dcmitype [en]: text
dc.type.ontasot [en]: Doctoral dissertation (article-based)
dc.type.ontasot [fi]: Väitöskirja (artikkeli)
local.aalto.acrisexportstatus: checked 2025-05-05_1525
local.aalto.archive: yes
local.aalto.formfolder: 2025_04_22_klo_07_28

Files

Original bundle

Name: isbn9789526425290.pdf
Size: 1.76 MB
Format: Adobe Portable Document Format