Browsing by Author "Julkunen, Heli"
Item Biomarker Discovery from Multi-View Health Data Using Variations of Canonical Correlation Analysis (2024-08-19) Piho, Kelly; Julkunen, Heli; Perustieteiden korkeakoulu; Rousu, Juho
The increasing prevalence of type 2 diabetes (T2D) creates a growing public health concern. People with T2D often develop complications related to diabetes, prompting the search for biomarkers that signal an increased risk of developing complications. This thesis explores the connections between molecular risk factors and health outcomes in type 2 diabetics by employing various methods of canonical correlation analysis (CCA). This involves jointly correlating two data views: the first view representing comprehensive health data (metabolomics, clinical biochemistry markers, blood counts, and baseline characteristics) and the second view representing health outcomes, specifically complications of T2D (nephropathy, myocardial infarction, stroke, neuropathy, and retinopathy). The aim is to uncover and discern both linear and non-linear associations between these two views and potentially identify features that could function as biomarkers indicating an increased risk of developing complications. A comparative analysis of gradient-based kernel CCA (gradKCCA) and sparse CCA based on Hilbert-Schmidt independence criterion (SCCA-HSIC) was conducted utilising data from UK Biobank. For gradKCCA, we considered both linear and polynomial kernels. The findings indicate that both linear gradKCCA and SCCA-HSIC discovered reliable and relevant associations, exhibiting the highest correlation and statistical dependence as measured by Hilbert-Schmidt independence criterion (HSIC). Notably, SCCA-HSIC uncovered weak but relevant relationships that linear gradKCCA did not detect, while polynomial gradKCCA methods exhibited overfitting and failed to identify generalisable associations. All methods highlighted the significance of glycated haemoglobin, an established biomarker linked to diabetes.
Furthermore, both linear gradKCCA and SCCA-HSIC identified several established biomarkers associated with specific complications (e.g., creatinine and cystatin C for kidney function), and general diabetes-related biomarkers (e.g., glucose, blood pressure). Additionally, SCCA-HSIC recognised the association between insulin resistance and branched-chain amino acids. This study presents a novel application of gradKCCA and SCCA-HSIC in analysing a large biobank dataset containing multi-view health data. The findings underscore the efficacy of SCCA-HSIC and the importance of ongoing research into the biomarkers to prevent diabetes complications and improve patient outcomes. Future research should address the limitations of the current deflation strategy, which is derived from linear CCA, to enhance non-linear methods. This research has been conducted using the UK Biobank Resource under application number 147811.
Item Kernel-based machine learning approaches to drug-protein interaction prediction (2016-12-22) Julkunen, Heli; Cichonska, Anna; Sähkötekniikan korkeakoulu; Turunen, Markus
Item Learning with multiple pairwise kernels for drug bioactivity prediction (2018-07-01) Cichonska, Anna; Pahikkala, Tapio; Szedmak, Sandor; Julkunen, Heli; Airola, Antti; Heinonen, Markus; Aittokallio, Tero; Rousu, Juho; Department of Computer Science; Professorship Rousu Juho; Helsinki Institute for Information Technology (HIIT); Professorship Lähdesmäki Harri; Centre of Excellence in Molecular Systems Immunology and Physiology Research Group, SyMMys; University of Turku; Aalto University
Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets.
Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and especially multiple kernel learning (MKL) offers promising benefits as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs.
Results: We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, therefore making the method applicable to solving large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces.
We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and therefore it also automatically identifies data sources relevant for the prediction problem.
Item Leveraging multi-way interactions for systematic prediction of pre-clinical drug combination effects (Nature Publishing Group, 2020-12-01) Julkunen, Heli; Cichonska, Anna; Gautam, Prson; Szedmak, Sandor; Douat, Jane; Pahikkala, Tapio; Aittokallio, Tero; Rousu, Juho; Department of Computer Science; Professorship Rousu Juho; Helsinki Institute for Information Technology (HIIT); Computer Science - Computational Life Sciences (CSLife); University of Helsinki; Department of Computer Science; University of Turku; Aalto University
We present comboFM, a machine learning framework for predicting the responses of drug combinations in pre-clinical studies, such as those based on cell lines or patient-derived cells. comboFM models the cell context-specific drug interactions through higher-order tensors, and efficiently learns latent factors of the tensor using powerful factorization machines. The approach enables comboFM to leverage information from previous experiments performed on similar drugs and cells when predicting responses of new combinations in so far untested cells; thereby, it achieves highly accurate predictions despite sparsely populated data tensors. We demonstrate high predictive performance of comboFM in various prediction scenarios using data from cancer cell line pharmacogenomic screens. Subsequent experimental validation of a set of previously untested drug combinations further supports the practical and robust applicability of comboFM. For instance, we confirm a novel synergy between anaplastic lymphoma kinase (ALK) inhibitor crizotinib and proteasome inhibitor bortezomib in lymphoma cells.
Overall, our results demonstrate that comboFM provides an effective means for systematic pre-screening of drug combinations to support precision oncology applications.
Item Predictive Modeling of Anticancer Efficacy of Drug Combinations Using Factorization Machines (2019-06-17) Julkunen, Heli; Cichonska, Anna; Perustieteiden korkeakoulu; Rousu, Juho
Co-administration of drugs is a widely used strategy in cancer treatment to prevent drug resistance and improve the therapeutic efficacy while reducing the toxicity and side effects of the treatment. Despite their effectiveness, new combination therapies have been slow to emerge, as selecting and testing potential drug combinations against various cancer cell lines remains time- and cost-inefficient. In recent years, machine learning methods have emerged as powerful means to aid the drug development process. However, the underlying dose response matrix structure of drug combination data and the complexity of drug interaction patterns observed across various dose pairs pose challenges to accurate modeling of drug combination effects. In this thesis, we present a novel machine learning framework for predicting the therapeutic efficacy of drug combinations in human cancer cell lines using factorization machines, a recent model class designed for efficient modeling of higher-order feature interactions. We base our work on the observation that the underlying dose-response data can be compiled into a higher-order tensor indexed by drugs, drug concentrations and cell lines. The drug combination responses can then be modeled as an interaction between these different domains. We tested the model using the publicly available NCI-ALMANAC dataset on pairwise drug combinations screened in various concentration pairs across the NCI-60 panel of human cancer cell lines.
The proposed method showed high predictive accuracy not only in filling in missing entries in otherwise known dose-response matrices, but also in a more challenging and practical setting of extending the predictions to new drug combinations not observed in the training space. The obtained results demonstrate that the framework provides a promising means for systematic pre-screening of drug combinations for their therapeutic potential, and could thus support precision medicine efforts.
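As a rough illustration of the factorization machine model class mentioned in the abstract above, the following is a minimal sketch of a standard second-order factorization machine prediction in plain Python. It is not the thesis's implementation: the thesis refers to higher-order interactions, while this sketch stops at pairwise terms, and the function names, the one-hot feature layout, and all numeric values are hypothetical choices made for the example.

```python
from itertools import combinations


def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction in O(k * n) time.

    x: feature vector, w0: global bias, w: linear weights,
    V: one latent factor vector (length k) per feature.
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #   = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Reference version that sums over all feature pairs explicitly."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )
    return w0 + linear + pairwise


# Hypothetical one-hot encoding: [drug A, drug B, cell line 1, cell line 2]
x = [1.0, 0.0, 1.0, 0.0]
w0 = 0.5
w = [0.1, -0.2, 0.3, 0.0]
V = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1], [0.2, 0.2]]

print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

The two functions compute the same value; the first avoids the explicit loop over feature pairs, which is what makes factorization machines practical on sparse, high-dimensional encodings such as one-hot indicators for drugs, concentrations and cell lines.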