Application of variations of non-linear CCA for feature selection in drug sensitivity prediction

School of Science (Perustieteiden korkeakoulu) | Master's thesis
Date
2019-06-17
Major/Subject
Bioinformatics
Mcode
SCI3058
Degree programme
Master’s Programme in Life Science Technologies
Language
en
Pages
65+6
Abstract
Cancer arises from genetic alterations in patient DNA. Many studies indicate that these alterations vary among patients and can dramatically change the therapeutic effect of cancer treatment. Therefore, extensive research focuses on understanding these alterations and their effects. Pre-clinical models play an important role in cancer drug discovery, and cancer cell lines, which can capture many different multi-omics properties of cancer cells, are one of the main ingredients of these pre-clinical studies. However, experimentally assessing the responses of cancer cell lines to different drugs is error-prone and laborious. Therefore, in silico models that accurately predict drug sensitivity values can accelerate cancer drug discovery. In the past decade, many computational methods have achieved high performance by exploiting similarities among cancer cell lines and among drug compounds to build accurate predictive models for unseen instances. In this thesis, we study the effect of non-linear feature selection through two variations of canonical correlation analysis (CCA), kernel CCA (KCCA) and HSIC-SCCA, on the prediction of drug sensitivity. To estimate the performance of these features, we use pairwise kernel ridge regression to predict drug sensitivity, measured as IC50 values. The data set under study is a subset of the Genomics of Drug Sensitivity in Cancer (GDSC) database comprising 124 cell lines and 124 drug compounds. The high diversity among the cell line and drug compound samples and the high dimensionality of the data matrices reduce the accuracy of the model obtained by pairwise kernel ridge regression. Using HSIC-SCCA as a dimension reduction step reduced this accuracy further, since HSIC-SCCA increased the differences among the samples by learning different projection vectors for the samples in different cross-validation folds. Therefore, the obtained variables were rotated to make the samples more homogeneous. This step slightly improved the accuracy of the model.
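The abstract evaluates the selected features by predicting IC50 values with pairwise kernel ridge regression. The following is a minimal NumPy sketch of such a predictor, not the thesis's implementation. It assumes a Kronecker product pairwise kernel k((c, d), (c', d')) = kc(c, c') * kd(d, d'), a Gaussian kernel on hypothetical cell-line and drug feature vectors, and a complete cell lines x drugs response matrix. Real GDSC IC50 matrices contain missing entries, which this closed-form solver does not handle. The function names (`rbf_kernel`, `fit_pairwise_krr`, `predict_pairwise_krr`) and all data below are hypothetical and synthetic.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Z."""
    sq = (X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative distances

def fit_pairwise_krr(Kc, Kd, Y, lam):
    """Solve Kc @ A @ Kd + lam * A = Y for the dual coefficients A.

    This equals kernel ridge regression with the Kronecker kernel Kd (x) Kc
    over all (cell line, drug) pairs, assuming Y has no missing entries.
    """
    sc, Uc = np.linalg.eigh(Kc)  # Kc = Uc diag(sc) Uc^T
    sd, Ud = np.linalg.eigh(Kd)  # Kd = Ud diag(sd) Ud^T
    # In the joint eigenbasis the linear system is diagonal,
    # so it is solved by element-wise division.
    Yt = Uc.T @ Y @ Ud
    At = Yt / (np.outer(sc, sd) + lam)
    return Uc @ At @ Ud.T

def predict_pairwise_krr(Kc_new, Kd_new, A):
    """Predict responses for new (cell line, drug) pairs.

    Kc_new: kernel between new and training cell lines (n_new_c x n_c)
    Kd_new: kernel between new and training drugs (n_new_d x n_d)
    """
    return Kc_new @ A @ Kd_new.T

# Usage on synthetic data (placeholder features and responses).
rng = np.random.default_rng(0)
Xc = rng.normal(size=(20, 5))  # hypothetical cell-line features
Xd = rng.normal(size=(15, 4))  # hypothetical drug features
Y = rng.normal(size=(20, 15))  # synthetic stand-in for an IC50 matrix
Kc = rbf_kernel(Xc, Xc, gamma=0.2)
Kd = rbf_kernel(Xd, Xd, gamma=0.2)
A = fit_pairwise_krr(Kc, Kd, Y, lam=0.1)

# Predict all 15 training drugs for 3 unseen cell lines.
Xc_new = rng.normal(size=(3, 5))
Y_new = predict_pairwise_krr(rbf_kernel(Xc_new, Xc, gamma=0.2), Kd, A)
print(Y_new.shape)  # (3, 15)
```

Working through the eigendecompositions of the two small kernel matrices avoids forming the (n_c * n_d) x (n_c * n_d) Kronecker kernel explicitly. For 124 cell lines and 124 drugs, that full matrix would have about 2.4 x 10^8 entries. In this setting, features produced by KCCA or HSIC-SCCA would simply replace the raw feature matrices `Xc` and `Xd` before the kernels are computed.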
Supervisor
Rousu, Juho
Thesis advisor
Uurtio, Viivi
Keywords
drug sensitivity, CCA, KCCA, HSIC-SCCA, pairwise kernel ridge regression, cancer cell lines