Application of variations of non-linear CCA for feature selection in drug sensitivity prediction

School of Science | Master's thesis

Date

2019-06-17

Major/Subject

Bioinformatics

Mcode

SCI3058

Degree programme

Master’s Programme in Life Science Technologies

Language

en

Pages

65+6

Abstract

Cancer arises from genetic alterations in a patient's DNA. Many studies indicate that these alterations vary among patients and can dramatically affect the therapeutic effect of cancer treatment. Extensive research therefore focuses on understanding these alterations and their effects. Pre-clinical models play an important role in cancer drug discovery, and cancer cell lines, which capture many aspects of the multi-omics properties of cancer cells, are one of the main components of pre-clinical studies. However, assessing the responses of cancer cell lines to different drugs experimentally is error-prone and laborious. In-silico models that accurately predict drug sensitivity values can therefore enhance cancer drug discovery. In the past decade, many computational methods have achieved high performance by studying similarities among cancer cell lines and drug compounds and using them to build accurate predictive models for unseen instances. In this thesis, we study the effect of non-linear feature selection through two variations of canonical correlation analysis (CCA), namely kernel CCA (KCCA) and HSIC-SCCA, on the prediction of drug sensitivity. To estimate the quality of the selected features, we use pairwise kernel ridge regression to predict drug sensitivity, measured as IC50 values. The data set under study is a subset of Genomics of Drug Sensitivity in Cancer (GDSC) comprising 124 cell lines and 124 drug compounds. The high diversity among the cell line and drug compound samples and the high dimensionality of the data matrices reduce the accuracy of the model obtained by pairwise kernel ridge regression. The accuracy decreased when HSIC-SCCA was employed as a dimension reduction step, because HSIC-SCCA increased the differences among the samples by using different projection vectors for samples in different cross-validation folds. The obtained variables were therefore rotated to make the samples more homogeneous, which slightly improved the accuracy of the model.
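For readers unfamiliar with the prediction step named in the abstract, the following is a minimal sketch of pairwise kernel ridge regression with a Kronecker product kernel over (cell line, drug) pairs. It is not the thesis implementation. The Gaussian kernels, the `gamma` and `lam` values, the helper names, the synthetic data, and the assumption that every cell line has a measured response to every drug are all illustrative choices made here.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel between the rows of A and the rows of B."""
    sq_dist = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def fit_pairwise_krr(K_cell, K_drug, Y, lam):
    """Fit ridge regression on the Kronecker product pairwise kernel.

    Y[i, j] is the response of cell line i to drug j (assumed fully observed).
    Returns dual coefficients, one per (cell line, drug) training pair.
    """
    K_pair = np.kron(K_cell, K_drug)  # row order matches Y.ravel()
    n_pairs = K_pair.shape[0]
    return np.linalg.solve(K_pair + lam * np.eye(n_pairs), Y.ravel())


def predict_pairwise_krr(K_cell_new, K_drug_new, alpha):
    """Predict responses for every (new cell line, new drug) combination.

    K_cell_new: kernel between new and training cell lines, shape (m_c, n_c).
    K_drug_new: kernel between new and training drugs, shape (m_d, n_d).
    """
    K_pair_new = np.kron(K_cell_new, K_drug_new)
    return (K_pair_new @ alpha).reshape(
        K_cell_new.shape[0], K_drug_new.shape[0]
    )


# Synthetic stand-in data (hypothetical sizes, not the GDSC subset).
rng = np.random.default_rng(0)
X_cell = rng.normal(size=(8, 20))  # 8 cell lines, 20 features each
X_drug = rng.normal(size=(6, 15))  # 6 drugs, 15 features each
Y = rng.normal(size=(8, 6))        # placeholder for IC50-type responses

K_cell = rbf_kernel(X_cell, X_cell, gamma=0.05)
K_drug = rbf_kernel(X_drug, X_drug, gamma=0.05)
alpha = fit_pairwise_krr(K_cell, K_drug, Y, lam=1.0)

# Predict the responses of 3 unseen cell lines to the 6 training drugs.
X_cell_new = rng.normal(size=(3, 20))
K_cell_new = rbf_kernel(X_cell_new, X_cell, gamma=0.05)
Y_pred = predict_pairwise_krr(K_cell_new, K_drug, alpha)
print(Y_pred.shape)
```

Forming the Kronecker product explicitly keeps the sketch short but scales poorly: 124 cell lines and 124 drugs would give a 15,376 by 15,376 pairwise kernel matrix, so a practical implementation would more likely work with the eigendecompositions of the two smaller kernel matrices.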

Supervisor

Rousu, Juho

Thesis advisor

Uurtio, Viivi

Keywords

drug sensitivity, CCA, KCCA, HSIC-SCCA, pairwise kernel ridge regression, cancer cell lines
