Browsing by Author "Ali, Mehreen"
- Global proteomics profiling improves drug sensitivity prediction: results from a multi-omics, pan-cancer modeling approach
A1 Original article in a scientific journal (2018-04-15) Ali, Mehreen; Khan, Suleiman A.; Wennerberg, Krister; Aittokallio, Tero
Motivation: Proteomics profiling is increasingly being used for molecular stratification of cancer patients and cell-line panels. However, systematic assessment of the predictive power of large-scale proteomic technologies across various drug classes and cancer types is currently lacking. To that end, we carried out the first pan-cancer, multi-omics comparative analysis of the relative performance of two proteomic technologies, targeted reverse phase protein array (RPPA) and global mass spectrometry (MS), in terms of their accuracy for predicting the sensitivity of cancer cells to both cytotoxic chemotherapeutics and molecularly targeted anticancer compounds.
Results: Our results in two cell-line panels demonstrate how MS profiling improves drug response predictions beyond that of the RPPA or the other omics profiles when used alone. However, frequent missing values in the MS data complicate its use in predictive modeling and require additional filtering, such as focusing on completely measured or known oncoproteins, to obtain maximal predictive performance. Rather strikingly, the two proteomics profiles provided complementary predictive signal for both the cytotoxic and the targeted compounds. Further, information about the cellular abundance of primary target proteins was found to be critical for predicting the response of targeted compounds, although the non-target features also contributed significantly to the predictive power. The clinical relevance of the selected protein markers was confirmed in cancer patient data. These results provide novel insights into the relative performance and optimal use of the widely applied proteomic technologies, MS and RPPA, which should prove useful in translational applications, such as defining the best combination of omics technologies and marker panels for understanding and predicting drug sensitivities in cancer patients.
- Machine learning and feature selection for drug response prediction in precision oncology applications
A2 Review article in a scientific journal (2019-02-07) Ali, Mehreen; Aittokallio, Tero
In-depth modeling of the complex interplay among multiple omics data measured from cancer cell lines or patient tumors is providing new opportunities toward identification of tailored therapies for individual cancer patients. Supervised machine learning algorithms are increasingly being applied to the omics profiles as they enable integrative analyses among the high-dimensional data sets, as well as personalized predictions of therapy responses using multi-omics panels of response-predictive biomarkers identified through feature selection and cross-validation. However, technical variability and frequent missingness in input “big data” require the application of dedicated data preprocessing pipelines that often lead to some loss of information and a compressed view of the biological signal. We describe here the state-of-the-art machine learning methods for anti-cancer drug response modeling and prediction, and give our perspective on further opportunities to make better use of high-dimensional multi-omics profiles, along with knowledge about cancer pathways targeted by anti-cancer compounds, when predicting their phenotypic responses.
- Multiple output regression with latent noise
A1 Original article in a scientific journal (2016-06-01) Gillberg, Leo; Marttinen, Pekka; Pirinen, Matti; Kangas, Antti J.; Soininen, Pasi; Ali, Mehreen; Havulinna, Aki S.; Järvelin, Marjo Riitta; Ala-Korpela, Mika; Kaski, Samuel
In high-dimensional data, structured noise caused by observed and unobserved factors affecting multiple target variables simultaneously poses a serious challenge for modeling by masking the often weak signal. Therefore, (1) explaining away the structured noise in multiple-output regression is of paramount importance. Additionally, (2) assumptions about the correlation structure of the regression weights are needed. We note that both can be formulated in a natural way in a latent variable model, in which both the interesting signal and the noise are mediated through the same latent factors. Under this assumption, the signal model then borrows strength from the noise model by encouraging similar effects on correlated targets. We introduce a hyperparameter for the latent signal-to-noise ratio, which turns out to be important for modelling weak signals, and an ordered infinite-dimensional shrinkage prior that resolves the rotational unidentifiability in reduced-rank regression models. Simulations and prediction experiments with metabolite, gene expression, fMRI measurement, and macroeconomic time series data show that our model equals or exceeds the state-of-the-art performance and, in particular, outperforms the standard approach of assuming independent noise and signal models.
- Survival Modeling Using Factor Analysis Data Integration.
School of Science | Master's thesis (2015-11-05) Ali, Mehreen
Biology shows that complex diseases result from an interplay of genetic and environmental factors. This study aims to combine both by integrating "multi-omics" data with clinical data, thus helping biological and medical researchers in disease diagnosis, patient stratification, disease mechanism analysis and effective treatment decisions. Multi-view biological data from a cohort of the National Institute for Health and Welfare (THL), Finland, has been explored using factor models. Factor models reduce high-dimensional data into a lower-dimensional factor space. Factor analysis (FA) is the simplest factor model; it represents each data feature as a weighted sum of latent factors, separating out noise. Bayesian multi-view group-sparse factor analysis (GFA) is another factor model examined in this study. GFA is an extension of FA with sparsity added to the model. GFA is applied to high-dimensional data where features can be naturally divided into different groups (views). Unlike FA, GFA can record component (latent factor) activity for views (groups of related features), which makes GFA a well-suited model for multi-view data sets. Survival models have been utilized to make cardiovascular disease (CVD) risk predictions based on the dependencies between the multiple views as represented by factor models. The Cox proportional hazards model is applied to analyze the data, with the time until a CVD risk event occurs as the output variable. This study will provide a stepping stone for exploring GFA, in combination with the Cox survival model, for a better latent factor representation of multi-view data sets.
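The factor-model idea in the thesis abstract above can be illustrated with a minimal sketch. It is not the thesis code: the loading matrix `W`, the noise variances `psi`, the two views, and the function name `simulate_factor_model` are all hypothetical, chosen only to show each feature as a weighted sum of latent factors plus noise, and how a factor can be active in one view but not another. The loadings are set by hand here, whereas GFA would infer them from data.

```python
import numpy as np


def simulate_factor_model(n_samples, loadings, noise_var, rng):
    """Draw x = W z + eps with z ~ N(0, I) and eps ~ N(0, diag(noise_var))."""
    n_features, n_factors = loadings.shape
    z = rng.standard_normal((n_samples, n_factors))
    eps = rng.standard_normal((n_samples, n_features)) * np.sqrt(noise_var)
    return z @ loadings.T + eps, z


# Hypothetical loadings: 6 features, 2 latent factors.
# Factor 0 loads on every feature; factor 1 loads only on the last three.
W = np.array([
    [0.9, 0.0],
    [0.7, 0.0],
    [0.5, 0.0],
    [0.6, 0.8],
    [0.4, 0.6],
    [0.3, 0.9],
])
psi = np.full(6, 0.3)  # assumed feature-specific noise variances

# Two assumed views (groups of related features).
views = {"view_A": slice(0, 3), "view_B": slice(3, 6)}

rng = np.random.default_rng(0)
X, Z = simulate_factor_model(50_000, W, psi, rng)

# Under the factor model, the feature covariance is W W^T + diag(psi).
implied_cov = W @ W.T + np.diag(psi)
empirical_cov = np.cov(X, rowvar=False)
max_abs_gap = np.abs(empirical_cov - implied_cov).max()

# A factor counts as active in a view if any loading in that view is nonzero.
activity = {
    name: [bool(np.linalg.norm(W[rows, k]) > 0) for k in range(W.shape[1])]
    for name, rows in views.items()
}
```

In this toy setup, the factor scores `Z` play the role of the low-dimensional representation that a survival model such as Cox regression could take as covariates; fitting that model is outside the scope of the sketch.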