Bayesian multi-view models for data-driven drug response analysis

School of Science | Doctoral thesis (article-based) | Defence date: 2015-09-07
79 + app. 99
Aalto University publication series DOCTORAL DISSERTATIONS, 105/2015
A central challenge in biological and medical research is to understand the impact of chemical entities on living cells. Identifying the relationships between chemical structures and the cellular responses they elicit is valuable for improving drug design and targeted therapies. Chemical structures and their detailed molecular responses need to be combined in a systematic analysis to learn the complex dependencies between them, which can then improve both the understanding of the molecular mechanisms of drugs and the prediction of the effects of unknown molecules. Moreover, as emerging drug-response data sets are profiled over several disease types and levels of phenotypic detail, it is pertinent to develop advanced computational methods that can analyze multiple data sets together. This thesis takes on the multi-disciplinary challenge of computationally analyzing the interactions between multiple biological responses and the chemical properties of drugs, while simultaneously advancing the computational methods needed to learn these interactions. Specifically, multi-view dependency modeling of paired data sets is formulated as a means of systematically studying drug-response relationships. First, the systematic analysis of drug structures and their genome-wide responses is posed as a multi-set dependency modeling problem, and established methods are adopted to test the novel hypothesis. Several extensions of the drug-response analysis are then presented that explore responses measured over multiple disease types and multiple levels of phenotypic detail, uncovering novel biological insights of potential impact. These analyses are made possible by advancements in multi-view methods: the first Bayesian tensor canonical correlation analysis and its extensions are introduced to capture the underlying multi-way structure and are applied to the analysis of toxicogenomic interactions.
The results illustrate that modeling the precise multi-view and multi-way formulation of the data is valuable for discovering interpretable latent components as well as for the prediction of unseen responses of drugs. Therefore, the original contribution to knowledge in this dissertation is two-fold: first, the data-driven identification of relationships between structural properties of drugs and their genome-wide responses in cells and, second, novel advancements of multi-view methods that find dependencies between paired data sets. Open source implementations of the new methods have been released to facilitate further research.
Supervising professor
Kaski, Samuel, Prof., Aalto University, Department of Computer Science, Finland
Bayesian modeling, machine learning, multi-view learning, computational biology, bioinformatics, toxicogenomics, latent variable models, Bayesian tensor CCA
  • [Publication 1]: Suleiman A Khan, Ali Faisal, John P Mpindi, Juuso A Parkkinen, Tuomo Kalliokoski, Antti Poso, Olli P Kallioniemi, Krister Wennerberg and Samuel Kaski. Comprehensive data-driven analysis of the impact of chemoinformatic structure on the genome-wide biological response profiles of cancer cells to 1159 drugs. BMC Bioinformatics, 13:112, 2012.
    DOI: 10.1186/1471-2105-13-112
  • [Publication 2]: Seppo Virtanen, Arto Klami, Suleiman A Khan and Samuel Kaski. Bayesian Group Factor Analysis. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics AISTATS, JMLR W&CP, 22:1269–1277, 2012.
  • [Publication 3]: Suleiman A Khan, Seppo Virtanen, Olli P Kallioniemi, Krister Wennerberg, Antti Poso and Samuel Kaski. Identification of structural features in chemicals associated with cancer drug response: A systematic data-driven analysis. In Proceedings of the Thirteenth European Conference on Computational Biology ECCB, Bioinformatics, 30:i497–i504, 2014.
    DOI: 10.1093/bioinformatics/btu456
  • [Publication 4]: Mehmet Gönen, Suleiman A Khan and Samuel Kaski. Kernelized Bayesian Matrix Factorization. In Proceedings of the Thirtieth International Conference on Machine Learning ICML, JMLR W&CP, 28:864–872, 2013.
  • [Publication 5]: Suleiman A Khan and Samuel Kaski. Bayesian Multi-View Tensor Factorization. In Proceedings of the Seventh European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ECML PKDD, editors T. Calders et al., Springer-Verlag Berlin Heidelberg, 8724:656–671, 2014.
  • [Publication 6]: Suleiman A Khan, Eemeli Leppäaho and Samuel Kaski. Multi-Tensor Factorization. Submitted to a journal, 23 pages, 2015.