Machine learning methods for improving drug response prediction in cancer

School of Science | Doctoral thesis (article-based) | Defence date: 2017-07-27
62 + app. 76
Aalto University publication series DOCTORAL DISSERTATIONS, 127/2017
Personalizing medicine, by choosing therapies that maximize effectiveness and minimize side effects for individual patients, is one of the prime challenges in cancer treatment. At the core of personalized medicine is a machine learning problem: given a set of patients whose responses to some drugs have been observed, predict the response of a new patient, or the response to a new drug. Computationally predicted responses can then be used to generate hypotheses for selecting therapies tailored to individual patients. However, the prediction task is exceedingly challenging, which calls for the development of new machine learning methods.

This thesis takes a multi-disciplinary approach to predicting drug responses, utilizing multiple data sources in cancer while simultaneously advancing the computational methods to improve accuracy. Specifically, the thesis presents a new Bayesian multi-view multi-task method that outperformed existing computational models in an international crowdsourcing challenge to predict drug responses. The method is further extended to solve the more challenging task of predicting drug responses in multiple cancer types. Notably, the thesis extends the kernelized Bayesian matrix factorization method with component-wise multiple kernel learning for effectively inferring associations between a large number of biologically motivated data sources and the latent factors. The results demonstrate that the new formulation of the method, supplemented with prior biological knowledge, is helpful for discovering interpretable associations as well as for predicting the drug responses of new cancer cells.

The original contribution of this thesis is two-fold. First, the thesis proposes novel multi-view and multi-task methods to predict drug responses in cancer cells with increased accuracy. Second, new ways of incorporating prior biological knowledge are explored to further improve drug response predictions. Open source implementations of the new methods have been released to facilitate further research.
Supervising professor
Kaski, Samuel, Prof., Aalto University, Department of Computer Science, Finland
data integration, multi-view multi-task machine learning, personalized medicine
Other note
  • [Publication 1]: James C. Costello, Laura M. Heiser, Elisabeth Georgii, Mehmet Gönen, Michael P. Menden, Nicholas J. Wang, Mukesh Bansal, Muhammad Ammad-ud-din, Petteri Hintsanen, Suleiman A. Khan, John-Patrick Mpindi, Olli Kallioniemi, Antti Honkela, Tero Aittokallio, Krister Wennerberg, NCI DREAM Community, James J. Collins, Dan Gallahan, Dinah Singer, Julio Saez-Rodriguez, Samuel Kaski, Joe W. Gray and Gustavo Stolovitzky. A Community Effort to Assess and Improve Drug Sensitivity Prediction Algorithms. Nature Biotechnology, 32, 12, 1202-1212, 2014.
    DOI: 10.1038/nbt.2877
  • [Publication 2]: Muhammad Ammad-ud-din, Elisabeth Georgii, Mehmet Gönen, Tuomo Laitinen, Olli Kallioniemi, Krister Wennerberg, Antti Poso, and Samuel Kaski. Integrative and Personalized QSAR Analysis in Cancer by Kernelized Bayesian Matrix Factorization. Journal of Chemical Information and Modeling, 54, 8, 2347-2359, 2014.
    DOI: 10.1021/ci500152b
  • [Publication 3]: Muhammad Ammad-ud-din, Suleiman A. Khan, Disha Malani, Astrid Murumägi, Olli Kallioniemi, Tero Aittokallio and Samuel Kaski. Drug response prediction by inferring pathway-response associations with Kernelized Bayesian Matrix Factorization. Bioinformatics, 32, 17, i455-i463, 2016.
    DOI: 10.1093/bioinformatics/btw433
  • [Publication 4]: Solveig Sieberts, Fan Zhu, Javier García-García, Eli Stahl, Abhishek Pratap, Gaurav Pandey, Dimitrios Pappas, Daniel Aguilar, Bernat Anton, Jaume Bonet, Ridvan Eksi, Oriol Fornés, Emre Guney, Hongdong Li, Manuel Marín, Bharat Panwar, Joan Planas-Iglesias, Daniel Poglayen, Jing Cui, Andre Falcao, Christine Suver, Bruce Hoff, Venkat Balagurusamy, Donna Dillenberger, Elias Chaibub Neto, Thea Norman, Tero Aittokallio, Muhammad Ammad-ud-din, Chloe-Agathe Azencott, Víctor Bellón, Valentina Boeva, Kerstin Bunte, Himanshu Chheda, Lu Cheng, Jukka Corander, Michel Dumontier, Anna Goldenberg, Peddinti Gopalacharyulu, Mohsen Hajiloo, Daniel Hidru, Alok Jaiswal, Samuel Kaski, Beyrem Khalfaoui, Suleiman Khan, Eric Kramer, Pekka Marttinen, Aziz Mezlini, Bhuvan Molparia, Matti Pirinen, Janna Saarela, Matthias Samwald, Veronique Stoven, Hao Tang, Jing Tang, Ali Torkamani, Jean-Philippe Vert, Bo Wang, Tao Wang, Krister Wennerberg, Nathan Wineinger, Guanghua Xiao, Yang Xie, Rae Yeung, Xiaowei Zhan, Cheng Zhao, Jeff Greenberg, Joel Kremer, Kaleb Michaud, Anne Barton, Marieke Coenen, Xavier Mariette, Corinne Miceli, Nancy Shadick, Michael Weinblatt, Niek de Vries, Paul Tak, Danielle Gerlag, Tom W. J. Huizinga, Fina Kurreeman, Cornelia Allaart, Stanley Bridges, Lindsey Criswell, Larry Moreland, Lars Klareskog, Saedis Saevarsdottir, Leonid Padyukov, Peter Gregersen, Stephen Friend, Robert Plenge, Gustavo Stolovitzky, Baldomero Oliva, Yuan-fang Guan, and Lara Mangravite. Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis. Nature Communications, 7, EP:12460, 2016.
    DOI: 10.1038/ncomms12460
  • [Publication 5]: Eemeli Leppäaho, Muhammad Ammad-ud-din and Samuel Kaski. GFA: Exploratory Analysis of Multiple Data Sources with Group Factor Analysis. Journal of Machine Learning Research, 18, 39, 1-5, 2017.
  • [Publication 6]: Marta Soare, Muhammad Ammad-ud-din and Samuel Kaski. Regression with n→1 by Expert Knowledge Elicitation. In 15th IEEE International Conference on Machine Learning and Applications (ICMLA), Anaheim, USA, 734-739, Dec 2016.
    DOI: 10.1109/ICMLA.2016.0131