Machine Learning for Enzyme Promiscuity

School of Electrical Engineering | Master's thesis

Date

2018-06-18

Major/Subject

Machine Learning and Data Mining

Mcode

SCI3044

Degree programme

CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)

Language

en

Pages

6+71

Abstract

The discovery of a growing number of catalytically promiscuous enzymes, which can catalyze multiple reactions, has called into question the traditional view of enzymes as highly specific proteins. Protein promiscuity has significant implications for the theory of enzyme evolution, and these suggest that this inherent feature can serve as a seed for engineering new functions in biotechnology and synthetic biology as well as in drug design. Understanding protein promiscuity is therefore becoming increasingly important, as it provides new insights into the evolutionary process that has led to such vast functional diversity. Although numerous efforts have been devoted to identifying the determinants of promiscuity, the question of what distinguishes specialized enzymes from promiscuous ones has to date remained unanswered. In this thesis, we take an in silico approach and attempt to find a predictive model that can accurately classify unseen proteins as catalytically promiscuous or non-promiscuous. To this end, we exploit different representations and properties of proteins and adopt computational approaches suited to each. The role of protein sequences as indicators of promiscuity is investigated by means of the BLAST algorithm as well as string kernels. Additionally, to examine the interplay between proteins' three-dimensional structures and their promiscuous behavior, we employ a novel method that models the topological details of proteins as graphs. Graph kernel functions are then applied to measure the similarity between the 3D structures of proteins. Classification is performed with a support vector machine (SVM), a kernel-based method. The results indicate that protein sequences have limited bearing on promiscuity. In contrast, proteins' 3D structures can reliably predict whether a protein has promiscuous activities, with an accuracy of 96%. Our best results are achieved using the Weisfeiler-Lehman subtree graph kernel and the secondary structure information of proteins.
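To make the idea of a Weisfeiler-Lehman subtree graph kernel concrete, the following is a minimal sketch in plain Python, not the implementation used in the thesis. The toy graphs, the node labels 'H', 'E' and 'C' (standing in for helix, strand and coil), and the function names wl_features and wl_subtree_kernel are all illustrative assumptions; how the thesis actually builds graphs from protein structures is not described in this record. For simplicity the sketch keeps the relabelled node labels as nested tuples instead of compressing them to integers, which yields the same kernel value.

```python
from collections import Counter


def wl_features(adjacency, labels, iterations=1):
    """Count Weisfeiler-Lehman subtree labels for one graph.

    adjacency: dict mapping node -> list of neighbouring nodes
    labels:    dict mapping node -> initial label (hypothetical 'H', 'E', 'C')
    """
    current = dict(labels)
    counts = Counter(current.values())  # iteration 0: the original labels
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = tuple(sorted(current[n] for n in neighbours))
            updated[node] = (current[node], neighbour_labels)
        current = updated
        counts.update(current.values())
    return counts


def wl_subtree_kernel(graph_a, graph_b, iterations=1):
    """Dot product of the label-count vectors of two graphs."""
    fa = wl_features(*graph_a, iterations=iterations)
    fb = wl_features(*graph_b, iterations=iterations)
    return sum(fa[key] * fb[key] for key in fa.keys() & fb.keys())


# Two toy three-node path graphs (assumed data, for illustration only).
path = {0: [1], 1: [0, 2], 2: [1]}
graph_a = (path, {0: "H", 1: "E", 2: "C"})
graph_b = (path, {0: "H", 1: "E", 2: "E"})

print(wl_subtree_kernel(graph_a, graph_b, iterations=1))  # prints 4
print(wl_subtree_kernel(graph_a, graph_a, iterations=1))  # prints 6
```

Evaluating such a kernel on every pair of graphs gives a Gram matrix, which can then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's SVC with kernel="precomputed".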

Supervisor

Rousu, Juho

Thesis advisor

Heinonen, Markus
Szedmak, Sandor

Keywords

enzyme promiscuity, machine learning, graph kernels, classification, SVM, kernel methods
