Predicting Protein Producibility: Binary classification of recombinant proteins produced in filamentous fungi

School of Science | Master's thesis
Date
2016-02-15
Major/Subject
Machine Learning and Data Mining
Mcode
SCI3015
Degree programme
Master’s Programme in Machine Learning and Data Mining (Macadamia)
Language
en
Pages
78
Abstract
Recombinant protein synthesis aims to produce specific protein products of interest in living cells. However, protein production often fails, so a computational tool that predicts whether a protein sequence can be produced successfully, before any laboratory experimentation, would save time and resources. We demonstrated that a support vector machine (SVM) trained on protein amino acid composition can predict successful protein production in a dataset of sequences tested in the host species Trichoderma reesei. We found that predictive models generalize well between two species of filamentous fungi, and furthermore that 50 training sequences are sufficient to train a model that yields an AUC of over 0.7. We introduced novel predictive features based on protein domains detected with the InterProScan tool. These features were modestly successful in the predictive task, but adding them did not improve on the use of amino acid composition alone. Experiments applying semi-supervised SVM formulations to the predictive task did not yield significant improvement, most likely because the spatial distribution of data points under the chosen numeric representations did not conform to the assumptions of the semi-supervised models. We explored the species of origin and enzyme function of sequences from the UniProt SwissProt database that the trained SVM models predicted to be successful, and showed that models trained with an RBF kernel were the most conservative in terms of the number of predicted successes.
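As an illustration of the amino acid composition representation mentioned in the abstract, the following is a minimal sketch of how a sequence could be turned into a 20-dimensional feature vector. It is not the thesis's implementation: the function name, the choice of the 20 standard one-letter codes, the handling of non-standard characters, and the toy sequence are all assumptions made for this example.

```python
from collections import Counter

# Assumption: features cover the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Characters outside the 20 standard codes (e.g. X, *) are ignored,
    which is an assumption of this sketch.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One entry per amino acid, in the fixed order of AMINO_ACIDS.
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, used only to show the call.
features = amino_acid_composition("MKTAYIAKQR")
```

Vectors of this form, one per sequence, could then be passed together with binary success labels to any standard SVM implementation.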
Supervisor
Rousu, Juho
Thesis advisor
Arvas, Mikko
Keywords
binary classification, SVM, protein, filamentous fungi, semi-supervised