Browsing by Author "Gao, Junning"
- AiProAnnotator: Low-rank Approximation with network side information for high-performance, large-scale human Protein abnormality Annotator
  A4 Article in conference proceedings (2018) Gao, Junning; Yao, Shuwei; Mamitsuka, Hiroshi; Zhu, Shanfeng
  Annotating genes/proteins is a vital issue in biology. We focus in particular on human proteins and medical annotation, both of which are important. The most suitable data for this annotation come from the Human Phenotype Ontology (HPO), whose annotations are sparse but reliable (well curated). Existing approaches to this problem are feature-based or network-based. The feature-based approach can incorporate a variety of information, which makes it more appropriate for noisy data than for reliable data, while the network-based approach is not necessarily useful for sparse data. Low-rank approximation is very powerful for data that are both sparse and reliable. We thus propose to use matrix factorization to approximate the input annotation matrix (proteins × HPO terms) by factorized low-rank matrices. We further incorporate network information, i.e. the protein-protein network (PPN) and the network from HPO (NHPO), into the matrix factorization framework as graph regularization over the two low-rank matrices. That is, the input annotation matrix is factorized into two low-rank factor matrices that are smooth over PPN and NHPO. We call the software implementing this method "AiProAnnotator". In this paper it is examined empirically and extensively on the latest HPO data under various experimental settings, including performance comparison under cross-validation, computation time, and case studies. Experimental results clearly showed the high predictive performance and time efficiency of AiProAnnotator.
- DrugE-Rank: Improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank
  A1 Original article in a scientific journal (2016-06-15) Yuan, Qingjun; Gao, Junning; Wu, Dongliang; Zhang, Shihua; Mamitsuka, Hiroshi; Zhu, Shanfeng
  Motivation: Identifying drug-target interactions is an important task in drug discovery. To reduce the heavy time and financial cost of experiments, many computational approaches have been proposed. Although these approaches use many different principles, their performance is far from satisfactory, especially in predicting drug-target interactions of new candidate drugs or targets.
  Methods: Machine learning approaches to this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank (LTR) is the most powerful technique among the feature-based methods. Similarity-based methods are well accepted because they connect the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve prediction performance by combining the advantages of the two types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning.
  Results: The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, in particular achieving more than 30% improvement in the area under the precision-recall curve for FDA approved new drugs and FDA experimental drugs.
- HPOAnnotator: improving large-scale prediction of HPO annotations by low-rank approximation with HPO semantic similarities and multiple PPI networks
  A1 Original article in a scientific journal (2019-12-23) Gao, Junning; Liu, Lizhi; Yao, Shuwei; Huang, Xiaodi; Mamitsuka, Hiroshi; Zhu, Shanfeng
  Background: As a standardized vocabulary of phenotypic abnormalities associated with human diseases, the Human Phenotype Ontology (HPO) has been widely used by researchers to annotate phenotypes of genes/proteins. To save the cost and time spent on experiments, many computational approaches have been proposed. They alleviate the problem to some extent, but their performance is still far from satisfactory.
  Method: For inferring large-scale protein-phenotype associations, we propose HPOAnnotator, which incorporates information from multiple Protein-Protein Interaction (PPI) networks and the hierarchical structure of HPO. Specifically, we use a dual graph to regularize Non-negative Matrix Factorization (NMF) so that information from different sources can be seamlessly integrated. In essence, HPOAnnotator addresses the sparsity of the protein-phenotype association matrix by using a low-rank approximation.
  Results: By combining the hierarchical structure of HPO and co-annotations of proteins, our model captures the HPO semantic similarities well. Moreover, graph Laplacian regularizations are imposed in the latent space so as to utilize multiple PPI networks. The performance of HPOAnnotator has been validated under cross-validation and an independent test. Experimental results show that HPOAnnotator outperforms the competing methods significantly.
  Conclusions: Through extensive comparisons with state-of-the-art methods, we conclude that the proposed HPOAnnotator achieves superior performance as a result of using a low-rank approximation with graph regularization. The approach is promising as a starting point for studying more efficient matrix factorization-based algorithms.
- A Robust Convex Formulation for Ensemble Clustering
  A4 Article in conference proceedings (2016-07) Gao, Junning; Yamada, Makoto; Kaski, Samuel; Mamitsuka, Hiroshi; Zhu, Shanfeng
  We formulate ensemble clustering as a regularization problem over the nuclear norm and a cluster-wise group norm, and present an efficient optimization algorithm, which we call Robust Convex Ensemble Clustering (RCEC). A key feature of RCEC is that the group-norm regularization allows it to remove anomalous cluster assignments obtained from the component clustering methods. Moreover, the proposed method is convex and can find the globally optimal solution. We first showed, in experiments on synthetic data, that RCEC could learn stable cluster assignments from an input matrix that includes anomalous clusters. We then showed on real-world data sets that RCEC outperformed state-of-the-art ensemble clustering methods.
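
The RCEC abstract above names two regularizers, the nuclear norm and a group norm. As a rough illustration only, the sketch below shows the standard proximal operators for these two penalties, which are common building blocks in convex solvers of this kind. It is not the RCEC algorithm itself: the function names `prox_nuclear` and `prox_group_columns`, the threshold `tau`, and the choice to treat each matrix column as one group are assumptions made for this example and do not come from the paper.

```python
import numpy as np


def prox_nuclear(M, tau):
    """Singular value thresholding: proximal operator of tau * ||M||_*.

    Shrinks every singular value by tau and clips at zero, which
    encourages a low-rank result.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt


def prox_group_columns(E, tau):
    """Group soft-thresholding: proximal operator of tau * sum_j ||E[:, j]||_2.

    Assumption for this sketch: each column is one group. Columns whose
    Euclidean norm is below tau are set to zero as a whole; the others
    are shrunk towards zero.
    """
    norms = np.linalg.norm(E, axis=0)
    # Guard against division by zero for all-zero columns.
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return E * scale


if __name__ == "__main__":
    low_rank = prox_nuclear(np.diag([3.0, 1.0]), tau=1.0)
    print(low_rank)

    E = np.array([[3.0, 0.1],
                  [4.0, 0.1]])
    print(prox_group_columns(E, tau=1.0))
```

In this toy usage the second singular value and the second, small-norm column are removed entirely, which mirrors in spirit how a group penalty can discard a whole anomalous group rather than individual entries.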