HPOLabeler: improving prediction of human protein-phenotype associations by learning to rank

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Liu, Lizhi [en_US]
dc.contributor.author: Huang, Xiaodi [en_US]
dc.contributor.author: Mamitsuka, Hiroshi [en_US]
dc.contributor.author: Zhu, Shanfeng [en_US]
dc.contributor.department: Department of Computer Science [en]
dc.contributor.groupauthor: Probabilistic Machine Learning [en]
dc.contributor.groupauthor: Helsinki Institute for Information Technology (HIIT) [en]
dc.contributor.groupauthor: Professorship Kaski Samuel [en]
dc.contributor.organization: Fudan University [en_US]
dc.contributor.organization: Charles Sturt University [en_US]
dc.date.accessioned: 2020-08-21T08:31:17Z
dc.date.available: 2020-08-21T08:31:17Z
dc.date.embargo: info:eu-repo/date/embargoEnd/2021-08-15 [en_US]
dc.date.issued: 2020-07-15 [en_US]
dc.description.abstract: MOTIVATION: Annotating human proteins with abnormal phenotypes has become an important topic. The Human Phenotype Ontology (HPO) is a standardized vocabulary of phenotypic abnormalities encountered in human diseases. As of November 2019, fewer than 4000 proteins had been annotated with HPO terms. A computational approach for accurately predicting protein-HPO associations would therefore be valuable, yet no method outperformed a simple Naive approach in the second Critical Assessment of Functional Annotation (CAFA2, 2013-2014). RESULTS: We present HPOLabeler, which can use a wide variety of evidence, such as protein-protein interaction (PPI) networks, Gene Ontology, InterPro, trigram frequency and HPO term frequency, within a learning to rank (LTR) framework. LTR has proven powerful for solving large-scale, multi-label ranking problems in bioinformatics. Given an input protein, LTR outputs a ranked list of HPO terms based on the scores assigned to the candidate HPO terms by component learning models (logistic regression, nearest neighbor and a Naive method), which are trained on the multiple sources of evidence. We evaluate HPOLabeler extensively through two main experiments, cross-validation and temporal validation, in which HPOLabeler significantly outperformed all component models and competing methods, including the current state-of-the-art method. We further found that (i) PPI is the most informative data source for prediction and (ii) the low prediction performance in temporal validation might be caused by incomplete annotation of new proteins. AVAILABILITY AND IMPLEMENTATION: http://issubmission.sjtu.edu.cn/hpolabeler/. CONTACT: zhusf@fudan.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 9
dc.format.extent: 4180-4188
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Liu, L, Huang, X, Mamitsuka, H & Zhu, S 2020, 'HPOLabeler: improving prediction of human protein-phenotype associations by learning to rank', Bioinformatics (Oxford, England), vol. 36, no. 14, pp. 4180-4188. https://doi.org/10.1093/bioinformatics/btaa284 [en]
dc.identifier.doi: 10.1093/bioinformatics/btaa284 [en_US]
dc.identifier.issn: 1367-4803
dc.identifier.issn: 1460-2059
dc.identifier.other: PURE UUID: bbb67107-1863-426c-825c-d501bcbddd11 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/bbb67107-1863-426c-825c-d501bcbddd11 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85088879401&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/44930160/SCI_Liu_HPOLabeler_1_3_.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/45841
dc.identifier.urn: URN:NBN:fi:aalto-202008214836
dc.language.iso: en [en]
dc.publisher: OXFORD UNIV PRESS INC
dc.relation.ispartofseries: Bioinformatics (Oxford, England) [en]
dc.relation.ispartofseries: Volume 36, issue 14 [en]
dc.rights: openAccess [en]
dc.title: HPOLabeler: improving prediction of human protein-phenotype associations by learning to rank [en]
dc.type: A1 Original article in a scientific journal (Alkuperäisartikkeli tieteellisessä aikakauslehdessä) [fi]