GORetriever : Reranking protein-description-based GO candidates by literature-driven deep information retrieval for protein function annotation

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Yan, Huiying
dc.contributor.author: Wang, Shaojun
dc.contributor.author: Liu, Hancheng
dc.contributor.author: Mamitsuka, Hiroshi
dc.contributor.author: Zhu, Shanfeng
dc.contributor.department: Department of Computer Science [en]
dc.contributor.groupauthor: Probabilistic Machine Learning [en]
dc.contributor.groupauthor: Helsinki Institute for Information Technology (HIIT) [en]
dc.contributor.groupauthor: Professorship Kaski Samuel [en]
dc.contributor.organization: Fudan University
dc.date.accessioned: 2024-09-19T15:34:14Z
dc.date.available: 2024-09-19T15:34:14Z
dc.date.issued: 2024-09-01
dc.description: Publisher Copyright: © 2024 The Author(s).
dc.description.abstract: Summary: The vast majority of proteins still lack experimentally validated functional annotations, which highlights the importance of developing high-performance automated protein function prediction/annotation (AFP) methods. While existing approaches focus on protein sequences, networks, and structural data, textual information related to proteins has been overlooked. However, roughly 82% of SwissProt proteins already possess literature information that experts have annotated. To efficiently and effectively use literature information, we present GORetriever, a two-stage deep information retrieval-based method for AFP. Given a target protein, in the first stage, candidate Gene Ontology (GO) terms are retrieved by using annotated proteins with similar descriptions. In the second stage, the GO terms are reranked based on semantic matching between the GO definitions and textual information (literature and protein description) of the target protein. Extensive experiments over benchmark datasets demonstrate the remarkable effectiveness of GORetriever in enhancing the AFP performance. Note that GORetriever is the key component of GOCurator, which has achieved first place in the latest critical assessment of protein function annotation (CAFA5: over 1600 teams participated), held in 2023-2024. [en]
dc.description.version: Peer reviewed [en]
dc.format.mimetype: application/pdf
dc.identifier.citation: Yan, H, Wang, S, Liu, H, Mamitsuka, H & Zhu, S 2024, 'GORetriever : Reranking protein-description-based GO candidates by literature-driven deep information retrieval for protein function annotation', Bioinformatics, vol. 40, pp. ii53-ii61. https://doi.org/10.1093/bioinformatics/btae401 [en]
dc.identifier.doi: 10.1093/bioinformatics/btae401
dc.identifier.issn: 1367-4803
dc.identifier.issn: 1367-4811
dc.identifier.other: PURE UUID: 8d4403e0-6ca2-4ec1-938e-26fec22a05cb
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/8d4403e0-6ca2-4ec1-938e-26fec22a05cb
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85203191862&partnerID=8YFLogxK
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/158717034/GORetriever_-_reranking_protein-description-based_GO_candidates_by_literature-driven_deep_information_retrieval_for_protein_function_annotation_.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/130890
dc.identifier.urn: URN:NBN:fi:aalto-202409196437
dc.language.iso: en [en]
dc.publisher: Oxford University Press
dc.relation.ispartofseries: Bioinformatics [en]
dc.relation.ispartofseries: Volume 40, pp. ii53-ii61 [en]
dc.rights: openAccess [en]
dc.title: GORetriever : Reranking protein-description-based GO candidates by literature-driven deep information retrieval for protein function annotation [en]
dc.type: A1 Original article in a scientific journal (A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä) [fi]
dc.type.version: publishedVersion
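
The two-stage retrieve-then-rerank flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: GORetriever uses deep information retrieval models, whereas the sketch below substitutes plain token-overlap (Jaccard) similarity for both stages. All function names, the `top_k` parameter, the toy protein descriptions, and the placeholder identifiers `GO:A` to `GO:D` are assumptions made for illustration only and are not real GO terms.

```python
def tokens(text):
    """Lowercased whitespace tokens as a set (a stand-in for a learned text encoder)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-overlap similarity between two strings, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve_candidates(target_description, annotated, top_k=2):
    """Stage 1: collect GO terms from the top_k annotated proteins
    whose descriptions are most similar to the target description."""
    ranked = sorted(
        annotated,
        key=lambda p: jaccard(target_description, p["description"]),
        reverse=True,
    )
    candidates = []
    for protein in ranked[:top_k]:
        for go in protein["go_terms"]:
            if go not in candidates:  # keep first occurrence, preserve order
                candidates.append(go)
    return candidates


def rerank(candidates, target_text, go_definitions):
    """Stage 2: reorder candidates by similarity between each GO definition
    and the target protein's text (literature plus description)."""
    return sorted(
        candidates,
        key=lambda go: jaccard(go_definitions[go], target_text),
        reverse=True,
    )


# Hypothetical toy data; the GO identifiers are placeholders, not real terms.
annotated = [
    {"description": "ATP dependent DNA helicase", "go_terms": ["GO:A", "GO:B"]},
    {"description": "DNA repair helicase", "go_terms": ["GO:B", "GO:C"]},
    {"description": "sugar transporter membrane protein", "go_terms": ["GO:D"]},
]
go_definitions = {
    "GO:A": "binding to ATP",
    "GO:B": "unwinding of a DNA helix",
    "GO:C": "repair of damaged DNA",
    "GO:D": "transport of sugar across a membrane",
}
target_description = "DNA helicase"
target_text = "DNA helicase involved in repair of damaged DNA after irradiation"

candidates = retrieve_candidates(target_description, annotated)
reranked = rerank(candidates, target_text, go_definitions)
print(candidates)
print(reranked)
```

In this toy run the first stage keeps only GO terms from the two helicase entries, and the second stage moves the repair-related term to the top because its definition overlaps most with the target text. A real system would replace `jaccard` with trained retrieval and reranking models.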
