Protein function prediction through multi-view multi-label latent tensor reconstruction

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Armah-Sekum, Robert Ebo [en_US]
dc.contributor.author: Szedmak, Sandor [en_US]
dc.contributor.author: Rousu, Juho [en_US]
dc.contributor.department: Department of Computer Science [en]
dc.contributor.groupauthor: Professorship Rousu Juho [en]
dc.contributor.groupauthor: Computer Science Professors [en]
dc.contributor.groupauthor: Computer Science - Computational Life Sciences (CSLife) [en]
dc.contributor.groupauthor: Computer Science - Artificial Intelligence and Machine Learning (AIML) [en]
dc.contributor.groupauthor: Computer Science - Large-scale Computing and Data Analysis (LSCA) [en]
dc.date.accessioned: 2024-05-15T07:54:39Z
dc.date.available: 2024-05-15T07:54:39Z
dc.date.issued: 2024-05-02 [en_US]
dc.description: Publisher Copyright: © The Author(s) 2024.
dc.description.abstract: Background: In the last two decades, the use of high-throughput sequencing technologies has accelerated the pace of discovery of proteins. However, due to the time and resource limitations of rigorous experimental functional characterization, the functions of a vast majority of them remain unknown. As a result, computational methods offering accurate, fast and large-scale assignment of functions to new and previously unannotated proteins are sought after. Leveraging the underlying associations between the multiplicity of features that describe proteins could reveal functional insights into the diverse roles of proteins and improve performance on the automatic function prediction task. Results: We present GO-LTR, a multi-view multi-label prediction model that relies on a high-order tensor approximation of model weights combined with non-linear activation functions. The model is capable of learning high-order relationships between multiple input views representing the proteins and predicting high-dimensional multi-label output consisting of protein functional categories. We demonstrate the competitiveness of our method on various performance measures. Experiments show that GO-LTR learns polynomial combinations between different protein features, resulting in improved performance. Additional investigations establish GO-LTR’s practical potential in assigning functions to proteins under diverse challenging scenarios: very low sequence similarity to previously observed sequences, rarely observed and highly specific terms in the gene ontology. Implementation: The code and data used for training GO-LTR are available at https://github.com/aalto-ics-kepaco/GO-LTR-prediction. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 21
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Armah-Sekum, R E, Szedmak, S & Rousu, J 2024, 'Protein function prediction through multi-view multi-label latent tensor reconstruction', BMC Bioinformatics, vol. 25, no. 1, 174, pp. 1-21. https://doi.org/10.1186/s12859-024-05789-4 [en]
dc.identifier.doi: 10.1186/s12859-024-05789-4 [en_US]
dc.identifier.issn: 1471-2105
dc.identifier.other: PURE UUID: a11915fa-3df2-4e23-b345-940c351528c3 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/a11915fa-3df2-4e23-b345-940c351528c3 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85191978499&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/145889683/Protein_function_prediction_through_multi-view_multi-label_latent_tensor_reconstruction.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/127753
dc.identifier.urn: URN:NBN:fi:aalto-202405153367
dc.language.iso: en [en]
dc.publisher: BioMed Central
dc.relation.ispartofseries: BMC Bioinformatics
dc.relation.ispartofseries: Volume 25, issue 1, pp. 1-21
dc.rights: openAccess [en]
dc.subject.keyword: CAFA [en_US]
dc.subject.keyword: Gene ontology [en_US]
dc.subject.keyword: Machine learning [en_US]
dc.subject.keyword: Protein function [en_US]
dc.title: Protein function prediction through multi-view multi-label latent tensor reconstruction [en]
dc.type: A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä (A1 Original article in a scientific journal) [fi]
dc.type.version: publishedVersion