Machine Learning for Small Molecule Identification


dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Shen, Huibin
dc.date.accessioned 2017-03-15T10:00:31Z
dc.date.available 2017-03-15T10:00:31Z
dc.date.issued 2017
dc.identifier.isbn 978-952-60-7292-0 (electronic)
dc.identifier.isbn 978-952-60-7293-7 (printed)
dc.identifier.issn 1799-4942 (electronic)
dc.identifier.issn 1799-4934 (printed)
dc.identifier.issn 1799-4934 (ISSN-L)
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/24783
dc.description.abstract Metabolites are small molecules involved in the biological processes of organisms. For example, ethylene serves as a plant hormone that stimulates or regulates the opening of flowers, the ripening of fruit and the shedding of leaves. Metabolite identification means determining the molecular structure of a metabolite contained in a biological sample, and it is considered a major bottleneck in metabolomics. The backbone analytical technology for metabolite identification is tandem mass spectrometry. It consists of two rounds of mass spectrometry: in the first round, all the metabolites in a sample are measured, and one particular metabolite of interest is selected and fragmented by a process of dissociation. In the second round, the fragments and their abundances are measured. The resulting tandem mass spectra contain information on the structure and composition of the molecules. This thesis aims to solve the problem of identifying the molecular structures that produce the tandem mass spectra observed from a biological sample. Traditional methods are mostly based on matching the observed tandem mass spectra to reference spectra in a database. However, these methods can fail if there are no reference spectra for the molecules in the sample. This is not uncommon: according to a recent study, only 220,000 spectra representing 20,000 molecules have been measured and annotated, while the compound database PubChem records more than 60 million molecules. To alleviate this problem, many recent works have focused on so-called in silico fragmentation, in which fragmentation is first simulated computationally for the molecules in a molecular database and the simulated fragments are then compared to the measured tandem mass spectra. 
The main contribution of this thesis is to open a novel direction that bridges the gap between the limited spectral databases and the vast molecular databases with the help of molecular fingerprints. Molecular fingerprints are binary representations that encode the structures or properties of a molecule. Kernel-based machine learning methods are used to predict molecular fingerprints from tandem mass spectra. The predicted fingerprints are then matched against the fingerprints of molecules in a molecular database to derive an identification. Multiple kernel learning is also proposed to combine different views of tandem mass spectra. Finally, a one-step approach based on input output kernel regression is applied to the problem; it becomes the new state of the art, as demonstrated in several benchmarks including the recent Critical Assessment of Small Molecule Identification (CASMI) 2016 challenge. en
dc.format.extent 61 + app. 99
dc.format.mimetype application/pdf en
dc.language.iso en en
dc.publisher Aalto University en
dc.publisher Aalto-yliopisto fi
dc.relation.ispartofseries Aalto University publication series DOCTORAL DISSERTATIONS en
dc.relation.ispartofseries 25/2017
dc.relation.haspart [Publication 1]: Markus Heinonen, Huibin Shen, Nicola Zamboni, Juho Rousu. Metabolite identification and molecular fingerprint prediction through machine learning. Bioinformatics, 28, 18, 2333-2341, Sep. 2012. DOI: 10.1093/bioinformatics/bts437
dc.relation.haspart [Publication 2]: Huibin Shen, Nicola Zamboni, Markus Heinonen, Juho Rousu. Metabolite identification through machine learning–tackling CASMI challenge using FingerID. Metabolites, 3, 2, 484-505, Jun. 2013. DOI: 10.3390/metabo3020484
dc.relation.haspart [Publication 3]: Huibin Shen, Kai Dührkop, Sebastian Böcker, Juho Rousu. Metabolite identification through multiple kernel learning on fragmentation trees. Bioinformatics, 30, 12, i157-i164, Jun. 2014. DOI: 10.1093/bioinformatics/btu275
dc.relation.haspart [Publication 4]: Kai Dührkop, Huibin Shen, Marvin Meusel, Juho Rousu, Sebastian Böcker. Searching molecular structure databases with tandem mass spectra using CSI: FingerID. Proceedings of the National Academy of Sciences, 112, 41, 12580-12585, Oct. 2015. DOI: 10.1073/pnas.1509788112
dc.relation.haspart [Publication 5]: Céline Brouard, Huibin Shen, Kai Dührkop, Florence d’Alché-Buc, Sebastian Böcker, Juho Rousu. Fast metabolite identification with Input Output Kernel Regression. Bioinformatics, 32, 12, i28-i36, Jun. 2016. DOI: 10.1093/bioinformatics/btw246
dc.relation.haspart [Publication 6]: Huibin Shen, Sandor Szedmak, Céline Brouard and Juho Rousu. Soft Kernel Target Alignment for Two-stage Multiple Kernel Learning. In 19th International Conference on Discovery Science, Bari, Italy, 427-441, Oct. 2016. DOI: 10.1007/978-3-319-46307-0_27
dc.subject.other Computer science en
dc.subject.other Biotechnology en
dc.title Machine Learning for Small Molecule Identification en
dc.type G5 Artikkeliväitöskirja fi
dc.contributor.school Perustieteiden korkeakoulu fi
dc.contributor.school School of Science en
dc.contributor.department Tietotekniikan laitos fi
dc.contributor.department Department of Computer Science en
dc.subject.keyword machine learning en
dc.subject.keyword metabolite identification en
dc.subject.keyword kernels en
dc.subject.keyword multiple kernel learning en
dc.subject.keyword structured prediction en
dc.subject.keyword tandem mass spectrometry en
dc.identifier.urn URN:ISBN:978-952-60-7292-0
dc.type.dcmitype text en
dc.type.ontasot Doctoral dissertation (article-based) en
dc.type.ontasot Väitöskirja (artikkeli) fi
dc.contributor.supervisor Rousu, Juho, Prof., Aalto University, Department of Computer Science, Finland
dc.opn Laviolette, François, Prof., Université Laval, Canada
dc.rev Shiga, Motoki, Prof., Gifu University, Japan
dc.rev Rogers, Simon, Prof., University of Glasgow, UK
dc.date.defence 2017-03-30