Browsing by Author "Brouard, Celine"

Fast metabolite identification with Input Output Kernel Regression
(OXFORD UNIV PRESS INC, 2016-06-15) Brouard, Celine; Shen, Huibin; Dührkop, Kai; d'Alché-Buc, Florence; Böcker, Sebastian; Rousu, Juho; Department of Computer Science; Professorship Rousu Juho; Helsinki Institute for Information Technology (HIIT); Friedrich Schiller University Jena; Telecom ParisTech
Motivation: An important problem in metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output spaces and can handle structured output spaces such as the molecule space. Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, which consists of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification.
Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods.

Improved Small Molecule Identification through Learning Combinations of Kernel Regression Models
(Multidisciplinary Digital Publishing Institute (MDPI), 2019-08) Brouard, Celine; Basse, Antoine; d'Alche-Buc, Florence; Rousu, Juho; Department of Computer Science; Professorship Rousu Juho; Helsinki Institute for Information Technology (HIIT); Institut Polytechnique de Paris; Institut National de la Recherche Agronomique
In small molecule identification from tandem mass (MS/MS) spectra, input-output kernel regression (IOKR) currently provides the state-of-the-art combination of fast training and prediction and high identification rates. The IOKR approach can be simply understood as predicting a fingerprint vector from the MS/MS spectrum of the unknown molecule, and solving a pre-image problem to find the molecule with the most similar fingerprint. In this paper, we bring forward the following improvements to the IOKR framework: firstly, we formulate the IOKRreverse model that can be understood as mapping molecular structures into the MS/MS feature space and solving a pre-image problem to find the molecule whose predicted spectrum is the closest to the input MS/MS spectrum. Secondly, we introduce an approach to combine several IOKR and IOKRreverse models computed from different input and output kernels, called IOKRfusion. The method is based on minimizing the structured hinge loss of the combined model using mini-batch stochastic subgradient optimization.
Our experiments show a consistent improvement of top-k accuracy both in positive and negative ionization mode data.

Input Output Kernel Regression
(2016) Brouard, Celine; Szafranski, Marie; d'Alché-Buc, Florence; Department of Computer Science; Université d'Évry Val-d'Essonne; Telecom ParisTech
In this paper, we introduce a novel approach, called Input Output Kernel Regression (IOKR), for learning mappings between structured inputs and structured outputs. The approach belongs to the family of Output Kernel Regression methods devoted to regression in a feature space endowed with some output kernel. In order to take into account structure in input data and benefit from kernels in the input space as well, we use the Reproducing Kernel Hilbert Space theory for vector-valued functions. We first recall the ridge solution for supervised learning and then study the regularized hinge loss-based solution used in Maximum Margin Regression. Both models are also developed in the semi-supervised setting. In addition, we derive an extension of Generalized Cross Validation for model selection in the case of the least-square model. Finally, we show the versatility of the IOKR framework on two different problems: link prediction seen as a structured output problem and multi-task regression seen as a multiple and interdependent output problem. We also present a set of detailed numerical results that shows the relevance of the method on these two tasks.

Liquid-chromatography retention order prediction for metabolite identification
(2018-09-01) Bach, Eric; Szedmak, Sandor; Brouard, Celine; Boecker, Sebastian; Rousu, Juho; Department of Computer Science; Professorship Rousu Juho; Helsinki Institute for Information Technology (HIIT); Friedrich Schiller University Jena
Motivation: Liquid Chromatography (LC) followed by tandem Mass Spectrometry (MS/MS) is one of the predominant methods for metabolite identification.
In recent years, machine learning has started to transform the analysis of tandem mass spectra and the identification of small molecules. In contrast, LC data is rarely used to improve metabolite identification, despite numerous published methods for retention time prediction using machine learning. Results: We present a machine learning method for predicting the retention order of molecules; that is, the order in which molecules elute from the LC column. Our method has important advantages over previous approaches: We show that retention order is much better conserved between instruments than retention time. To this end, our method can be trained using retention time measurements from different LC systems and configurations without tedious pre-processing, significantly increasing the amount of available training data. Our experiments demonstrate that retention order prediction is an effective way to learn retention behaviour of molecules from heterogeneous retention time data. Finally, we demonstrate how retention order prediction and MS/MS-based scores can be combined for more accurate metabolite identifications when analyzing a complete LC-MS/MS run.

Magnitude-Preserving Ranking for Structured Outputs
(PMLR, 2017-11-03) Brouard, Celine; Bach, Eric; Böcker, Sebastian; Rousu, Juho; Department of Computer Science; Zhang, Min-Ling; Noh, Yung-Kyun; Professorship Rousu Juho; Helsinki Institute for Information Technology (HIIT); Friedrich Schiller University Jena
In this paper, we present a novel method for solving structured prediction problems, based on combining Input Output Kernel Regression (IOKR) with an extension of magnitude-preserving ranking to structured output spaces. In particular, we concentrate on the case where a set of candidate outputs has been given, and the associated pre-image problem calls for ranking the set of candidate outputs.
Our method, called magnitude-preserving IOKR, aims both to produce a good approximation of the output feature vectors and to preserve the magnitude differences of the output features in the candidate sets. For the case where the candidate set does not contain corresponding 'correct' inputs, we propose a method for approximating the inputs through application of IOKR in the reverse direction. We apply our method to two learning problems: cross-lingual document retrieval and metabolite identification. Experiments show that the proposed approach improves performance over IOKR, and in the latter application obtains the current state-of-the-art accuracy.
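
The two-phase IOKR scheme that recurs in the abstracts above (kernel ridge regression into the output feature space, then a pre-image step that ranks a given candidate set) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the function names (`rbf_kernel`, `cosine_kernel`, `iokr_candidate_scores`), the choice of an RBF input kernel and a cosine output kernel, the regularization value, and the toy data standing in for spectra and fingerprints. With a normalized output kernel, the score of a candidate y for a query x reduces to k_y(y, Y_train)^T (K_x + lam I)^{-1} k_x(X_train, x).

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def cosine_kernel(A, B):
    """Normalized linear kernel between the rows of A and B (rows must be nonzero)."""
    A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_n @ B_n.T

def iokr_candidate_scores(K_x_train, k_x_query, K_y_cand_train, lam):
    """Score candidates for one query.

    Phase 1 (ridge regression into the output feature space) is carried out
    implicitly: alpha = (K_x + lam * I)^{-1} k_x(query).
    Phase 2 (pre-image over a finite candidate set): the score of each
    candidate is its output-kernel similarity to the training outputs,
    weighted by alpha.
    """
    n = K_x_train.shape[0]
    alpha = np.linalg.solve(K_x_train + lam * np.eye(n), k_x_query)
    return K_y_cand_train @ alpha

# Toy data (hypothetical stand-ins, not real spectra or molecular fingerprints).
n = 8
X_train = np.arange(n, dtype=float).reshape(-1, 1)   # stand-in input features
F_train = np.hstack([np.ones((n, 1)), np.eye(n)])    # distinct binary "fingerprints"
F_cand = F_train.copy()                              # candidate set = training outputs

K_x = rbf_kernel(X_train, X_train, gamma=1.0)
K_y_cand = cosine_kernel(F_cand, F_train)

query_index = 3
k_x_query = rbf_kernel(X_train, X_train[query_index][None, :], gamma=1.0)[:, 0]

scores = iokr_candidate_scores(K_x, k_x_query, K_y_cand, lam=1e-3)
ranking = np.argsort(-scores)   # candidate indices, best first
best = int(ranking[0])
```

Because the query in this toy run is itself a training input and the regularization is small, the top-ranked candidate is the output paired with that input. Real use would substitute kernels suited to MS/MS spectra and molecular structures and would select `lam` by validation.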