Browsing by Author "Nguyen, Dai Hai"
- ADAPTIVE: LeArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra
A1 Original article in a scientific journal (2019-07-15)
Nguyen, Dai Hai; Nguyen, Canh Hao; Mamitsuka, Hiroshi
Motivation: Metabolite identification is an important task in metabolomics for enhancing the knowledge of biological systems. A number of machine learning-based methods have been proposed for this task, which predict the chemical structure for a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. They usually have two steps: (i) predicting fingerprints from spectra; (ii) searching a database for chemical compounds corresponding to the predicted fingerprints. Fingerprints are feature vectors, which are usually very large in order to cover all possible substructures and chemical properties. They are therefore heavily redundant, in the sense of containing many molecular (sub)structures irrelevant to the task, which limits predictive performance and slows prediction.
Results: We propose ADAPTIVE, which has two parts: learning two mappings (i) from structures to molecular vectors and (ii) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from the given data, to be consistent with both the spectra and the chemical structures of the metabolites. In more detail, molecular vectors are generated by a model parameterized by a message passing neural network, and its parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of the Hilbert-Schmidt Independence Criterion. Molecular vectors generated by this model are compact and, importantly, adaptive (specific) to both the given data and the task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method for metabolite identification. We empirically confirmed the effectiveness of ADAPTIVE using benchmark data, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency.
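The Hilbert-Schmidt Independence Criterion mentioned in this abstract can be illustrated with its standard biased empirical estimator, tr(KHLH)/(n-1)^2, where K and L are kernel matrices over paired samples and H is the centering matrix. The following is only a minimal sketch: the RBF kernel, the `gamma` value, and the toy data are assumptions made for illustration, and the message passing network and training procedure of ADAPTIVE are not reproduced here.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """RBF kernel matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


# Toy data standing in for (molecular vector, spectrum) pairs.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_dependent = x + 0.1 * rng.normal(size=(200, 1))  # strongly tied to x
y_independent = rng.normal(size=(200, 1))          # unrelated to x

K = rbf_kernel(x)
hsic_dependent = hsic(K, rbf_kernel(y_dependent))
hsic_independent = hsic(K, rbf_kernel(y_independent))
```

On this toy data, `hsic_dependent` should be clearly larger than `hsic_independent`, which is the property a dependence-maximizing objective relies on.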
- Learning subtree pattern importance for Weisfeiler-Lehman based graph kernels
A1 Original article in a scientific journal (2021-07)
Nguyen, Dai Hai; Nguyen, Canh Hao; Mamitsuka, Hiroshi
Graphs are a common representation of relational data, which are ubiquitous in domains such as molecules and biological and social networks. A popular approach to learning with graph-structured data is to use graph kernels, which measure the similarity between graphs and are plugged into a kernel machine such as a support vector machine. Weisfeiler-Lehman (WL) based graph kernels, which employ the WL labeling scheme to extract subtree patterns and perform node embedding, have been shown to achieve strong performance while being efficiently computable. However, one of the main drawbacks of a general kernel is the decoupling of kernel construction from the learning process. For molecular graphs, common kernels such as WL subtree, which are based on substructures of the molecules, treat all available substructures as equally important, which might not be suitable in practice. In this paper, we propose a method to learn the weights of subtree patterns in the framework of WWL kernels, the state-of-the-art method for the graph classification task (Togninalli et al., in: Advances in Neural Information Processing Systems, pp. 6439–6449, 2019). To overcome the computational issue on large-scale data sets, we present an efficient learning algorithm and also derive a generalization gap bound to show its convergence. Finally, through experiments on synthetic and real-world data sets, we demonstrate the effectiveness of our proposed method for learning the weights of subtree patterns.
- Machine Learning for Metabolic Identification
A3 Part of a book or another compilation (2021)
Nguyen, Dai Hai; Nguyen, Canh Hao; Mamitsuka, Hiroshi
Metabolic identification is an essential part of metabolomics for understanding the biochemical characteristics of metabolites, which are small molecules that perform important functions in biological systems. However, this field remains challenging, with many metabolites still unknown. Mass spectrometry (MS) is a common technology for dealing with such small molecules. Over recent decades, many methods have been proposed for MS-based metabolite identification, and machine learning has been key to recent progress. This chapter provides a survey of computational methods for metabolic identification with a focus on machine learning, along with a discussion of potential improvements for this task.
- Recent advances and prospects of computational methods for metabolite identification: a review with emphasis on machine learning approaches
A1 Original article in a scientific journal (2019-11-27)
Nguyen, Dai Hai; Nguyen, Canh Hao; Mamitsuka, Hiroshi
Motivation: Metabolomics involves the study of a great number of metabolites, which are small molecules present in biological systems. They perform many important functions, such as energy transport, signaling, serving as building blocks of cells, and inhibition/catalysis. Understanding the biochemical characteristics of metabolites is an essential part of metabolomics for enlarging the knowledge of biological systems. It is also key to the development of many applications and areas such as biotechnology, biomedicine, and pharmaceuticals. However, the identification of metabolites remains a challenging task in metabolomics, with a huge number of potentially interesting but unknown metabolites. The standard method for identifying metabolites is based on mass spectrometry (MS) preceded by a separation technique. Over many decades, many techniques with different approaches have been proposed for the MS-based metabolite identification task, and they can be divided into the following four groups: mass spectra database, in silico fragmentation, fragmentation tree, and machine learning. In this review paper, we thoroughly survey currently available tools for metabolite identification with a focus on in silico fragmentation and machine learning-based approaches. We also give an in-depth discussion of advanced machine learning methods, which can lead to further improvement on this task.
- SIMPLE: Sparse Interaction Model over Peaks of moLEcules for fast, interpretable metabolite identification from tandem mass spectra
A1 Original article in a scientific journal (2018-07-01)
Nguyen, Dai Hai; Nguyen, Canh Hao; Mamitsuka, Hiroshi
Motivation: Recent success in metabolite identification from tandem mass spectra has been led by machine learning, which has two stages: mapping mass spectra to molecular fingerprint vectors and then retrieving candidate molecules from the database. In the first stage, i.e. fingerprint prediction, spectrum peaks are the features, and considering their interactions would be reasonable for more accurate identification of unknown metabolites. Existing approaches to fingerprint prediction are based only on individual peaks in the spectra, without explicitly considering peak interactions. Also, the current cutting-edge method is based on kernels, which are computationally heavy and difficult to interpret.
Results: We propose two learning models that incorporate peak interactions for fingerprint prediction. First, we extend the state-of-the-art kernel learning method by developing kernels for peak interactions and combining them with kernels for peaks through multiple kernel learning (MKL). Second, we formulate a sparse interaction model for metabolite peaks, called SIMPLE, which is computationally light and interpretable for fingerprint prediction. The formulation of SIMPLE is convex and guarantees global optimization, for which we develop an alternating direction method of multipliers (ADMM) algorithm. Experiments using the MassBank dataset show that both models achieved prediction accuracy comparable to that of the current top-performing kernel method. Furthermore, SIMPLE clearly revealed the individual peaks and peak interactions that contribute to enhancing the performance of fingerprint prediction.
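The idea of scoring a fingerprint bit from both individual peaks and pairwise peak interactions can be sketched with a generic second-order model, score = b + w·x + xᵀWx. This is an illustrative assumption only: the abstract does not give the exact parameterization, regularizers, or ADMM updates of SIMPLE, so the function name `interaction_score`, the toy weights, and the zero threshold below are all hypothetical.

```python
import numpy as np


def interaction_score(x, w, W, b=0.0):
    """Generic second-order score: bias + main effects + pairwise interactions."""
    return b + w @ x + x @ W @ x


# Toy spectrum with four binned peaks (1.0 = peak present). Hypothetical data.
x = np.array([1.0, 0.0, 1.0, 1.0])

# Sparse main-effect weights: only peaks 0 and 2 matter on their own.
w = np.array([0.5, 0.0, -0.2, 0.0])

# Sparse symmetric interaction matrix: only the pair (0, 2) interacts.
W = np.zeros((4, 4))
W[0, 2] = W[2, 0] = 0.3

score = interaction_score(x, w, W)
predicted_bit = int(score > 0)  # thresholding at zero is an assumption

# Interpretability: the nonzero entries name the peaks and peak pairs in use.
active_peaks = np.flatnonzero(w).tolist()
active_pairs = np.argwhere(np.triu(W, k=1) != 0).tolist()
```

Because both `w` and `W` are sparse, reading off `active_peaks` and `active_pairs` shows directly which peaks and peak pairs drive the prediction, which is the kind of interpretability the abstract attributes to a sparse interaction model.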