StrainMiner - Data mining and discrete optimizations for strains separation in metagenomes using long reads
School of Science | Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Authors
Date
2023-10-09
Department
Major/Subject
Data Science
Mcode
SCI3115
Degree programme
Master's Programme in ICT Innovation
Language
en
Pages
58+10
Series
Abstract
This study presents StrainMiner, an approach that combines data mining and discrete optimization techniques for strain separation in microbial communities. Characterizing the genetic diversity and functional potential of microbial populations in metagenomic samples relies on accurate strain separation. StrainMiner uses biclustering to identify cohesive genetic features that distinguish strains, clustering rows (DNA sequences) and columns (DNA positions) simultaneously. The algorithm employs hierarchical clustering and k-nearest neighbors imputation for data preparation. An integer linear programming model is then used to search for maximum quasi-bicliques in order to obtain optimal bipartitions. Experimental evaluations on simulated and real-world metagenomic data demonstrate StrainMiner's ability to accurately separate strains, even in datasets with high noise and a large number of strains. StrainMiner is an early version, and future integration with HairSplitter, an end-to-end tool for strain separation currently in development at IRISA, is planned.
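The data-preparation idea in the abstract can be illustrated with a toy sketch. This is not StrainMiner's implementation: the binary encoding (1 = the read carries the variant at that position, None = missing), the distance measure, the helper names `hamming_on_shared`, `knn_impute` and `density`, and the choice of k are all assumptions made for illustration. The density function reflects one common way of describing a quasi-biclique, namely a submatrix whose fraction of ones is at least some threshold; the integer linear programming search itself is not shown.

```python
from collections import Counter


def hamming_on_shared(a, b):
    """Fraction of mismatches over positions observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 1.0  # no overlap: treat the rows as maximally distant
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(matrix, k=2):
    """Fill each missing entry with the majority value among the k nearest
    rows that have an observed value in that column."""
    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (hamming_on_shared(row, other), idx)
                for idx, other in enumerate(matrix)
                if idx != i and other[j] is not None
            )[:k]
            if donors:
                votes = Counter(matrix[idx][j] for _, idx in donors)
                imputed[i][j] = votes.most_common(1)[0][0]
    return imputed


def density(matrix, rows, cols):
    """Fraction of 1s in the submatrix selected by rows and cols."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return sum(cells) / len(cells)


# Hypothetical reads x positions matrix with two missing calls.
reads = [
    [1, 1, 0, 0],
    [1, None, 0, 0],
    [0, 0, 1, 1],
    [0, 0, None, 1],
]
filled = knn_impute(reads, k=1)
```

In this toy input, reads 0 and 1 agree on every shared position, as do reads 2 and 3, so each missing call is filled from its matching read and the two groups form dense blocks on disjoint sets of columns.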
Supervisor
Lähdesmäki, Harri
Thesis advisor
Andonov, Rumen
Keywords
quasi-biclique, k-nearest neighbors imputation, hierarchical clustering, integer linear programming, metagenomics, strain separation