Browsing by Author "Lees, John A."
- Genome-wide epistasis and co-selection study using mutual information
A1 Original article in a scientific journal (2019-10-10)
Pensar, Johan; Puranen, Santeri; Arnold, Brian; MacAlasdair, Neil; Kuronen, Juri; Tonkin-Hill, Gerry; Pesonen, Maiju; Xu, Yingying; Sipola, Aleksi; Sánchez-Busó, Leonor; Lees, John A.; Chewapreecha, Claire; Bentley, Stephen D.; Harris, Simon R.; Parkhill, Julian; Croucher, Nicholas J.; Corander, Jukka
Covariance-based discovery of polymorphisms under co-selective pressure or epistasis has received considerable recent attention in population genomics. Both statistical modeling of the population level covariation of alleles across the chromosome and model-free testing of dependencies between pairs of polymorphisms have been shown to successfully uncover patterns of selection in bacterial populations. Here we introduce a model-free method, SpydrPick, whose computational efficiency enables analysis at the scale of pan-genomes of many bacteria. SpydrPick incorporates an efficient correction for population structure, which adjusts for the phylogenetic signal in the data without requiring an explicit phylogenetic tree. We also introduce a new type of visualization of the results similar to the Manhattan plots used in genome-wide association studies, which enables rapid exploration of the identified signals of co-evolution. Simulations demonstrate the usefulness of our method and give some insight into when this type of analysis is most likely to be successful. Application of the method to large population genomic datasets of two major human pathogens, Streptococcus pneumoniae and Neisseria meningitidis, revealed both previously identified and novel putative targets of co-selection related to virulence and antibiotic resistance, highlighting the potential of this approach to drive molecular discoveries, even in the absence of phenotypic data.
- Mandrake: visualizing microbial population structure by embedding millions of genomes into a low-dimensional representation
A1 Original article in a scientific journal (2022-10-10)
Lees, John A.; Tonkin-Hill, Gerry; Yang, Zhirong; Corander, Jukka
In less than a decade, population genomics of microbes has progressed from the effort of sequencing dozens of strains to thousands, or even tens of thousands of strains in a single study. There are now hundreds of thousands of genomes available even for a single bacterial species, and the number of genomes is expected to continue to increase at an accelerated pace given the advances in sequencing technology and widespread genomic surveillance initiatives. This explosion of data calls for innovative methods to enable rapid exploration of the structure of a population based on different data modalities, such as multiple sequence alignments, assemblies and estimates of gene content across different genomes. Here, we present Mandrake, an efficient implementation of a dimensional reduction method tailored for the needs of large-scale population genomics. Mandrake is capable of visualizing population structure from millions of whole genomes, and we illustrate its usefulness with several datasets representing major pathogens. Our method is freely available both as an analysis pipeline (https://github.com/johnlees/mandrake) and as a browser-based interactive application (https://gtonkinhill.github.io/mandrake-web/). This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
- Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes
A1 Original article in a scientific journal (2016-09-16)
Lees, John A.; Vehkala, Minna; Välimäki, Niko; Harris, Simon R.; Chewapreecha, Claire; Croucher, Nicholas J.; Marttinen, Pekka; Davies, Mark R.; Steer, Andrew C.; Tong, Steven Y C; Honkela, Antti; Parkhill, Julian; Bentley, Stephen D.; Corander, Jukka
Bacterial genomes vary extensively in terms of both gene content and gene sequence. This plasticity hampers the use of traditional SNP-based methods for identifying all genetic associations with phenotypic variation. Here we introduce a computationally scalable and widely applicable statistical method (SEER) for the identification of sequence elements that are significantly enriched in a phenotype of interest. SEER is applicable to tens of thousands of genomes by counting variable-length k-mers using a distributed string-mining algorithm. Robust options are provided for association analysis that also correct for the clonal population structure of bacteria. Using large collections of genomes of the major human pathogens Streptococcus pneumoniae and Streptococcus pyogenes, SEER identifies relevant previously characterized resistance determinants for several antibiotics and discovers potential novel factors related to the invasiveness of S. pyogenes. We thus demonstrate that our method can answer important biologically and medically relevant questions.
- SuperDCA for genome-wide epistasis analysis
A1 Original article in a scientific journal (2018-05-29)
Puranen, Santeri; Pesonen, Maiju; Pensar, Johan; Xu, Yingying; Lees, John A.; Bentley, Stephen; Croucher, Nicholas J; Corander, Jukka
The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10^4–10^5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
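
Illustrative note: the first abstract in this list describes model-free testing of dependencies between pairs of polymorphisms using mutual information. The sketch below shows only that basic quantity, the empirical mutual information between two alignment columns, and is not the SpydrPick implementation. It omits the population-structure correction and every scalability measure the abstract mentions. The function name `mutual_information` and the toy allele strings are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two allele columns.

    x and y hold the alleles observed at two loci, one entry per genome,
    in the same genome order. Hypothetical helper for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("columns must cover the same genomes")
    if len(x) == 0:
        raise ValueError("columns must not be empty")

    n = len(x)
    joint = Counter(zip(x, y))  # counts of allele pairs (a, b)
    count_x = Counter(x)        # marginal allele counts at locus 1
    count_y = Counter(y)        # marginal allele counts at locus 2

    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


# Toy columns over four genomes (made-up data, not from any study)
linked = mutual_information("AACC", "AACC")    # perfectly co-varying loci
unlinked = mutual_information("AACC", "ACAC")  # statistically independent loci
print(linked, unlinked)
```

In this toy case the co-varying pair scores log 2 and the independent pair scores zero. Applied naively to real bacterial data, high scores would also arise from shared ancestry alone, which is why the abstract stresses a correction for population structure.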