Browsing by Author "Pensar, Johan"
- Genome-wide epistasis and co-selection study using mutual information
A1 Original article in a scientific journal (2019-10-10) Pensar, Johan; Puranen, Santeri; Arnold, Brian; MacAlasdair, Neil; Kuronen, Juri; Tonkin-Hill, Gerry; Pesonen, Maiju; Xu, Yingying; Sipola, Aleksi; Sánchez-Busó, Leonor; Lees, John A.; Chewapreecha, Claire; Bentley, Stephen D.; Harris, Simon R.; Parkhill, Julian; Croucher, Nicholas J.; Corander, Jukka
Covariance-based discovery of polymorphisms under co-selective pressure or epistasis has received considerable recent attention in population genomics. Both statistical modeling of the population-level covariation of alleles across the chromosome and model-free testing of dependencies between pairs of polymorphisms have been shown to successfully uncover patterns of selection in bacterial populations. Here we introduce a model-free method, SpydrPick, whose computational efficiency enables analysis at the scale of pan-genomes of many bacteria. SpydrPick incorporates an efficient correction for population structure, which adjusts for the phylogenetic signal in the data without requiring an explicit phylogenetic tree. We also introduce a new type of visualization of the results similar to the Manhattan plots used in genome-wide association studies, which enables rapid exploration of the identified signals of co-evolution. Simulations demonstrate the usefulness of our method and give some insight into when this type of analysis is most likely to be successful. Application of the method to large population genomic datasets of two major human pathogens, Streptococcus pneumoniae and Neisseria meningitidis, revealed both previously identified and novel putative targets of co-selection related to virulence and antibiotic resistance, highlighting the potential of this approach to drive molecular discoveries, even in the absence of phenotypic data.
- Genomic rearrangements uncovered by genome-wide co-evolution analysis of a major nosocomial pathogen, Enterococcus faecium
A1 Original article in a scientific journal (2020-12-01) Top, Janetta; Arredondo-Alonso, Sergio; Schürch, Anita C.; Puranen, Santeri; Pesonen, Maiju; Pensar, Johan; Willems, Rob J.L.; Corander, Jukka
Enterococcus faecium is a gut commensal of the gastro-digestive tract, but is also known as a nosocomial pathogen among hospitalized patients. Population genetics based on whole-genome sequencing has revealed that E. faecium strains from hospitalized patients form a distinct clade, designated clade A1, and that plasmids are major contributors to the emergence of nosocomial E. faecium. Here we further explored the adaptive evolution of E. faecium using a genome-wide co-evolution study (GWES) to identify co-evolving single-nucleotide polymorphisms (SNPs). We identified three genomic regions harbouring large numbers of SNPs in tight linkage that are not proximal to each other based on the completely assembled chromosome of the clade A1 reference hospital isolate AUS0004. Close examination of these regions revealed that they are located at the borders of four different types of large-scale genomic rearrangements, insertion sites of two different genomic islands and an IS30-like transposon. In non-clade A1 isolates, these regions are adjacent to each other and they lack the insertions of the genomic islands and IS30-like transposon. Additionally, among the clade A1 isolates there is one group of pet isolates lacking the genomic rearrangement and insertion of the genomic islands, suggesting a distinct evolutionary trajectory. In silico analysis of the biological functions of the genes encoded in the three regions revealed a common link to a stress response. This suggests that these rearrangements may reflect adaptation to the stringent conditions in the hospital environment, such as antibiotics and detergents, to which bacteria are exposed. In conclusion, to our knowledge, this is the first study using GWES to identify genomic rearrangements, suggesting that there is considerable untapped potential to unravel hidden evolutionary signals from population genomic data.
- High-dimensional structure learning of sparse vector autoregressive models using fractional marginal pseudo-likelihood
A1 Original article in a scientific journal (2021-11) Suotsalo, Kimmo; Xu, Yingying; Corander, Jukka; Pensar, Johan
Learning vector autoregressive models from multivariate time series is conventionally approached through least squares or maximum likelihood estimation. These methods typically assume a fully connected model, which provides no direct insight into the model structure and may lead to highly noisy estimates of the parameters. Because of these limitations, there has been an increasing interest in methods that produce sparse estimates through penalized regression. However, such methods are computationally intensive and may become prohibitively time-consuming when the number of variables in the model increases. In this paper we adopt an approximate Bayesian approach to the learning problem by combining fractional marginal likelihood and pseudo-likelihood. We propose a novel method, PLVAR, that is both faster and produces more accurate estimates than the state-of-the-art methods based on penalized regression. We prove the consistency of the PLVAR estimator and demonstrate the attractive performance of the method on both simulated and real-world data.
- Learning discrete decomposable graphical models via constraint optimization
A1 Original article in a scientific journal (2017) Janhunen, Tomi; Gebser, Martin; Rintanen, Jussi; Nyman, Henrik; Pensar, Johan; Corander, Jukka
Statistical model learning problems are traditionally solved using either heuristic greedy optimization or stochastic simulation, such as Markov chain Monte Carlo or simulated annealing. Recently, there has been an increasing interest in the use of combinatorial search methods, including those based on computational logic. Some of these methods are particularly attractive since they can also be successful in proving the global optimality of solutions, in contrast to stochastic algorithms that only guarantee optimality at the limit. Here we improve and generalize a recently introduced constraint-based method for learning undirected graphical models. The new method combines perfect elimination orderings with various strategies for solution pruning and offers a dramatic improvement both in terms of time and memory complexity. We also show that the method is capable of efficiently handling a more general class of models, called stratified/labeled graphical models, which have an astronomically larger model space.
- Medelvärdesmodell av fyrtakts dieselmotorer för kraftverksinstallation [Mean value model of four-stroke diesel engines for power plant installation]
Helsinki University of Technology | Master's thesis (1996) Engblom, Kenneth
- Plasmids shaped the recent emergence of the major nosocomial pathogen Enterococcus faecium
A1 Original article in a scientific journal (2020-01-01) Arredondo-Alonso, Sergio; Top, Janetta; McNally, Alan; Puranen, Santeri; Pesonen, Maiju; Pensar, Johan; Marttinen, Pekka; Braat, Johanna; Rogers, Malbert; van Schaik, Willem; Kaski, Samuel; Willems, Rob J L; Corander, Jukka; Schürch, Anita
Enterococcus faecium is a gut commensal of humans and animals but is also listed on the WHO global priority list of multidrug-resistant pathogens. Many of its antibiotic resistance traits reside on plasmids and have the potential to be disseminated by horizontal gene transfer. Here, we present the first comprehensive population-wide analysis of the pan-plasmidome of a clinically important bacterium, by whole-genome sequence analysis of 1,644 isolates from hospital, commensal, and animal sources of E. faecium. Long-read sequencing on a selection of isolates resulted in the completion of 305 plasmids that exhibited high levels of sequence modularity. We further investigated the entirety of all plasmids of each isolate (plasmidome) using a combination of short-read sequencing and machine-learning classifiers. Clustering of the plasmid sequences unraveled different E. faecium populations with a clear association with hospitalized patient isolates, suggesting different optimal configurations of plasmids in the hospital environment. The characterization of these populations allowed us to identify common mechanisms of plasmid stabilization such as toxin-antitoxin systems and genes exclusively present in particular plasmidome populations exemplified by copper resistance, phosphotransferase systems, or bacteriocin genes potentially involved in niche adaptation. Based on the distribution of k-mer distances between isolates, we concluded that plasmidomes rather than chromosomes are most informative for source specificity of E. faecium.
IMPORTANCE Enterococcus faecium is one of the most frequent nosocomial pathogens of hospital-acquired infections. E. faecium has gained resistance against most commonly available antibiotics, most notably, against ampicillin, gentamicin, and vancomycin, which renders infections difficult to treat. Many antibiotic resistance traits, in particular, vancomycin resistance, can be encoded in autonomous and extrachromosomal elements called plasmids. These sequences can be disseminated to other isolates by horizontal gene transfer and confer novel mechanisms to source specificity. In our study, we elucidated the total plasmid content, referred to as the plasmidome, of 1,644 E. faecium isolates by using short- and long-read whole-genome technologies with the combination of a machine-learning classifier. This was fundamental to investigate the full collection of plasmid sequences present in our collection (pan-plasmidome) and to observe the potential transfer of plasmid sequences between E. faecium hosts. We observed that E. faecium isolates from hospitalized patients carried a larger number of plasmid sequences compared to that from other sources, and they elucidated different configurations of plasmidome populations in the hospital environment. We assessed the contribution of different genomic components and observed that plasmid sequences have the highest contribution to source specificity. Our study suggests that E. faecium plasmids are regulated by complex ecological constraints rather than physical interaction between hosts.
- Semiautomatic Maintenance Assistance on a Diesel/Gas engine
Helsinki University of Technology | Master's thesis (2005) Rösgren, Jonatan
The purpose of this thesis is to study different kinds of control system based maintenance tasks on a Wärtsilä diesel/gas engine and to investigate how such tasks could be semi-automated. The results of the study were used to implement a prototype, which is able to import and execute these semi-automated tasks. A typical task is defined as a procedure that assists the operator in performing local and discrete service work. Four main categories of tasks were identified in the maintenance information system hierarchy: instructions, tests, tunings and analyses. The study further acknowledges similar tasks throughout the extended product life cycle. The result discerns a wide area of different procedures with varying development requirements. The study argues, however, that if a flexible tool were available to handle all these tasks, it could offer excellent comprehensive testing, tuning and analysis opportunities for both customers and in-house engineers at every stage in the product life cycle. This range of tasks was used as a basis for the prototype tool. The technical study reviews different software component techniques and tools for task development. COM techniques are covered in depth, along with high-level development means such as MATLAB, script languages and XML techniques. The prototype solution turned out to be based, somewhat surprisingly, on the Internet Explorer web browser control used in a local environment. The implemented prototype offers a programming interface to the control system for task development, with the web control used as an embedded execution host. The developed prototype proved itself to be a cost-effective and flexible solution for the tasks in question. The prototype fulfils its stated requirements, and is hence recommended for official implementation.
- SuperDCA for genome-wide epistasis analysis
A1 Original article in a scientific journal (2018-05-29) Puranen, Santeri; Pesonen, Maiju; Pensar, Johan; Xu, Yingying; Lees, John A.; Bentley, Stephen; Croucher, Nicholas J; Corander, Jukka
The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10^4–10^5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
- Uniform Engine Speed-Load Control
Helsinki University of Technology | Master's thesis (2005) Saikkonen, Ari