SuperDCA for genome-wide epistasis analysis

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Puranen, Santeri
dc.contributor.author: Pesonen, Maiju
dc.contributor.author: Pensar, Johan
dc.contributor.author: Xu, Yingying
dc.contributor.author: Lees, John A.
dc.contributor.author: Bentley, Stephen
dc.contributor.author: Croucher, Nicholas J
dc.contributor.author: Corander, Jukka
dc.contributor.department: Helsinki Institute for Information Technology (HIIT)
dc.contributor.department: Centre of Excellence in Computational Inference, COIN
dc.contributor.department: University of Helsinki
dc.contributor.department: Department of Computer Science
dc.contributor.department: New York University
dc.contributor.department: Wellcome Trust Sanger Institute
dc.contributor.department: Imperial College London
dc.date.accessioned: 2020-02-03T09:03:55Z
dc.date.available: 2020-02-03T09:03:55Z
dc.date.issued: 2018-05-29
dc.description.abstract: The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10⁴–10⁵ polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10⁵ polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level. [en]
dc.description.version: Peer reviewed [en]
dc.format.mimetype: application/pdf
dc.identifier.citation: Puranen, S, Pesonen, M, Pensar, J, Xu, Y, Lees, J A, Bentley, S, Croucher, N J & Corander, J 2018, 'SuperDCA for genome-wide epistasis analysis', Microbial Genomics, vol. 4, no. 6. https://doi.org/10.1099/mgen.0.000184 [en]
dc.identifier.doi: 10.1099/mgen.0.000184
dc.identifier.issn: 2057-5858
dc.identifier.other: PURE UUID: fc897d05-1abf-4037-9f2b-0271700cecb6
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/fc897d05-1abf-4037-9f2b-0271700cecb6
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/40402323/Puranen_et.al_SuperDCA.mgen000184_1.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/42973
dc.identifier.urn: URN:NBN:fi:aalto-202002032053
dc.language.iso: en [en]
dc.publisher: Microbiology Society
dc.relation.ispartofseries: Microbial Genomics [en]
dc.relation.ispartofseries: Volume 4, issue 6 [en]
dc.rights: openAccess [en]
dc.subject.keyword: epistasis
dc.subject.keyword: linkage disequilibrium
dc.subject.keyword: population genomics
dc.title: SuperDCA for genome-wide epistasis analysis [en]
dc.type: A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä (A1 Original article in a scientific journal) [fi]
dc.type.version: publishedVersion