Correlation-compressed direct-coupling analysis

dc.contributor Aalto-yliopisto fi
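dc.contributor Aalto University en
dc.contributor.author Gao, Chen Yi
dc.contributor.author Zhou, Hai Jun
dc.contributor.author Aurell, Erik
dc.date.accessioned 2018-10-16T08:55:28Z
dc.date.available 2018-10-16T08:55:28Z
dc.date.issued 2018-09-11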
dc.identifier.citation Gao, C Y, Zhou, H J & Aurell, E 2018, 'Correlation-compressed direct-coupling analysis', Physical Review E, vol. 98, no. 3, 032407, pp. 1-15. DOI: 10.1103/PhysRevE.98.032407 en
dc.identifier.issn 2470-0045
dc.identifier.issn 1550-2376
dc.identifier.other PURE UUID: 87a87ad8-4c93-4e61-9b9f-bf8d6c9071f1
dc.identifier.other PURE ITEMURL:
dc.identifier.other PURE LINK:
dc.identifier.other PURE FILEURL:
dc.description.abstract Learning Ising or Potts models from data has become an important topic in statistical physics and computational biology, with applications to predictions of structural contacts in proteins and other areas of biological data analysis. The corresponding inference problems are challenging since the normalization constant (partition function) of the Ising or Potts distribution cannot be computed efficiently on large instances. Different ways to address this issue have resulted in a substantial amount of methodological literature. In this paper we investigate how these methods could be used on much larger data sets than studied previously. We focus on a central aspect, that in practice these inference problems are almost always severely undersampled, and the operational result is almost always a small set of leading predictions. We therefore explore an approach where the data are prefiltered based on empirical correlations, which can be computed directly even for very large problems. Inference is only used on the much smaller instance in a subsequent step of the analysis. We show that in several relevant model classes such a combined approach gives results of almost the same quality as inference on the whole data set. It can therefore provide a potentially very large computational speedup at the price of only a marginal decrease in prediction quality. We also show that the results on whole-genome epistatic couplings that were obtained in a recent computation-intensive study can be retrieved by our approach. The method of this paper hence opens up the possibility to learn parameters describing pairwise dependences among whole genomes in a computationally feasible and expedient manner. en
dc.format.extent 1-15
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartofseries Physical Review E en
dc.relation.ispartofseries Volume 98, issue 3 en
dc.rights openAccess en
dc.subject.other Statistical and Nonlinear Physics en
dc.subject.other Statistics and Probability en
dc.subject.other Condensed Matter Physics en
dc.subject.other 114 Physical sciences en
dc.title Correlation-compressed direct-coupling analysis en
dc.type A1 Original article in a scientific journal en
dc.description.version Peer reviewed en
dc.contributor.department CAS - Institute of Theoretical Physics
dc.contributor.department Department of Applied Physics
dc.subject.keyword Statistical and Nonlinear Physics
dc.subject.keyword Statistics and Probability
dc.subject.keyword Condensed Matter Physics
dc.subject.keyword 114 Physical sciences
dc.identifier.urn URN:NBN:fi:aalto-201810165374
dc.identifier.doi 10.1103/PhysRevE.98.032407
dc.type.version publishedVersion

