A Dirichlet-Multinomial Mixture Model For Clustering Heterogeneous Epigenomics Data

School of Science | Master's thesis
Date
2014-09-29
Major/Subject
Computational Systems Biology
Mcode
IL3013
Degree programme
Master's Degree Programme in Computational and Systems Biology (euSYSBIO)
Language
en
Pages
72 + 6
Abstract
Epigenetic information sheds light on essential biological mechanisms, including the regulation of gene expression. Histone tail modifications are among the major epigenetic mechanisms and can be used to identify cis-regulatory elements such as promoters and enhancers. Nucleosome positions and open chromatin regions are other key elements of the epigenomic landscape. Advances in high-throughput sequencing technologies now make comprehensive genome-wide analyses of epigenetic signatures possible. Despite the growing number of epigenetic datasets, tools for discovering novel patterns and the combinatorial presence of epigenetic elements are still needed. In this thesis, we introduce a model-based clustering approach that uncovers epigenetic patterns by integrating multiple data tracks in a multi-view fashion, where the views correspond to different epigenetic signals extracted from the same genomic location. Moreover, to address inaccuracy in the positions of anchor points, such as transcription factor (TF) ChIP-seq peak summits or transcription start sites (TSS), a profile-shifting feature is implemented. Finally, owing to hyperprior regularization, our approach can also account for the correlation between the numbers of reads mapped to consecutive base-pair positions. We demonstrate that genome-wide clustering of promoter and enhancer regions in the human genome reveals distinct patterns in various histone modification and transcription factor ChIP-seq profiles. Furthermore, we investigate transcription factor binding site (TFBS) enrichment in the enhancer and promoter classes identified by our method and find that some transcription factors are significantly enriched in a subset of enhancer and promoter clusters.
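As a rough illustration of the kind of model the abstract describes, the sketch below computes cluster responsibilities for one genomic region under a multi-view Dirichlet-multinomial mixture. It is not the thesis implementation: the function names, the argument layout (`alphas[k][v]`), and the assumption that views are conditionally independent given the cluster are choices made for this example only, and the profile-shifting feature and the hyperprior are left out.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial.

    counts: non-negative integer read counts, one per position (bin).
    alpha:  positive Dirichlet parameters, same length as counts.
    """
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient numerator and the ratio Gamma(A) / Gamma(n + A).
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out


def responsibilities(views, weights, alphas):
    """Posterior cluster probabilities for a single region.

    views:   list of count vectors, one per view (hypothetical layout).
    weights: mixing weights, one per cluster, summing to 1.
    alphas:  alphas[k][v] holds the Dirichlet parameters of cluster k
             in view v (hypothetical layout).
    Assumes the views are independent given the cluster assignment.
    """
    log_post = []
    for k, w in enumerate(weights):
        lp = math.log(w)
        for v, counts in enumerate(views):
            lp += dm_log_pmf(counts, alphas[k][v])
        log_post.append(lp)
    # Normalize in log space to avoid underflow.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

In an EM-style fit, values like these would form the E-step; updating the Dirichlet parameters and handling shifts of the anchor point would need additional steps that are not shown here.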
Supervisor
Lähdesmäki, Harri
Thesis advisor
Osmala, Maria
Keywords
chromatin, enhancers, promoters, multi-view clustering, histone modifications, epigenomics, generative models, Dirichlet-multinomial, mixture model