A Dirichlet-Multinomial Mixture Model For Clustering Heterogeneous Epigenomics Data

School of Science | Master's thesis

Date

2014-09-29

Major/Subject

Computational Systems Biology

Mcode

IL3013

Degree programme

Master's Degree Programme in Computational and Systems Biology (euSYSBIO)

Language

en

Pages

72 + 6

Abstract

Epigenetic information sheds light on essential biological mechanisms, including the regulation of gene expression. Among the major epigenetic mechanisms are histone tail modifications, which can be used to identify cis-regulatory elements such as promoters and enhancers. Nucleosome positions and open chromatin regions are other key elements of the epigenomic landscape. Advances in high-throughput sequencing technologies now make comprehensive genome-wide analyses of epigenetic signatures possible. Despite the growing number of epigenetic datasets, tools for discovering novel patterns and combinations of co-occurring epigenetic elements are still needed. In this thesis, we introduce a model-based clustering approach that uncovers epigenetic patterns by integrating multiple data tracks in a multi-view fashion, where different views correspond to different epigenetic signals extracted from the same genomic location. Moreover, to address inaccuracies in the positions of anchor points, such as transcription factor (TF) ChIP-seq peak summits or transcription start sites (TSSs), we implement a profile-shifting feature. Finally, owing to hyperprior regularization, our approach can also account for the correlation between the numbers of reads mapped to consecutive base-pair positions. We demonstrate that genome-wide clustering of promoter and enhancer regions in the human genome reveals distinct patterns in various histone modification and TF ChIP-seq profiles. Furthermore, we investigate transcription factor binding site (TFBS) enrichment in the classes of enhancers and promoters identified by our method and show that some transcription factors are significantly enriched in a subset of the enhancer and promoter clusters.
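To make the modelling idea more concrete, the following is a minimal Python sketch of how a Dirichlet-multinomial mixture could score a single genomic region. It is not the thesis's implementation. It assumes, as illustrative choices not confirmed by the abstract, that each view is a vector of binned read counts, that views are conditionally independent given the cluster, and that profile shifting is handled by averaging the likelihood over a small set of candidate offsets applied to all views at once, with a uniform prior over offsets. All function names and toy numbers are hypothetical, and parameter estimation (e.g. EM) and the hyperprior regularization are omitted.

```python
import math
from typing import List, Sequence


def dm_log_pmf(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of a count vector under a Dirichlet-multinomial(alpha)."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a_total = sum(alpha)
    # log multinomial coefficient: log n! - sum_i log x_i!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # log Gamma(A) - log Gamma(n + A), where A = sum_i alpha_i
    log_norm = math.lgamma(a_total) - math.lgamma(n + a_total)
    # sum_i [log Gamma(x_i + alpha_i) - log Gamma(alpha_i)]
    log_terms = sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return log_coef + log_norm + log_terms


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def region_log_lik(views: Sequence[Sequence[int]],
                   cluster_alphas: Sequence[Sequence[float]],
                   window: int, shifts: Sequence[int]) -> float:
    """log p(region | cluster), marginalising over one shift shared by all views.

    Hypothetical assumptions: views are independent given the cluster, all views
    are shifted by the same offset, and offsets have a uniform prior.
    """
    per_shift = []
    for s in shifts:
        # Sum of per-view Dirichlet-multinomial log-likelihoods at offset s
        ll = sum(dm_log_pmf(list(profile[s:s + window]), alpha)
                 for profile, alpha in zip(views, cluster_alphas))
        per_shift.append(ll)
    return log_sum_exp(per_shift) - math.log(len(shifts))


def responsibilities(views: Sequence[Sequence[int]],
                     clusters: Sequence[Sequence[Sequence[float]]],
                     weights: Sequence[float],
                     window: int, shifts: Sequence[int]) -> List[float]:
    """Posterior probability of each cluster for one region (an E-step)."""
    log_post = [math.log(w) + region_log_lik(views, alphas, window, shifts)
                for alphas, w in zip(clusters, weights)]
    z = log_sum_exp(log_post)
    return [math.exp(lp - z) for lp in log_post]


if __name__ == "__main__":
    # Toy data: two hypothetical views (e.g. two histone marks), window of 3 bins
    views = [[8, 1, 1, 0], [0, 1, 1, 8]]
    clusters = [
        [[10.0, 1.0, 1.0], [1.0, 1.0, 10.0]],  # cluster 0
        [[1.0, 1.0, 10.0], [10.0, 1.0, 1.0]],  # cluster 1
    ]
    print(responsibilities(views, clusters, [0.5, 0.5], window=3, shifts=[0, 1]))
```

Working in log space with a log-sum-exp keeps the products of many small per-view likelihoods from underflowing, which matters once regions span many bins and several views.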

Supervisor

Lähdesmäki, Harri

Thesis advisor

Osmala, Maria

Keywords

chromatin, enhancers, promoters, multi-view clustering, histone modifications, epigenomics, generative models, Dirichlet-multinomial, mixture model
