Identifying Associations Between Host Genotype and Gut Microbiota Using Statistical and Computational Models

School of Electrical Engineering | Master's thesis
Bioinformatics and Computational Systems Biology
Degree programme
BIO - Bioinformation Technology
The human gut microbiota varies greatly from one person to another, and many studies have examined to what extent host genetics controls its composition. Understanding how the gut microbiota is assembled and how it is associated with the host genotype can be relevant to the treatment of chronic complex diseases, such as inflammatory bowel disease (IBD) and diabetes. Candidate gene studies, in which a single gene is deleted from or added to a model host organism, have shown that a single host gene can have a strong effect on the diversity and population structure of the gut microbiota. In contrast to the candidate gene approach, the aim of this study is to assess these genotypic associations on a large scale in humans. For 71 healthy Finnish individuals, host DNA derived from blood was genotyped using the Illumina Immunochip SNP genotyping platform. The bacterial composition of the gut was profiled from faecal samples by barcoded pyrosequencing of the V1-V3 region of 16S rRNA genes, and the reads were binned into operational taxonomic units (OTUs). To find associations between the host genotype and the corresponding gut bacterial composition, various statistical and computational techniques were employed. In particular, random forests, pairwise linear regression modeling and one-way analysis of variance (ANOVA) were used. Furthermore, several dimension reduction methods, such as principal component analysis (PCA), diversity indices and haplotypic blocking, were adopted to reduce dependencies and noise within both the genotype and the bacterial data. With this diverse set of tools, a number of host SNPs were found to be at least weakly associated with the gut microbiota. These so-called ‘associative’ SNPs were subsequently mapped to their closest genes and then carried through pathway and gene ontology enrichment analysis, with the reference gene set adjusted according to the design of the Immunochip. The detected pathways and ontologies, which were either strongly or weakly enriched, include, among others, immune response and the activation, differentiation and proliferation of T cells, lymphocytes, leukocytes and a few other key players of the immune system.
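The abstract names a diversity index and one-way ANOVA among the tools used. As a rough illustration only, and not the code used in the thesis, the sketch below computes a Shannon diversity index per sample from OTU counts and then an ANOVA F statistic across the genotype groups of a single SNP. The function names, the 0/1/2 genotype coding and the toy data are all assumptions made for this example; a p-value would additionally need the F distribution (for instance `scipy.stats.f.sf`), which is left out to keep the sketch dependency-free.

```python
import math
from collections import defaultdict


def shannon_diversity(otu_counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(otu_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in otu_counts if c > 0)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups and n > k")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        return float("inf"), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def anova_for_snp(genotypes, diversities):
    """Group per-sample diversity values by SNP genotype and run the ANOVA."""
    by_genotype = defaultdict(list)
    for genotype, value in zip(genotypes, diversities):
        by_genotype[genotype].append(value)
    return one_way_anova_f(list(by_genotype.values()))


if __name__ == "__main__":
    # Hypothetical toy data: six samples, four OTUs each, and one SNP
    # coded as the number of minor alleles (0, 1 or 2).
    otu_table = [
        [50, 30, 15, 5],
        [40, 40, 10, 10],
        [70, 20, 5, 5],
        [25, 25, 25, 25],
        [90, 5, 3, 2],
        [60, 20, 10, 10],
    ]
    snp_genotypes = [0, 0, 1, 1, 2, 2]
    diversities = [shannon_diversity(row) for row in otu_table]
    f_stat, df_b, df_w = anova_for_snp(snp_genotypes, diversities)
    print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

In a real analysis this test would be repeated for every SNP, so the resulting p-values would need correction for multiple testing.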
Lähdesmäki, Harri
Thesis advisor
Lähdesmäki, Harri
bioinformatics, metagenomics, associations, host genotype, gut microbiota