Computational Comparative Study of blood TCR repertoire: Celiac Disease patients versus Controls

School of Science | Master's thesis
[10] + 67
This thesis presents a combination of computational methods for comparing deep-sequenced T-cell receptor (TCR) repertoires from peripheral blood mononuclear cells (PBMC) of celiac disease patients and healthy controls. The objective of the study is to assess the repertoires and mine the TCR CDR3 regions for signatures that explain the disease manifestations in celiac disease patients, by identifying and defining the sequences that are enriched as a result of an oral wheat gluten challenge. Data are available from both celiac disease patients and healthy controls, before and after the wheat gluten challenge. To observe the general impact of the gluten challenge, repertoire diversity was compared across time points (before vs. after challenge) and across subjects by assigning a diversity index value to each repertoire. The repertoires do not show a significant difference in diversity across time. However, sequence overlap is significantly greater among post-challenge samples than among pre-challenge samples. Comparing the V gene usage of each sample against a baseline repertoire shows that our samples have a significantly different V gene usage. Across time points, both groups show a significant change in V gene usage distribution. Specifically, genes V7-2, V7-3, V7-9 and V5-1 show significantly increased usage in healthy controls post-challenge. Enriched sequences found at both time points were collected from all samples. Sequences with a four-fold increase post-challenge were then hierarchically clustered based on a modified normalized edit distance between them. An amino acid pattern (motif) search was performed for each cluster using TEIRESIAS. The cluster with the highest mean fold change in enrichment is assumed to contain the sequences most likely to represent the immune response to the gluten challenge.
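The diversity, enrichment and clustering steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several choices are assumptions made only for illustration: the Shannon index stands in for the unspecified diversity index, plain Levenshtein distance divided by the longer sequence length stands in for the "modified normalized edit distance", the four-fold criterion is read as "at least four-fold", average linkage with a distance cutoff of 0.3 is an arbitrary choice, and the CDR3 sequences and counts are invented toy data. V gene usage comparison and the TEIRESIAS motif search are not covered.

```python
import math
from itertools import combinations


def shannon_diversity(counts):
    """Shannon index (natural log) of a clonotype count table."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def enriched(pre, post, min_fold=4.0):
    """Fold change in relative frequency for sequences seen at both time points."""
    pre_total, post_total = sum(pre.values()), sum(post.values())
    folds = {}
    for seq in pre.keys() & post.keys():
        if pre[seq] > 0:
            fold = (post[seq] * pre_total) / (pre[seq] * post_total)
            if fold >= min_fold:
                folds[seq] = fold
    return folds


def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer sequence (assumed form)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster(seqs, cutoff):
    """Average-linkage agglomerative clustering, stopped at a distance cutoff."""
    clusters = [[s] for s in sorted(seqs)]

    def linkage(pair):
        x, y = clusters[pair[0]], clusters[pair[1]]
        return sum(normalized_distance(a, b) for a in x for b in y) / (len(x) * len(y))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        if linkage((i, j)) > cutoff:
            break
        clusters[i].extend(clusters.pop(j))  # i < j, so index i stays valid
    return clusters


# Invented toy counts for one subject: CDR3 amino acid sequence -> read count.
pre = {"CASSLGQAYEQYF": 2, "CASSLGQGYEQYF": 1, "CASSPDRGNTEAFF": 1, "CASSIRSSYEQYF": 96}
post = {"CASSLGQAYEQYF": 20, "CASSLGQGYEQYF": 10, "CASSPDRGNTEAFF": 8, "CASSIRSSYEQYF": 62}

print("diversity pre/post:", shannon_diversity(pre), shannon_diversity(post))

folds = enriched(pre, post)           # sequences with at least four-fold increase
clusters = cluster(folds, cutoff=0.3)

# Rank clusters by mean fold change, as the abstract describes.
top = max(clusters, key=lambda c: sum(folds[s] for s in c) / len(c))
print("top cluster:", top)
```

Stopping the merge at a cutoff yields flat clusters directly; a full dendrogram, as produced by standard hierarchical clustering libraries, would allow the cut height to be chosen after inspection.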
Lähdesmäki, Harri
Thesis advisor
Saavalainen, Päivi
hierarchical clustering, pattern recognition, TCR repertoire, V gene usage, celiac disease