[dipl] Perustieteiden korkeakoulu / SCI
Permanent URI for this collection: https://aaltodoc.aalto.fi/handle/123456789/21
Browsing [dipl] Perustieteiden korkeakoulu / SCI by Subject "16S rRNA gene sequencing"
- Comparison of normalization and statistical testing methods of 16S rRNA gene sequencing data
Perustieteiden korkeakoulu | Master's thesis (2018-12-10) Lehtinen, Ilona
The decreasing cost and increasing speed of next-generation sequencing techniques now enable more affordable and time-efficient access to human microbiomes. The aim of many 16S ribosomal RNA (rRNA) gene sequencing experiments is to identify the taxa whose abundance differs significantly between two or more conditions. However, increasing awareness of the compositional nature of 16S rRNA gene sequencing data has raised concerns about the validity of conclusions drawn from this type of data. Many early differential abundance testing methods completely ignore compositionality or uneven library sizes. Recently, new methods that take compositionality into account have been developed with the aim of ensuring scale invariance and sub-compositional coherence. However, the inherent problem that compositional data do not contain the information needed for differential abundance testing remains a major challenge. The aim of this thesis was to evaluate different methods for differential abundance testing of 16S rRNA gene sequencing data using both simulated and real data. Overall, we found that the simulation results depend strongly on the simulation design and data characteristics. We confirm that detection performance improved with larger effect sizes and when more samples were available. The experiment on real data revealed that large differences between the methods remain. Centered log-ratio (CLR) transformation prior to statistical testing produced the highest detection accuracy in our simulation experiments. CLR transformation combined with either the Reproducibility-Optimized Test Statistic (ROTS) or the Wilcoxon rank sum test produced nearly equal results at larger sample sizes. However, at small sample sizes ROTS outperformed the Wilcoxon rank sum test.
Thus, based on our results, we encourage the use of CLR transformation combined with the ROTS statistical test for differential abundance testing of 16S rRNA gene sequencing data.
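As an illustration of the CLR transformation mentioned in the abstract, the following is a minimal sketch and not the thesis's own implementation. The function name `clr`, the optional `pseudocount` parameter, and the example counts are assumptions made here for illustration. The sketch takes the logarithm of each taxon count in a sample and subtracts the mean of those logarithms, then shows the scale invariance the abstract refers to: multiplying every count by the same factor (a deeper library with the same composition) leaves the CLR values unchanged.

```python
import math


def clr(counts, pseudocount=0.0):
    """Centered log-ratio transform of one sample's taxon counts.

    `pseudocount` is a hypothetical convenience for handling zeros; how zeros
    are treated in practice is a separate design decision.
    """
    shifted = [c + pseudocount for c in counts]
    if any(v <= 0 for v in shifted):
        raise ValueError("CLR needs strictly positive values; use a pseudocount")
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]


# Made-up counts for four taxa in one sample (illustrative only).
sample = [120, 30, 6, 44]
# Same composition sequenced ten times deeper.
deeper = [10 * c for c in sample]

clr_sample = clr(sample)
clr_deeper = clr(deeper)

# The two transformed vectors agree up to floating-point error.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(clr_sample, clr_deeper))
print(same)
```

The transformed values of each sample sum to zero by construction. Note that adding a nonzero pseudocount breaks exact scale invariance, which is one reason zero handling deserves care. The statistical testing step (ROTS or the Wilcoxon rank sum test) would then be applied per taxon to the transformed values across samples and is not shown here.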