Abstract:
In gene expression studies, bulk RNA-sequencing (bulk RNA-seq) is an attractive alternative to single-cell RNA-sequencing (scRNA-seq) when single-cell resolution is not required. However, the cell type composition of bulk RNA-seq samples is often unknown, which can lead to inaccuracies in downstream analysis. This thesis proposes DeconV, a probabilistic cell type deconvolution method that uses scRNA-seq data as a reference to infer cell type proportions from bulk RNA-seq samples. The performance of DeconV is evaluated on three datasets and compared against three popular state-of-the-art methods from the literature: CIBERSORTx [1], MuSiC [2], and Scaden [3]. Furthermore, the impact of technical factors, such as the number of genes and the choice of gene expression normalization, on the deconvolution results is assessed. DeconV achieves accuracy comparable to that of the best-performing method (Scaden) while improving the interpretability of both the model and its results.