An Analysis of Errors and Discrepancies in Analyzing Single Cell RNA Sequence Data






Single-cell RNA sequencing (scRNA-seq) is a vital sequencing technology that has enabled high-throughput mapping of cellular differentiation hierarchies. It has a wide range of applications beyond regular transcriptome profiling. The scRNA-seq process considered here involves analyzing data generated with 3' end counting technology, which encompasses sample composition and analytical processing, including pre-processing, normalization, alignment, and clustering. To accomplish this task, bioinformaticians around the world have developed many computational tools. As of 2019, there were 385 different tools available for analyzing scRNA-seq data, and that number is growing. Although this continuous addition of new features to single-cell data analysis addresses technical gaps relative to bulk RNA-seq, there have been very few attempts to standardize these practices. This study explores the various approaches to re-analyzing previously published single-cell RNA sequencing data and discusses the subsequent challenges of using publicly available data sets to conduct a multicenter study. Because data are published in different formats, several methods can be employed: (1) analyzing BCL files, (2) analyzing FASTQ files, (3) analyzing matrix files, and (4) analyzing Seurat or Scanpy objects. This thesis provides a concise overview of some of the steps, algorithms, and approaches currently used in the analysis of single-cell RNA sequencing data, with an emphasis on recent developments. Hence, I propose that, in order to develop reproducible algorithms and analysis software for scRNA-seq data sets, it is vital that standardization exist across all analysis platforms and that software developers recognize and understand the computational challenges posed by the analysis tasks.



Ribonucleic acid (RNA)