Omics-Scale Bioinformatics Technology and Methods: from Data to Information
Abstract
Omics-scale bioinformatics is an emerging discipline that plays an essential role in analyzing and interpreting large-scale biological data. In this thesis, I developed three omics-scale bioinformatics methods to facilitate the study of human miRNAs, synthetic DNA oligo libraries, and histone post-translational modifications (PTMs), respectively.

MiRNAs, which are involved in various biological processes through the regulation of multiple genes, have drawn intensive research interest over the past two decades. An enormous amount of miRNA-related information now exists, and effectively mining the valuable information embedded in this large volume of literature has become an urgent problem. Because each existing online database includes only partial information about human miRNAs, I created a comprehensive web-based resource, ‘miRFocus’, for conveniently retrieving extensive human miRNA information and conducting pathway and Gene Ontology (GO) term enrichment analysis.

Current next-generation sequencing (NGS) technologies mainly focus on genome or transcriptome sequencing analysis, and none of the existing NGS methods is suitable for high-resolution, nucleobase-specific analysis of libraries of synthetic oligonucleotides, which are used as materials for engineering long DNA fragments in synthetic biology applications. To meet this need, I developed an algorithm and software tool for analyzing synthetic oligo libraries. The approach consists of two-step quality control followed by Bowtie2-based sequence alignment. The method successfully assessed the efficiency of the etMICC-based error-removal method on synthetic oligos of different lengths and showed that etMICC columns have a higher binding affinity for gap error structures than for substitution error structures.

Epiproteomics examines diverse PTMs, such as histone methylation. However, traditional methods of studying histone PTMs are expensive in terms of cost, labor, and time. I developed a histone peptide array (hPepArray) for analyzing the activities of cellular histone methyltransferases (HMTs). The lysine-containing peptides of hPepArray are generated directly from 10 histone proteins. Using hPepArray, two known methylation sites, H3K122 and H4K59, were verified and one possible methylation site, H2A-K74, was identified. The experimental results demonstrate that hPepArray and the accompanying method of analysis offer a high-throughput epiproteomic tool for assaying the activities of HMTs in nuclear lysates.
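
As a rough illustration of how lysine-containing peptides might be derived from a protein sequence, the following minimal Python sketch extracts a window around each lysine residue. It is not the thesis's actual procedure: the function name `lysine_peptides`, the `flank` window size, and the toy sequence are all assumptions made for this example, and the abstract does not state the peptide length used on hPepArray.

```python
def lysine_peptides(protein: str, flank: int = 7) -> list[tuple[int, str]]:
    """Return (1-based lysine position, peptide) for every lysine in `protein`.

    Each peptide holds the lysine plus up to `flank` residues on either side,
    truncated at the protein termini. `flank` is an illustrative assumption,
    not a value taken from the thesis.
    """
    peptides = []
    for i, residue in enumerate(protein):
        if residue == "K":
            start = max(0, i - flank)                # clip at the N-terminus
            end = min(len(protein), i + flank + 1)   # clip at the C-terminus
            peptides.append((i + 1, protein[start:end]))
    return peptides


# Made-up toy sequence used only to demonstrate the windowing.
toy_sequence = "MAGKLSTVKEA"
for position, peptide in lysine_peptides(toy_sequence, flank=3):
    print(f"K{position}: {peptide}")
```

Reporting the 1-based residue position alongside each peptide keeps the output aligned with site names written in the style of H3K122, although real histone numbering conventions would need to be checked against the reference sequences actually used.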