
dc.contributor.advisor: Pavlidis, Ioannis T.
dc.creator: Sharma, Meenakshi 1984-
dc.date.accessioned: 2015-02-11T18:34:33Z
dc.date.available: 2015-02-11T18:34:33Z
dc.date.created: May 2014
dc.date.issued: 2014-05
dc.identifier.uri: http://hdl.handle.net/10657/900
dc.description.abstract: Genetic variation can occur in the form of single base changes called Single Nucleotide Polymorphisms (SNPs) or large-scale structural alterations called Copy Number Variations (CNVs). Identification and analysis of CNVs are critical to understanding their association with evolution, health, and disease. Over the past decade, advances in DNA sequencing technologies have fuelled the field of genomics and opened new doors for performing Copy Number Analysis (CNA). To perform CNA, millions of short subsequences, or reads, produced by High Throughput Sequencing (HTS) platforms are aligned to one or more reference genome sequences. The sequence alignment process produces the total number of reads aligned to each location in the genome, collectively called read coverage. The focus of this research is to develop novel algorithms to accurately estimate coverage in the presence of DNA repeats and single nucleotide mutations. The copy number distribution of the reads mapped to the reference sequence would ideally follow a Poisson distribution, assuming that the nucleotide sequence of a genome is random and that the sequencing reads come from random locations in the genome. The coverage data, however, exhibit over-dispersion at the extreme ends of the distribution. Repetitive sequences and SNPs contribute to these unexpectedly high coverage frequencies. This dissertation presents novel algorithms to estimate the average coverage using a model based on the Poisson distribution. The model was tested on both simulated and real data with different coverage depths and predicts the actual model parameters with reasonably good accuracy. The proposed approach improves estimation of average genome coverage, which is central to gene-expression, DNA methylation, and metagenomic studies.
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: algorithms
dc.subject: methylation
dc.subject: bioinformatics
dc.subject.lcsh: Computer science
dc.title: NOVEL ALGORITHMS TO ESTIMATE GENOME COVERAGE USING HIGH THROUGHPUT SEQUENCING DATA
dc.date.updated: 2015-02-11T18:34:33Z
dc.type.genre: Thesis
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.department: Computer Science, Department of
dc.contributor.committeeMember: Fofanov, Yuriy
dc.contributor.committeeMember: Chapman, Barbara M.
dc.contributor.committeeMember: Tsekos, Nikolaos V.
dc.contributor.committeeMember: Widger, William R.
dc.type.dcmi: Text
dc.format.digitalOrigin: born digital
dc.description.department: Computer Science, Department of
thesis.degree.college: College of Natural Sciences and Mathematics

