Novel Algorithms for the Analysis and Manipulation of Short Genomic Sequences

dc.contributor.advisor Pavlidis, Ioannis T.
dc.contributor.committeeMember Fofanov, Yuriy
dc.contributor.committeeMember Tsekos, Nikolaos V.
dc.contributor.committeeMember Widger, William R.
dc.contributor.committeeMember Pâris, Jehan-François
dc.creator Dobretsberger, Otto 1985-
dc.date.accessioned 2015-02-11T18:33:50Z
dc.date.available 2015-02-11T18:33:50Z
dc.date.created May 2014
dc.date.issued 2014-05
dc.date.updated 2015-02-11T18:33:50Z
dc.description.abstract The storage, manipulation, and transfer of the large amounts of data produced by high-throughput sequencing instruments represent major obstacles to realizing the full potential of this promising technology. To date, significant effort has been devoted to efficiently compressing these cumbersome sequencing data sets, which are produced in two main text formats: FASTQ and FASTA. As an alternative to the current standard of storing all data, we contend that only high-quality data need be stored, and we propose several new file formats to effectively refine and efficiently store such data. The presented file formats are specifically designed to store only high-quality sequencing reads in space-efficient text and binary formats. Additionally, we address the quality and redundancy issues of the genetic reference databases required for a variety of investigations in the field of genomics. The presented modifications of non-alignment-based sequence comparison algorithms address this challenge and make it possible to cluster tens of millions of genomic sequences (genes), which is one of the key challenges in reducing the redundancy of genomic databases.
dc.description.department Computer Science, Department of
dc.format.digitalOrigin born digital
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/10657/899
dc.language.iso eng
dc.rights The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject Genomics
dc.subject Markov Chain
dc.subject NGS data compression
dc.title Novel Algorithms for the Analysis and Manipulation of Short Genomic Sequences
dc.type.dcmi Text
dc.type.genre Thesis
thesis.degree.college College of Natural Sciences and Mathematics
thesis.degree.department Computer Science, Department of
thesis.degree.discipline Computer Science
thesis.degree.grantor University of Houston
thesis.degree.level Doctoral
thesis.degree.name Doctor of Philosophy

Files

Original bundle

Name: DOBRETSBERGER-DISSERTATION-2014.pdf
Size: 1.67 MB
Format: Adobe Portable Document Format
