Novel Algorithms for the Analysis and Manipulation of Short Genomic Sequences
dc.contributor.advisor | Pavlidis, Ioannis T. | |
dc.contributor.committeeMember | Fofanov, Yuriy | |
dc.contributor.committeeMember | Tsekos, Nikolaos V. | |
dc.contributor.committeeMember | Widger, William R. | |
dc.contributor.committeeMember | Pâris, Jehan-François | |
dc.creator | Dobretsberger, Otto 1985- | |
dc.date.accessioned | 2015-02-11T18:33:50Z | |
dc.date.available | 2015-02-11T18:33:50Z | |
dc.date.created | May 2014 | |
dc.date.issued | 2014-05 | |
dc.date.updated | 2015-02-11T18:33:50Z | |
dc.description.abstract | The storage, manipulation, and transfer of the large amounts of data produced by high-throughput sequencing instruments are major obstacles to realizing the full potential of this promising technology. To date, significant effort has been devoted to efficiently compressing these cumbersome sequencing data sets, which are produced in two main text formats: FASTQ and FASTA. As an alternative to the current standard of storing all data, we contend that only high-quality data need be stored, and we propose several new file formats to refine and efficiently store such data. The presented file formats are specifically designed to store only high-quality sequencing reads in space-efficient text and binary formats. Additionally, we address the quality and redundancy issues of the genetic reference databases required for a variety of investigations in the field of genomics. The presented modifications of non-alignment-based sequence comparison algorithms address this challenge and make it possible to cluster tens of millions of genomic sequences (genes), a key step in reducing the redundancy of genomic databases. | |
dc.description.department | Computer Science, Department of | |
dc.format.digitalOrigin | born digital | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/10657/899 | |
dc.language.iso | eng | |
dc.rights | The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s). | |
dc.subject | Genomics | |
dc.subject | Markov Chain | |
dc.subject | NGS data compression | |
dc.title | Novel Algorithms for the Analysis and Manipulation of Short Genomic Sequences | |
dc.type.dcmi | Text | |
dc.type.genre | Thesis | |
thesis.degree.college | College of Natural Sciences and Mathematics | |
thesis.degree.department | Computer Science, Department of | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | University of Houston | |
thesis.degree.level | Doctoral | |
thesis.degree.name | Doctor of Philosophy |