ALGORITHMS AND DATA STRUCTURES TO DETECT ONCOVIRUSES IN HUMAN CANCER USING NEXT GENERATION SEQUENCING DATA

dc.contributor.advisor: Fofanov, Yuriy
dc.contributor.committeeMember: Widger, William R.
dc.contributor.committeeMember: Tsekos, Nikolaos V.
dc.creator: Zhu, Rui 1980-
dc.date.accessioned: 2014-12-19T13:28:45Z
dc.date.available: 2014-12-19T13:28:45Z
dc.date.created: December 2012
dc.date.issued: 2012-12
dc.date.updated: 2014-12-19T13:28:45Z
dc.description.abstract: Evidence suggests that human cancer can be induced by viruses. One way to test this hypothesis is to look for viral sequences in the human cancer genome. Next Generation Sequencing (NGS) technology sequences the whole human genome in a short period of time. This opens the door to a systematic analysis of the human genome and a thorough search for oncogenic viral sequences in cancer. However, the huge number of sequencing reads generated by NGS poses a great challenge for the computational side of data analysis in terms of computing speed and memory usage. Data structures such as hashes and trees are widely used to improve the performance of computing algorithms. Here, I describe both data structures as developed in our center and compare their performance. The hash outperformed the tree when mapping reads to a small reference sequence database. Subsequently, real human cancer data were analyzed using the hash-based mapper, and different oncoviral sequences were found in different cancers.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10657/834
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Next-generation sequencing
dc.subject: Oncovirus
dc.subject: Cancer
dc.subject: Hash
dc.subject: Tree
dc.subject: Sequence reads
dc.subject.lcsh: Computer science
dc.title: ALGORITHMS AND DATA STRUCTURES TO DETECT ONCOVIRUSES IN HUMAN CANCER USING NEXT GENERATION SEQUENCING DATA
dc.type.dcmi: Text
dc.type.genre: Thesis
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Masters
thesis.degree.name: Master of Science

Files

Original bundle

Name: thesis_RuiZhu_final.pdf
Size: 1.31 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.11 KB
Format: Plain Text