Novel Alignment Based Clustering Algorithms for Pan Genome Analysis of Bacteria Species

dc.contributor.advisor: Pavlidis, Ioannis T.
dc.contributor.committeeMember: Deng, Zhigang
dc.contributor.committeeMember: Fofanov, Yuriy
dc.contributor.committeeMember: Shah, Shishir Kirit
dc.contributor.committeeMember: Tsekos, Nikolaos V.
dc.creator: Albayrak, Levent 1981-
dc.date.accessioned: 2018-12-03T21:35:42Z
dc.date.available: 2018-12-03T21:35:42Z
dc.date.created: August 2016
dc.date.issued: 2016-08
dc.date.submitted: August 2016
dc.date.updated: 2018-12-03T21:35:43Z
dc.description.abstract: Understanding the basic rules of bacterial evolution and adaptation is critical for developing new antibacterial drugs, for using bacteria in biotechnology applications, and for combating the undesired consequences of bacterial presence in industrial and environmental settings, such as corrosion, product spoilage, and degradation. The accumulation of single nucleotide mutations that are beneficial (or neutral) for bacterial survival is a well-studied mechanism of bacterial adaptation; it also reflects the time since species separated from a common ancestor (the molecular clock hypothesis). Gene loss or gain through horizontal gene transfer is another, much more dynamic, mechanism of bacterial adaptation. Through these mechanisms, bacteria can acquire new features such as virulence factors, locomotion ability (flagella), and heat or drug resistance. A major functional characteristic of a bacterial species is the set of genes shared by all of its genomes (the core genome), alongside genes present only in individual genomes or groups of genomes; the core and these accessory genes together make up the pan genome. The technical difficulty, however, lies in identifying the same genes or gene families in evolutionarily distant organisms, which involves three problems: (1) identifying a sequence-similarity threshold, (2) the computational complexity of sequence clustering algorithms, and (3) creating a biologically meaningful cluster topology. In this work, we developed methods that improve the quality and performance of gene clustering, including novel heuristic-free sequence alignment algorithms that cluster large numbers of sequences significantly faster than traditional methods (a few days of computation rather than months). These methods permit the identification of appropriate similarity thresholds and the formation of a biologically meaningful cluster topology. The developed algorithms were used to build a “functional similarity” tree of the species that reflects similarity in gene composition.
The analysis also identified co-appearance and avoidance patterns of genes in bacterial species. We applied the proposed methods to 34,060 genes from 22 genomes of Bartonella spp.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.citation: Portions of this document appear in: Kosoy, Michael, Ying Bai, Russell Enscore, Maria Rosales Rizzo, Scott Bender, Vsevolod Popov, Levent Albayrak, Yuriy Fofanov, and Bruno Chomel. "Bartonella melophagi in blood of domestic sheep (Ovis aries) and sheep keds (Melophagus ovinus) from the southwestern US: cultures, genetic characterization, and ecological connections." Veterinary Microbiology 190 (2016): 43-49. DOI: 10.1016/j.vetmic.2016.05.009.
dc.identifier.uri: http://hdl.handle.net/10657/3624
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Bioinformatics
dc.subject: Sequence alignment
dc.subject: Sequence clustering
dc.subject: Clustering algorithm
dc.subject: Global alignment
dc.subject: Gene profiles
dc.subject: Functional similarity
dc.subject: Bacteria
dc.subject: Bartonella
dc.title: Novel Alignment Based Clustering Algorithms for Pan Genome Analysis of Bacteria Species
dc.type.dcmi: Text
dc.type.genre: Thesis
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
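
The abstract distinguishes the core genome (genes shared by every genome of a species) from the pan genome (all genes found across the species' genomes). The following is a minimal Python sketch of that distinction, assuming genes have already been grouped into clusters and each genome is represented as a set of hypothetical cluster IDs. The Jaccard index used here for pairwise gene-composition similarity is an illustrative assumption, not necessarily the measure used in the dissertation.

```python
# Minimal sketch: core, accessory, and pan genome from clustered genes.
# Assumption: each genome is a set of hypothetical gene-cluster IDs.


def core_and_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core, pan): clusters found in every genome, and in any genome."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        raise ValueError("at least one genome is required")
    core = set.intersection(*gene_sets)  # shared by all genomes
    pan = set.union(*gene_sets)          # found in at least one genome
    return core, pan


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of clusters two genomes share (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    genomes = {
        "genome_A": {"c1", "c2", "c3", "c4"},
        "genome_B": {"c1", "c2", "c3", "c5"},
        "genome_C": {"c1", "c2", "c6"},
    }
    core, pan = core_and_pan(genomes)
    accessory = pan - core  # clusters present in some, but not all, genomes
    print(sorted(core))       # ['c1', 'c2']
    print(sorted(accessory))  # ['c3', 'c4', 'c5', 'c6']
    print(jaccard(genomes["genome_A"], genomes["genome_B"]))  # 0.6
```

Pairwise similarities like these could, in principle, feed a hierarchical clustering that yields a tree of genomes; this record does not describe how the dissertation constructs its “functional similarity” tree.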

Files

Original bundle

Name: ALBAYRAK-DISSERTATION-2016.pdf
Size: 5.13 MB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 1.82 KB
Format: Plain Text