Effect of Repeatable Regions on Ability to Estimate Copy Number Variation in Human Genome by High Throughput Sequencing
Abstract
Genomic differences (mutations) in humans differ profoundly depending on whether they are germ line (inherited) or somatic (acquired over one’s life span). Such mutations range from a single-nucleotide insertion, deletion, or substitution in a gene to the complete duplication or deletion of a large amount of genomic material, from thousands of nucleotides to an entire chromosome; the latter are referred to as Copy Number Variations (CNVs). While a large number of genomic variations have no significant influence on overall quality of life, certain variations in the human genome, called abnormalities, are known to be associated with genetic disorders, including cancer, autism, and schizophrenia, among others. Recent advances in DNA sequencing technologies have made it possible to use High Throughput Sequencing (HTS) to detect CNVs. The focus of this research is the development of computational methods that address the challenges of analyzing high throughput DNA sequence data in relatively large genomes (e.g., the human genome), including quality assessment, data representation, and the detection of copy number variations. An evolutionary programming approach has been developed that uses the set of novel algorithms and data structures introduced in this dissertation to map genomic reads efficiently and accurately to one or more reference genomes. I have developed computational tools that make it possible to identify the undesirable effects of repetitive regions in the human genome on the ability to identify CNVs, and I propose a novel approach to reduce their influence on genomic analysis.