Computational Approaches to Detect Pathogens in the Presence of Complex Backgrounds



Fast and accurate identification of pathogenic microorganisms in complex clinical and environmental samples is essential for the prevention and treatment of infectious diseases. The most sensitive and accurate detection approaches examine the nucleic acid composition of a sample to identify the presence of pathogen DNA and/or RNA. A large spectrum of nucleic acid-based tests (such as PCR, RT-PCR, and oligonucleotide microarrays) is designed to examine a sample for the presence of predefined genomic signatures: short pathogen-specific DNA and/or RNA fragments. Identifying such signatures, however, poses significant computational challenges. To be pathogen-specific, each signature (or combination of signatures) must be present (conserved) across all strains of the pathogen, must be absent from all other organisms, including the pathogen's close neighbors, and must have assay-specific biochemical and thermodynamic properties, such as binding energy, melting temperature, and nucleotide composition. All available signature design algorithms rely on heuristics and are known to miss cases in which potential signatures are also present, either exactly or with a small number of mismatches, in the host (human) genome and/or in non-pathogenic microorganisms, causing false-positive results. An even greater challenge in designing platform-specific genomic signatures (probes and primers) is that each type of instrument uses a different biochemical protocol to detect signatures, and these protocols must also be taken into account during signature design. To address these challenges, we have developed novel algorithms and data structures that bring all possible subsequences of a given pathogen genome into the signature design process. Moreover, the developed algorithms make it possible to incorporate mismatches (insertions, deletions, and substitutions at all positions and in all combinations) into the design process.
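To make the conservation and specificity criteria concrete, here is a minimal brute-force sketch in Python. It is not the authors' algorithm or data structure: it assumes exact-match k-mer sets on the forward strand only, and it uses GC fraction plus the Wallace rule (Tm ≈ 2(A+T) + 4(G+C) °C, a rough estimate for short oligonucleotides) as stand-ins for assay-specific biochemical filters. The function names and threshold defaults are hypothetical.

```python
from typing import Iterable


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (forward strand only, exact matches)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def gc_fraction(oligo: str) -> float:
    """Fraction of G and C bases in the oligo."""
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def wallace_tm(oligo: str) -> int:
    """Wallace rule Tm in deg C: 2*(A+T) + 4*(G+C); a rough estimate for short oligos."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def candidate_signatures(
    strains: list[str],
    background: Iterable[str],
    k: int,
    gc_range: tuple[float, float] = (0.4, 0.6),
    tm_range: tuple[int, int] = (50, 65),
) -> list[str]:
    """Hypothetical filter: conserved in all strains (strains must be non-empty),
    absent from all background genomes (exact match only), and within the GC and Tm windows."""
    # Conserved: the k-mer occurs in every strain of the target pathogen.
    candidates = set.intersection(*(kmers(s, k) for s in strains))
    # Specific: drop any k-mer that occurs verbatim in a background genome.
    for genome in background:
        candidates -= kmers(genome, k)
    # Assay-dependent composition and melting-temperature filters.
    lo_gc, hi_gc = gc_range
    lo_tm, hi_tm = tm_range
    return sorted(
        c for c in candidates
        if lo_gc <= gc_fraction(c) <= hi_gc and lo_tm <= wallace_tm(c) <= hi_tm
    )


# Toy example: 4-mers, with the Tm window scaled down to match the short length.
print(candidate_signatures(["ACGTGCA", "TACGTGCAT"], ["ACGTTTTT"], k=4, tm_range=(10, 14)))  # ['TGCA']
```

Because absence is checked by exact matching only, a candidate that occurs in a background genome with one or two mismatches still passes this filter, which is the kind of false positive the abstract attributes to heuristic designs.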
We have also developed the concept of ultra-specific genomic islands: genomic regions in which every subsequence is several mismatches away from the closest subsequence that may appear in a host genome and/or in non-pathogenic near neighbors of the targeted pathogen. This concept improves the quality and flexibility of designing platform-specific detection tests, because thermodynamically acceptable signatures can then be selected from within the islands. The developed approach has been used successfully to design a variety of tests for Category A, B, and C pathogens, including the 2009 H1N1 influenza virus responsible for the outbreak that originated in Mexico.
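The island idea can be illustrated with a brute-force Python sketch. It assumes that "several mismatches" means a Levenshtein distance (substitutions, insertions, deletions) of at least a hypothetical threshold `d`, and that subsequences are scanned as fixed-length windows of a hypothetical length `k`. The approximate matching uses semi-global dynamic programming in the style of Sellers' algorithm; the authors' own data structures are not described in the abstract, and this quadratic version is only suitable for toy inputs.

```python
def min_edit_distance_to_substring(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between `pattern` and any substring of `text`
    (semi-global dynamic programming in the style of Sellers' algorithm)."""
    m = len(pattern)
    # prev[i]: best cost of aligning pattern[:i] to a substring ending at the previous text position.
    prev = list(range(m + 1))
    best = prev[m]
    for ch in text:
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra base in the text (insertion)
                cur[i - 1] + 1,      # base missing from the text (deletion)
            )
        best = min(best, cur[m])
        prev = cur
    return best


def ultra_specific_islands(
    genome: str, backgrounds: list[str], k: int, d: int
) -> list[tuple[int, int]]:
    """Half-open intervals [start, end) of `genome` in which every length-k window
    is at least d edits away from every substring of every background sequence."""
    ok = [
        all(min_edit_distance_to_substring(genome[i:i + k], bg) >= d for bg in backgrounds)
        for i in range(len(genome) - k + 1)
    ]
    islands: list[tuple[int, int]] = []
    start = None
    for i, good in enumerate(ok):
        if good and start is None:
            start = i  # first window of a new island
        elif not good and start is not None:
            islands.append((start, i - 1 + k))  # island ends with the last good window
            start = None
    if start is not None:
        islands.append((start, len(ok) - 1 + k))
    return islands


# Toy example: the all-A background is within 1 edit of the windows near the genome's ends.
print(ultra_specific_islands("AAAACCCCAAAA", ["AAAAAAAA"], k=4, d=2))  # [(2, 10)]
```

Checking only length-k windows is enough for longer probes placed inside an island: any alignment of a longer subsequence, restricted to one of its windows, costs no more than the whole alignment, so the longer subsequence is also at least `d` edits from the background.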



Keywords: Computational Approach, Ultra-specific Genomic Islands, Pathogens, Host-blind, Signatures, Assays, Nucleic Acid-based, Deoxyribonucleic acid (DNA), RNA, Genome, Detection strategies