The theoretical basis of universal identification systems for bacteria and viruses


It is shown that the presence/absence pattern of 1000 random oligomers of length 12 in a bacterial genome is sufficiently characteristic to readily and unambiguously distinguish any known bacterial genome from any other. Even genomes of extremely closely related organisms, such as strains of the same species, can be distinguished in this way. One evident way to implement this approach in a practical assay is with hybridization arrays. It is envisioned that a single universal array can be readily designed that would allow identification of any bacterium that appears in a database of known patterns. We performed in silico experiments to test this idea. Calculations utilizing 105 publicly available, completely sequenced microbial genomes allowed us to determine appropriate values of the test oligonucleotide length, n, and the number of probe sequences. Randomly chosen n-mers with a constant G + C content were used to form an in silico array and to determine (a) how many n-mers from each genome would hybridize on this chip, and (b) how different the fingerprints of different genomes would be. With an appropriate choice of random oligomer length, the same approach can also be used to identify viral or eukaryotic genomes.
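
The in silico array described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "hybridization" means an exact match of the probe on either strand of the genome, that fingerprints are compared by Hamming distance, and that genomes are plain strings over A, C, G, T. The function names (`random_probes`, `fingerprint`, `hamming`) and the toy parameters in the usage section are hypothetical and chosen only for illustration.

```python
import random

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def random_probes(count, n, gc_count, rng):
    """Draw `count` distinct random n-mers, each with exactly `gc_count` G/C bases.

    Fixing the number of G/C bases is one simple way to obtain probes
    with a constant G + C content (an assumption of this sketch).
    """
    probes = set()
    while len(probes) < count:
        bases = [rng.choice("GC") for _ in range(gc_count)]
        bases += [rng.choice("AT") for _ in range(n - gc_count)]
        rng.shuffle(bases)
        probes.add("".join(bases))
    return sorted(probes)


def genome_nmers(genome, n):
    """Collect every n-mer that occurs on either strand of the genome."""
    nmers = set()
    for i in range(len(genome) - n + 1):
        nmer = genome[i:i + n]
        nmers.add(nmer)
        nmers.add(reverse_complement(nmer))
    return nmers


def fingerprint(genome, probes):
    """Presence/absence pattern of the probes in the genome (exact match)."""
    present = genome_nmers(genome, len(probes[0]))
    return tuple(p in present for p in probes)


def hamming(a, b):
    """Number of probe positions at which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sizes only: real genomes are far longer and would use longer probes.
    probes = random_probes(count=200, n=6, gc_count=3, rng=rng)
    genome_a = "".join(rng.choice("ACGT") for _ in range(2000))
    genome_b = "".join(rng.choice("ACGT") for _ in range(2000))
    fp_a = fingerprint(genome_a, probes)
    fp_b = fingerprint(genome_b, probes)
    print("probes present in A:", sum(fp_a))
    print("probes present in B:", sum(fp_b))
    print("fingerprint distance:", hamming(fp_a, fp_b))
```

The two printed counts correspond to question (a) in the abstract, and the distance corresponds to question (b). Identification against a database would then amount to finding the stored fingerprint closest to the observed one.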



microbe identification, oligonucleotide microarray fingerprinting, species identification


Copyright 2005 Journal of Biological Physics and Chemistry. This is a post-print version of a published paper that is available at: Recommended citation: Chumakov, S., C. Belapurkar, C. Putonti, T-B. Li, B. M. Pettitt, G. E. Fox, R. C. Willson, and Yu Fofanov. "The theoretical basis of universal identification systems for bacteria and viruses." Journal of biological physics and chemistry: JBPC 5, no. 4 (2005): 121. This item has been deposited in accordance with the publisher copyright and licensing terms and with the author's permission.