"E. coli O157:H7 and Genetic Engineering"

I-SIS Report
Mae-Wan Ho
March 21, 2001

The food-borne pathogen E. coli O157:H7 has been sequenced. Dr. Mae-Wan Ho asks whether genetic
engineering might have contributed towards its emergence.

E. coli O157:H7 is a food-borne pathogenic strain of bacteria that emerged in the United States in the 1980s, and is now
responsible for some 75 000 cases of infection annually in that country. It has also been responsible for major
outbreaks in Scotland, Japan and elsewhere since.

The first outbreak was associated with infected hamburgers in 1982. The strain responsible, EDL933, isolated from
ground beef in Michigan, has been studied as a reference strain. The complete sequence of its genome has recently
been determined (1,2), and its closest relative turns out to be the laboratory strain K-12 MG1655. E. coli O157 has
acquired Shiga toxin genes (from the bacterium Shigella) and plasmids containing virulence factors by horizontal gene
transfer.

The two strains, O157 and K12, share a common backbone with almost identical gene order. The 4.1 million base pairs
of backbone in the two genomes can be lined up side by side along their lengths except at one point where the O157 genome is
reversed. Inversions around the starting point of replication are common in bacterial genome evolution.

Scattered roughly evenly within each genome are hundreds of sections of DNA that are unique to one or the other: 1.34
megabases coding for 1,387 genes in the O strain, the O islands; and 0.53 megabases coding for 528 genes in the K
strain, the K islands. Much of the DNA in O and K islands has been acquired by horizontal gene transfer.

There are 106 O and K islands present at the same locations in the backbone. Only a subset of islands is associated
with elements likely to be autonomously mobile. Most islands are horizontal transfers of relatively recent origin from a
donor species with a different intrinsic base composition.

Of the 1,387 acquired genes in O157, 40% (561) can be assigned a function; another 338 genes of unknown function
lie within clusters that are probably remnants of phage (bacterial virus) genomes. About 33% (59/177) of the O islands
contain only genes of unknown function. Many classified proteins are related to proteins from other E. coli strains or
related enterobacteria known to be associated with virulence, and include alternative metabolic capacities, prophages
(integrated genomes of bacterial viruses) and other new functions.
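
The proportions quoted above can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the figures as reported here; the variable and function names are hypothetical, and the "remaining" count is derived by subtraction rather than quoted from the genome paper.

```python
# Counts as quoted in the text above.
ACQUIRED_GENES = 1387          # genes in the O islands
WITH_FUNCTION = 561            # acquired genes that can be assigned a function
PHAGE_CLUSTER_UNKNOWN = 338    # unknown-function genes in probable phage remnants
O_ISLANDS = 177                # total number of O islands
UNKNOWN_ONLY_ISLANDS = 59      # O islands containing only unknown-function genes


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Derived here by subtraction (an assumption of this sketch, not a quoted figure):
# acquired genes that fall into neither of the two categories above.
REMAINING = ACQUIRED_GENES - WITH_FUNCTION - PHAGE_CLUSTER_UNKNOWN

if __name__ == "__main__":
    print(f"assigned a function:   {percent(WITH_FUNCTION, ACQUIRED_GENES):.1f}%")
    print(f"unknown, phage-like:   {percent(PHAGE_CLUSTER_UNKNOWN, ACQUIRED_GENES):.1f}%")
    print(f"neither category:      {REMAINING} genes")
    print(f"unknown-only islands:  {percent(UNKNOWN_ONLY_ISLANDS, O_ISLANDS):.1f}%")
```

Both quoted percentages round to the values given in the text, so the counts are internally consistent.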

There are 3574 protein-coding regions in the backbone, and the average nucleotide identity between O157 and K12 is
high: 98.5%. Of these regions, 89% are of equal length and 25% encode identical proteins. Some chromosomal regions
are more different (hypervariable) than the average, but they encode a comparable set of proteins at the same relative
chromosomal positions. In the most extreme case (YadC), the proteins from the two strains exhibit only 34% identity.
Four such loci encode known or putative biosynthesis operons of fimbrial proteins used in attachment to host cells.
Another codes for a restriction/modification system that breaks down foreign DNA.

From the extent of genetic differences between the strains, the authors estimate that E. coli O157:H7 and K12 shared a
common ancestor about 4.5 million years ago. This estimate is highly questionable, however, as are all similar estimates.

Such estimates are based on the so-called molecular clock hypothesis, which assumes a steady, neutral (nonadaptive)
random accumulation of genetic difference per unit time. One assumption is one percent per million years. But this is
notoriously unreliable, as we now know that mutational changes vary directly in proportion to the number of DNA
replication cycles. So, organisms with short life-cycles accumulate changes faster than those with long life-cycles. There
are also many fluid genome processes that can rapidly change genomes. These include hypermutation (mutation
rates up to a million times faster than usual), recombination, and horizontal gene transfer. Horizontal gene
transfer is well documented in all bacteria including E. coli, as is clear from the genome sequence data. Recombination,
too, appears to be an important mechanism in the evolution of the enterobacteria to which E. coli belongs. And
hypermutation has been identified in several regions in the E. coli chromosome.
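
How sensitive a clock-based date is to the assumed rate can be illustrated with a minimal sketch. It assumes the simplest possible strict clock (time = observed difference / assumed rate of divergence between the two lineages); the function name and the rates other than one percent per million years are hypothetical. It does not attempt to reproduce the authors' 4.5-million-year figure, whose inputs are not given in this report.

```python
def divergence_time_myr(percent_difference: float, percent_per_myr: float) -> float:
    """Strict-clock estimate in millions of years: difference divided by rate."""
    if percent_per_myr <= 0:
        raise ValueError("rate must be positive")
    return percent_difference / percent_per_myr


# 98.5% average backbone identity (from the text) means 1.5% difference.
BACKBONE_DIFFERENCE = 100.0 - 98.5

# Hypothetical rates for illustration; only 1.0 (%/million years) appears in the text.
ASSUMED_RATES = [0.1, 1.0, 10.0]

if __name__ == "__main__":
    for rate in ASSUMED_RATES:
        t = divergence_time_myr(BACKBONE_DIFFERENCE, rate)
        print(f"rate {rate:5.1f}% per Myr -> {t:6.2f} million years")
```

Under this simple model the estimated date scales inversely with the assumed rate, so a hundredfold uncertainty in the rate becomes a hundredfold uncertainty in the date.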

Another factor that would give an overestimate of divergence time is artificial genetic engineering. Artificial genetic
engineering involves rampant recombination and transfer of genes across divergent species barriers. Now that
sequence data are becoming widely available, one ought to be asking the serious question as to whether genetic
engineering might have contributed towards the emergence of E. coli O157 some twenty years ago (3).

1. Perna NT et al. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 2001: 409: 529-33.
2. Eisen JA. Gastrogenomics. Nature 2001: 409: 462-3.
3. See Ho MW, Traavik T, Olsvik R, Tappeser B, Howard V, von Weizsacker C and McGavin G. Gene Technology and
Gene Ecology of Infectious Diseases. Microbial Ecology in Health and Disease 1998: 10: 33-59; also "Genetic
engineering superviruses" by Mae-Wan Ho, ISIS Report March 2001.
