Citrobacter rodentium is a natural mouse pathogen that causes
attaching and effacing (A/E) lesions. It shares a common virulence strategy with
the clinically significant human A/E pathogens enteropathogenic E.
coli (EPEC) and enterohaemorrhagic E. coli (EHEC)
and is widely used to model this route of pathogenesis. We previously reported
the complete genome sequence of C. rodentium ICC168, where we
found that the genome displayed many characteristics of a newly evolved
pathogen. In this study, through PFGE, sequencing of isolates showing variation,
whole genome transcriptome analysis and examination of the mobile genetic
elements, we found that, consistent with our previous hypothesis, the genome of
C. rodentium is unstable as a result of repeat-mediated,
large-scale genome recombination and because of active transposition of mobile
genetic elements such as the prophages. We sequenced an additional C.
rodentium strain, EX-33, to reveal that the reference strain ICC168
is representative of the species and that most of the inactivating mutations
were common to both isolates and likely to have occurred early on in the
evolution of this pathogen. We draw parallels with the evolution of other
bacterial pathogens and conclude that C. rodentium is a
recently evolved pathogen that may have emerged alongside the development of
inbred mice as a model for human disease.
The genomes of enteric bacteria have been shown to be dynamic entities through gene
acquisition and loss. It is clear that these genomes consist of a highly conserved
core inter-dispersed with a continually evolving accessory genome. Genome flux can
have a profound effect on a particular organism, in many instances it is associated
with adaptation to different niches and may eventually come to define different
isolates, pathotypes or even species. Genome flux can occur by Horizontal Gene
Transfer (HGT) through processes such as transformation, bacteriophage mediated
transduction and conjugation. In addition to gene gain, gene loss through deletions,
rearrangements and the accumulation of point mutations are also major inputs to
genome flux and have been linked to host adaptation, for example in
Salmonella Typhi and Yersinia pestis,  whereby
functions important for the previous lifestyle are no longer preserved through
selection and so accumulate random mutations.
The non-motile, Gram-negative enteric bacterium Citrobacter
rodentium is a natural mouse pathogen. It is the causative agent of
transmissible murine colonic hyperplasia, and is responsible for high mortality in
suckling mice –. C. rodentium is a member of a family of
bacterial pathogens that induce intestinal attaching and effacing (A/E) lesions,
which are characterised by intimate bacterial adherence to host intestinal
epithelial cells, effacement of microvilli, and reorganisation of the host actin
cytoskeleton to form pedestal-like extensions of epithelial cells beneath the
adherent bacteria . Gastrointestinal colonisation and formation of A/E
lesions are mediated by a pathogenicity island called the locus of enterocyte
effacement (LEE), which is conserved among A/E bacteria , . As the only known A/E pathogen to
naturally infect mice, C. rodentium is a valuable model organism
for studying colonisation, virulence factors and modes of pathogenesis of the
clinically significant human A/E pathogens enteropathogenic E. coli
(EPEC) and enterohaemorrhagic E. coli (EHEC) , , .
Different C. rodentium isolates from mouse and hamster colony
disease outbreaks in Japan and the USA in the 1960s, 70s and 80s were originally
classified as either atypical mouse-pathogenic E. coli (MPEC) – or atypical
Citrobacter freundii (later reclassified as
Citrobacter genomospecies 9) –. However, subsequent genetic
and biochemical analyses of these independently isolated strains suggested they were
of clonal origin and they were all reclassified as Citrobacter
We previously determined the whole genome sequence of C. rodentium
strain ICC168, a derivative of a strain isolated from a disease outbreak in
Swiss-Webster mice at Yale University School of Medicine, USA, in 1972, originally
designated Citrobacter freundii biotype 4280 (ATCC 51459) , . ICC168,
together with strain DBS100 which originates from the same source , , are the most
widely studied C. rodentium isolates. We showed that the genome of
ICC168 displayed features associated with bacteria that have recently passed through
an evolutionary bottleneck, including a large number of pseudogenes and IS elements
. Here, we
present the detailed investigation of genomic flux in C. rodentium
with a focus on the impact that mobile genetic elements have had on the genome
evolution of C. rodentium and demonstrate that the genome of this
pathogen is unstable. To show that this is a consistent feature of the species, we
determined the genome sequence of an additional C. rodentium
strain, EX-33 (originally classified as MPEC ), which was isolated from a
spontaneous outbreak of disease in a CF-1 mouse colony at the Institute of Medical
Science, University of Tokyo, Japan in 1981 (K. Itoh, personal communication) and
showed differences in levels of colonisation and disease pathology compared to
DBS100 . We
describe the effect the observed genome rearrangements have on the ability of
C. rodentium to infect the murine host and relate these
findings to the evolution of this important model pathogen.
Results and Discussion
C. rodentium ICC168 is representative of the species
To ensure that the genome of ICC168 was representative of the C.
rodentium species we determined the whole genome draft sequence of
the independently isolated strain EX-33 using 454 and Illumina sequence data to
construct a combined de novo assembly (see methods). The genome of EX-33 was found to be remarkably
similar to ICC168. Despite being merely a draft sequence, the genome of EX-33
differed from ICC168 by just 177 single nucleotide polymorphisms (SNPs), only 43
of which were high-quality validated SNPs, and two deletions (details of all the
differences between ICC168 and EX-33 are listed in Tables 1 and S1).
The high conservation also included all of the mobile genetic elements, including
prophages, insertion sequence (IS) elements and genomic islands (GI), all of
which are present and found at exactly the same sites in both genomes, the one
exception being IS102 which is absent from EX-33 but expanded
to 13 copies in ICC168.
The insertions of IS102 elements have disrupted nine single
genes and a fimbrial operon in ICC168, each of which are found intact in EX-33
(Table 1). Conversely,
there are two deletions in EX-33 compared to ICC168 (Table 1). The first is a 932 bp deletion,
which has resulted in the truncation of ROD_15301 encoding a hypothetical
protein. The second is a 4392 bp deletion in EX-33 that has deleted two genes
(ROD_48251 and ROD_48261) and truncated two other genes (ROD_48241 and
ROD_48271) of unknown function. In ICC168, the sequences corresponding to both
of these EX-33 deletions are flanked by 2 bp and 6 bp direct repeats
respectively. Although these repeats are short sequences, the data suggests that
in both cases the deletions were due to site-specific recombination.
The data presented here is consistent with there being a clonal origin for this
species and provides evidence of continued functional gene loss in both of these
C. rodentium strains.
Evidence of large-scale genomic rearrangements
The genomic architecture of C. rodentium contains a large
intra-replichore inversion of approximately 0.5 Mb in the genome of ICC168,
resulting in a switch in GC deviation (Figure 1). We used PCR to show that the same
inversion is also present in EX-33. GC deviation switches are usually only seen
at the origin and terminus of replication in bacteria  as can be seen in the GC
deviation plots for the genomes of both E. coli K-12 and
Salmonella Typhimurium LT2 ,  (Figure 1). In addition, it was evident from
whole genome comparisons that whilst C. rodentium ICC168 shares
significant conservation in genome synteny with E. coli and
Salmonella, there are many chromosomal inversions and
rearrangements in the genome including two large inversions spanning the origin
and terminus of replication, the latter being identical to an inversion found in
S. Typhimurium LT2 (Figure 1). Inversions over the terminus are
the most common form of large genomic rearrangement detected in enteric
bacteria, and homologous recombination between rRNA operons resulting in such
rearrangements have previously been observed in host-specific
Salmonella species . However, unlike in
Salmonella, recombination between rRNAs did not explain the
genome rearrangements found in C. rodentium, which were largely
flanked by IS elements (Figure
The IS elements found in ICC168 belong to a diverse range of IS families.
However, only 8 types of IS element comprise 66% of the 113 insertions,
indicating extensive IS expansion, particularly for ISCro1
Interestingly, IS elements or IS element-related inverted repeats also flank 6
of the 17 GIs identified in the ICC168 genome  and over half of the IS
elements in the ICC168 genome are located on other mobile genetic elements
rather than on the chromosomal backbone (Table 2). This highly biased distribution of
IS elements is similar to that observed in EHEC O157 genomes . These data,
taken together with previous findings , , suggest that IS
elements are associated with chromosomal rearrangements, and horizontal gene
transfer facilitating the incorporation of novel gene functions into the
C. rodentium genome.
Evidence for ongoing genome instability in C.
To investigate if the observed genomic architecture of C.
rodentium was stable, we analysed the PFGE profiles for C.
rodentium strain ICC168 and derivatives of that strain representing
the majority of C. rodentium isolates used in our laboratories
(Table 3). This
analysis revealed that ICC168 exhibited the same PFGE pattern as ICC169,
ICC169-474, ICC169-335 and ICC169-476. However, isolates ICC169-407 and
ICC169-496 displayed significant differences in their PFGE profiles, compared to
ICC168 (Figure 2).
With the aim of pinpointing the rearrangement ‘break points’ we
sequenced the genomes of two of the recombinant C. rodentium
isolates, ICC169-407 and ICC169-496. By mapping 454 paired end sequences for
these two isolates to the genome of the reference, wild-type strain ICC168, we
identified positions where the 454 sequence pairs mapped to sequences either in
the wrong orientation with respect to each other, or mapped to distant sites on
the reference genome. These data highlighted four rearrangements within the
sequenced genomes of ICC169-407 and ICC169-496 (Figures 3B–E). Three of these were
large independent genomic inversions mediated by homologous recombination
between two copies of different identical repeat sequences: i)
3B); ii) genes encoding Elongation factor Tu (Figure 3C); and iii) the T3SS effector NleD
and the adjacent transposase (Figure 3D). It is of particular note that the
ISCro4-mediated 0.59 Mb inversion (Figure 3B) largely corrects the switch in GC
deviation caused by the inversion identified in the ICC168 genome sequence (see
above and Figure 3A). The
fourth rearrangement in ICC169-407 and ICC169-496 can be explained by a double
cross-over recombination event between two almost identical rearrangement hot
spot (rhs) elements which could result in the translocation
shown in Figure 3E. This
provides biological evidence that rhs can diversify through
intra-specific recombination, as previously speculated .
To confirm the nature of the rearrangements in ICC169-407 and ICC169-496 we used
PCR to amplify sequences spanning the recombination points. Using different
combinations of the primers NKP135-NKP150 (Table S2),
we confirmed that all of the identified genome configurations (shown in Figures 3A–E) were
present in genomic DNA preparations from a single culture of each isolate. These
data indicate that the genome rearrangements observed are reversible and
actively ‘flipping’ between one genome orientation and another,
which suggests that the genome of C. rodentium is in a constant
state of flux.
Evidence for functional gene loss through disruption by mobile genetic
elements and pseudogenisation
Our analyses showed that collectively, IS elements and prophages were responsible
for 22% of the pseudogenes in the C. rodentium ICC168
genome, with the insertion of five out of the ten prophages disrupting accessory
and core genes . Additional analysis of the mobile genetic element
insertions, indicated that several of them may have resulted in phenotypic
alterations crucial to the evolution of C. rodentium which led
us to investigate further. Of particular note are the prophages CRP28 and
CRPr20, the insertions of which have disrupted gene clusters for the
biosynthesis of two distinct flagellar systems in C. rodentium.
CRPr20 has inserted into one of the gene clusters that encode the conventional
Flag-1 flagellar system found in most members of the enterobacteriaceae whereas
CRP28 has inserted into the Flag-2 ancestral flagellar cluster, which is found
in some E. coli strains but is absent from
Salmonella and most other enteric bacteria , .
The insertion by CRP28 has resulted in the deletion of the Flag-2
lfiH gene, encoding a putative flagellar assembly protein,
and truncation of the two flanking genes, lfiG and
lfiI encoding predicted flagellar switch (C-ring), and
flagellar export and assembly proteins respectively . The remainder of the Flag-2
cluster genes remain largely intact, although lfgF (encoding a
flagellar rod protein) and another gene within this cluster carry point
mutations generating premature stop codons.
CRPr20 has inserted into the 3′ end of fliC, deleting the
last six codons of the flagellin gene of Flag-1. Our analysis indicated that the
truncated fliC may encode a protein with an altered C-terminus
which could mean that flagellin was still synthesised and secreted, but not
polymerised. However, as in Flag-2, in addition to the prophage insertion there
are other significant disruptions in the Flag-1 flagellar biogenesis genes in
the form of a deletion event which has removed flgD,E,F,G,H and
I, and truncated flgC and
flgJ, genes required for rod and hook formation and
There is also an IS element insertion in the gene encoding the flagellar
assembly regulator and chaperone, FlgN . The remaining Flag-1
genes appear intact.
In transmission electron microscopy (TEM) studies of C.
rodentium ICC169 we saw no evidence of basal bodies, and there was
no evidence of flagella in culture supernatants or lysates (data not shown).
This is consistent with previous reports showing that there was no detectable
flagellin in C. rodentium DBS100 and that the organism is
Flagella are targeted by the TLR5 receptor of the innate immune system and the
lack of flagella in C. rodentium could facilitate escape or
modulation of any inflammatory response following infection. Most other bacteria
belonging to the family Enterobacteriacae express functional flagella. Notable
exceptions are Shigella, Salmonella Gallinarum
and Salmonella Pullorum , which are all host adapted.
Flagella may play a key role in environmental survival and the ability to
survive without motility can be considered further evidence of host
Prophages CRP38, CRP99 and CRP49 were also found to have inserted within genes.
CRP38 has inserted into a gene of unknown function, whilst the insertion of
CRP99 has disrupted ROD_08971 that is predicted to encode a putative large
repetitive protein showing significant homology to a Type I secreted large
repetitive protein in S. Typhi, and a putative
haemagglutinin/haemolysin-related protein in Ralstonia
solanacearum. CRP49 has
inserted into gatD (encoding galactitol-1-phosphate
dehydrogenase), which is essential for the metabolism of galactitol by
E. coli, and
found within the otherwise intact galactitol utilisation operon. Consistent with
this, we found that C. rodentium is unable to grow in minimal
medium with 0.5% galactitol as its sole carbon source (data not
Since the same patterns of prophage-mediated insertional inactivation are seen in
EX-33 it is clear that prophages either contribute to, or are driving the
degenerative genome evolution of C. rodentium.
Plasmid pCROD1 is lost at high frequency
To investigate the stability of the plasmids, we determined the plasmid profiles
of a range of C. rodentium ICC168 derivatives. C.
rodentium strain ICC168 carries four plasmids, pCROD1-3 and pCRP3
However, our profiling showed that whilst plasmids pCROD2 (39 kb), pCROD3 (4 kb)
and pCRP3 (3 kb) were present in all isolates, the largest plasmid, pCROD1 (54
kb), was present in only five out of the nine isolates tested (Table 3). The plasmid
profiling gels also showed that the relative intensities of the bands
corresponding to each plasmid were equivalent across the isolates, with the
exception of the 54 kb pCROD1 band for ICC168-474 where the intensity was
greatly reduced (Figure S1), suggesting that pCROD1 is either present in a lower copy
number in ICC169-474 or, considered more likely, has been lost from a proportion
of the cells in that population. This was confirmed by PCR data, which showed
that the large plasmid is only present in 50% of the population of
ICC169-474 (two out of four gDNA extractions from cultures originating from
individual colonies of the same generation), indicating that pCROD1 is lost at
high frequency. This is perhaps surprising given that all four plasmids are
retained by EX-33.
Plasmid pCROD1 is predicted to encode several potential virulence factors,
including three putative autotransporters and a fimbrial operon , therefore
its high frequency loss is a further indicator of ongoing genome evolution and
adaptation to a new environment.
Transcriptomic data reveals evidence for recent niche adaptation and prophage
To further investigate the impact of the C. rodentium mobile
genetic elements, we performed a whole genome transcriptome analysis on
C. rodentium strain ICC169-476, by RNA-seq using Illumina
sequencing technology, to determine if the prophages or the pseudogenes found
within the genome were expressed. This data revealed that 152 of the 182
pseudogenes in the ICC168 chromosome were transcriptionally active (Table S3;
73 have RPKM (Reads mapped Per Kilobase per Million reads) expression values of
1–9, 57 have values of 10 to 99, 12 have values >100). The expressed
genes include fliC (RPKM value of 9), however, the fact that
flagellin has not been detected for C. rodentium suggests that
the transcript is not translated. The continued transcription of the majority of
the C. rodentium pseudogenes, together with the low number of
multiple mutations in them, was taken to suggest that the disruption of these
genes were relatively recent events.
Surprisingly, our transcriptome data also revealed that the majority of the genes
encoded on each of the five intact prophages (CRP28 55 out of 58 predicted CDSs,
CRP99 52/55, ΦNP 65/65, CRP38 36/44, and CRP49 54/56) and the prophage
remnant CRPr20 (21/29) were expressed under standard growth conditions (Figure 4, Table S3).
This is unusual since for most prophages the structural and lysis genes are
repressed in the lysogen , . In addition to structural, lysis and regulatory
genes, several of the other prophage-encoded genes showed relatively high levels
of expression (Figure 4,
S3). Bioinformatic analyses showed that in most cases these genes
were in regions of aberrant GC content or they were encoded in regions that
corresponded to known ‘cargo holds’ for non-essential genes by
comparison with the genomes of well-characterised phages, such as Mu and P2. In
most cases, the function of the genes in these transcriptionally highly active
regions is unknown (Table
4), although putative damage-inducible and host-toxic membrane
proteins encoded on ΦNP, CRPr20 and CRP38 were previously described, based
on similarity to DinI and HokA respectively . Some genes could be
assigned putative products based on conserved protein domains such as
transmembrane regions and signal peptides (Table 4). It is of note that the recently
described effector NleK , encoded on CRP99 (ROD_09131), was not highly
transcribed under the conditions tested. These data suggest that most of the
C. rodentium prophages carry genes that represent known or
novel lysogenic conversion functions.
C. rodentium prophages spontaneously excise and transpose
within the genome
To determine if any of the C. rodentium prophages were capable
of spontaneous excision, primers were designed to sequences at the ends of the
integrated prophage genomes, facing outwards towards the prophage attachment
sites (attL and attR). Using these primers
S2), DNA would only be amplified by PCR if the prophage excised from
the host genome and circularised, bringing the primer pairs into the correct
orientation with respect to each other.
PCR products were obtained for all the intact prophages CRP28, CRP99, ΦNP,
CRP38, and CRP49, but not for the partial prophage CRPr20 (data not shown). Only
a single sized PCR product was obtained for each of ΦNP and CRP38, which,
on sequencing, showed that these phages precisely excised from the host genome.
However, multiple PCR products of different sizes were amplified for CRP28,
CRP99 and CRP49. These three prophages are all Mu-like  and, characteristically,
carry phage transposition proteins, which can facilitate random transposition in
the same way as for Mu and other transposable elements , . The amplified PCR
products obtained for each of CRP28, CRP99 and CRP49 were sequenced, revealing
the terminal prophage sequences as well as a range of different intervening host
genomic sequences. This is evidence of illegitimate excision, indicating that
these phages are capable of random transposition and, if packaged, could be
capable of specialised transduction.
We cloned and sequenced 235 of the intervening host genomic inserts for CRP99.
The sequences for 133 of the inserts mapped to different chromosomal locations
in the ICC168 genome sequence, 70 sequences mapped to regions in plasmid pCROD3
and 32 sequences mapped to plasmid pCRP3 (Figure S2),
confirming that CRP99 was randomly transposing around the bacterial genome and
taking adjacent bacterial DNA with it on excision from each genomic location.
pCROD1 had been lost from the strain used as a template for this PCR, strain
ICC169-476 (Table 3),
however no sequences mapped to the 39 kb plasmid pCROD2, which was present in
this strain (see above). The reason for this is not clear considering the depth
at which we sampled independent insertions.
The size of the host chromosomal DNA inserts incorporated into the excised and
circularised CRP99 genome varied from 16 bp to 3334 bp. This is comparable with
the genome of phage Mu, which is found flanked by variable sequences of up to
150 bp of host DNA at the left hand end and up to 3 kb at the right hand end
when packaged . Significantly, of the plasmid derived sequences
incorporated into CRP99, we were able to show that for 22 inserts in pCROD3 and
24 inserts in pCRP3 the whole plasmid had been incorporated into the
circularised phage genome (3910 bp and 3172 bp respectively).
To our knowledge, this is the first description of entire plasmids being
incorporated into a phage genome and provides intriguing evidence for the
possibility of plasmid dissemination between bacteria via specialised
transduction. Since neither of these plasmids have recognisable mobility markers
of their own this may explain how they entered C. rodentium. It
may also explain why plasmids similar in size to pCROD3 and pCRP3 are so
successful and found in a wide range of different bacteria.
Analysis of the paired sequencing reads of EX-33 confirmed that ΦNP is
spontaneously excising and circularising in both C. rodentium
strains and showed evidence that the three Mu-like phages are also randomly
transposing in the EX-33 genome.
ΦNP produces virions capable of infecting and lysogenising E.
Considering the transcriptional activity of the C. rodentium
prophage structural genes, TEM was used to examine culture supernatant to
identify if functional virions were formed. Even in uninduced overnight cultures
of strain C. rodentium ICC169-476, virions with an icosahedral
head 70 nm in diameter, a 10 nm long neck and a contractile tail 115 nm long
were visible. Although the majority of the virion tails observed were
contracted, some with extended tails showed evidence of a base plate and tail
fibres. This was the only virion morphology we observed, even with concentrated
To determine if this phage had an extended host range, a variety of different
bacteria available in our laboratories, including Pseudomonas
sp., C. freundii, Serratia sp., Pectobacterium
sp., Yersinia enterocolitica, Salmonella sv. and a range of
pathogenic and non-pathogenic E. coli strains, were tested for
susceptibility to infection using C. rodentium overnight
culture supernatant. Plaques were observed on strains of E.
coli K-12 and its derivatives, but no signs of infection were seen
for any other bacterium tested.
The supernatant of an uninduced overnight culture of C.
rodentium ICC169-476 produced between 105 and
107 pfu/ml when titrated on E. coli K-12 strains
including MG1655. The plaques were turbid suggesting that the phage(s) present
in the C. rodentium supernatant were able to lysogenise
E. coli K12. This was confirmed using methods previously
Phage isolated from these plaques were propagated on E. coli
K12 strain ER2507 to make high titre phage lysates and we cloned and sequenced
DNA extracted from the phage virions in these lysates. The sequences obtained
mapped to prophage ΦNP in the C. rodentium genome.
Furthermore, random primed PCR performed on genomic DNA from E.
coli ΦNP lysogen ER2507 NPL showed that ΦNP had
inserted into an identical genomic location in E. coli ER2507
to the insertion site identified in C. rodentium, the
ssrA tmRNA gene (data not shown, ).
Host range studies using purified ΦNP lysates (propagated on E.
coli) produced an identical infection pattern to that seen for the
C. rodentium supernatant, and electron microscopy showed
that ΦNP virions were identical in size and morphology to the virions
observed in the supernatant. This morphology (shown in Figure 5) allowed classification of ΦNP
into the order Caudovirales and family
may indicate that the only virions observed in the supernatant were those of
ΦNP. Nevertheless, the possibility of the other C.
rodentium prophages forming functional virions cannot be ruled out,
as there could be functional virions other than ΦNP spontaneously formed
and present in the supernatant of C. rodentium, for which a
susceptible host has yet to be found.
Genome flux in C. rodentium is a natural phenomenon
To determine if the genomic rearrangements observed in vitro
were a natural phenomenon and had an impact on the ability of C.
rodentium to infect its murine host, four isolates were tested to
determine their virulence phenotypes. The isolates selected for murine infection
were ICC169-407 and ICC169-496, which had both shown several band differences in
PFGE profiles, along with two isolates which both had the same PFGE profile as
ICC168: ICC169-476 which is missing the plasmid pCROD1; and the wild-type
NalR strain ICC169.
Each of the different isolates of C. rodentium were able to
colonise the gastrointestinal tract of the mouse, and all four groups of five
mice showed a normal pattern of infection, as previously described . Bacterial
shedding in the faeces of individual mice was monitored over the course of the
infection. We found no significant difference in the numbers of bacteria being
excreted between the different groups of mice (Figure 6). The group infected by ICC169-407
had all cleared the infecting bacteria by day 15 post infection, however only
mice infected with ICC169-496 had all mice in the group still shedding bacteria
at this time point. Shedding of C. rodentium had ceased in all
groups by day 17 post infection (Figure 6). On examination of the colons we found that there were no
obvious differences in the extent of hyperplasia induced by the different
isolates, with the average crypt lengths measuring from 211 µm (SD
= 38 µm) to 240 µm (SD
= 70 µm). These data indicate that neither the
chromosomal rearrangements observed in ICC169-407 and ICC169-496, nor the
natural loss of plasmid pCROD1, have any discernable effect on bacterial
shedding or virulence in C. rodentium.
Genome rearrangements occur in vivo
The effect of mouse passage on the genome architecture of C.
rodentium was also determined. We performed PFGE analysis on
selected ICC169 isolates obtained from mouse faeces at different times during
infection and also from the colon and caecum, the organs colonised by this
pathogen. These isolates are described in Table 3. PFGE showed that all the post-mouse
passage isolates tested, displayed an identical banding pattern to each other
and to the original strain ICC168 (as shown in Figure 2). This was even true for isolates
taken from mice infected with the isolates ICC169-496 and ICC169-407, which
display different PFGE patterns to ICC168 (Figures 7A and B, Figure 2). This may indicate that the
different genome rearrangements in these isolates reverted to the original
genomic configuration, that of the wild-type strain ICC168, on passage through
the mouse. Alternatively, it is possible that a minor subpopulation with the
wild-type genome conformation, that we have shown to exist within cultures of
ICC169-496 and ICC169-407, was selected for within the mouse.
For all the strains tested in mice, the genomic rearrangements identified appear
to be entirely neutral with regards to virulence and the progression of the
infection. However, the fact that the alternative PFGE profiles seen for
ICC169-407 and ICC169-496 revert to the original, ICC168-like, profile
in vivo, indicates that this original genomic orientation
may provide a fitness advantage in the murine host. The only observed phenotypic
effect of genome rearrangement was for two ICC180 isolates (Table 3). ICC180-P10,
isolated from mouse faeces after ten successive passages through mice by natural
transmission from infected to naive animals through the faecal-oral route, had a
different PFGE profile compared to the wild-type, pre-passage isolate ICC180
Interestingly, we found that the virulent phage ΦCR1, known to target
lipopolysaccharide (LPS) as a receptor , was unable to infect the
post-mouse passage isolate ICC180-P10, although it was able to form clear
plaques on the wild-type ICC180 and all the other C. rodentium
strains used in this study (data not shown). This data suggests that the genome
rearrangement observed in ICC180-P10 may have affected LPS biosynthesis.
Our data indicate that genome instability is a feature of C. rodentium in
vivo, as well as in vitro and that the genomic
rearrangements observed are indicative of natural variation within a population.
It is plausible that an invertible genome region may result in a differential
expression of genes, which could allow rapid adaptation to different
environments or stresses. This has been seen previously in Campylobacter
jejuni where large-scale intra-chromosomal inversions, which were
reversible, were associated with escape from infection by endogenous virulent
phages on passage through the avian gut . In addition, genomic
rearrangements have previously been identified in strains of
Pseudomonas aeruginosa and
E. coli– during the course of
human infection and appear to be linked to niche adaptation. Thus, in different
environments, for example in vivo and in
vitro, the dominant populations of C. rodentium could
show different genomic arrangements, as demonstrated by the singular genomic
conformation of post-mouse infection C. rodentium strains,
despite the different genome arrangements of the infecting strains. However,
further work is needed to understand the full impact of each genomic
rearrangement on gene expression in C. rodentium, and to
determine if this is a widespread phenomenon in other bacteria.
We have shown that the genome of C. rodentium is in a state of
considerable gene flux through large-scale, repeat-mediated recombination both
in vitro and in the murine host, and also through the
expansion of IS elements and the presence of several actively transposing
prophages which are able to insert, apparently at random, throughout the
chromosome and plasmids. Gene flux has also resulted in significant functional
gene loss, particularly due to prophage and IS element insertions, which were
fixed and invariant in all of the C. rodentium ICC168
derivatives we sequenced. The fact that almost identical patterns of gene loss
can be seen in two lineages of C. rodentium, independently
isolated from diseased mice on two different continents, a decade apart, is
consistent with this occurring at the root of the evolution of this species and
is likely to have played a significant role in the evolution of C.
rodentium as an A/E pathogen of mice. One such example is the loss
of flagella production through disruption of both flagella biogenesis systems,
as flagella are known to be important elicitors of the innate immune system.
However, in addition to the loss of functions associated with virulence,
C. rodentium has also lost metabolic capacity, for example
we have shown that the operon encoding galactitol utilisation in ICC168 and
EX-33 has been disrupted by prophage insertion, thereby limiting the available
number of carbon sources that can be used by this bacterium. There are several
examples now of other bacteria where loss of metabolic flexibility is associated
with having recently changed niche –.
It is clear from mouse infection studies that chromosomal rearrangements are a
natural phenomenon and that, as might be expected, ongoing genome flux is
largely neutral, not having had time for selection to play a role, and so having
no discernable effect on bacterial shedding or virulence in the murine host.
This included the loss of the large plasmid pCROD1, which, despite encoding two
toxin-antitoxin addiction systems , our data shows is lost at
We previously showed that many of the functions that confer C.
rodentium with a common virulence strategy to EPEC and EHEC are
located on horizontally acquired mobile genetic elements . This, together with the
large-scale genomic rearrangements and functional gene loss described in this
study, suggests that C. rodentium has only recently emerged as
a significant pathogen and is still adapting to its new lifestyle. Furthermore,
the fact that C. rodentium is not known to cause disease in
wild mice, only in laboratory rodents, may indicate that this pathogen emerged
with the development of the mouse as a model organism and the large-scale
captive breeding of small rodents. This would certainly explain the clonal
nature of this species.
Materials and Methods
This study was performed under project licence number 80/2099 approved by the UK
Home Office and carried out in strict accordance with the UK Animals (Scientific
Procedures) Act 1986. The Wellcome Trust Sanger Institute's Ethical Review
Committee approved the research protocols used in this study.
Bacterial strains and culture conditions
C. rodentium and E. coli strains were grown at
37°C in Luria–Bertani (LB) medium. For solid medium 1.5% agar
was added, and for soft medium overlay (top agar) 0.15% agarose was used.
When required, nalidixic acid (Nal) was added to LB to a final concentration of
50 µg/ml for selection. Phage buffer was composed of 10 mM Tris/HCl pH
7.4, 10 mM MgSO4, and 0.01% gelatin. The bacterial strains described in
this study are listed in Table
3. The C. rodentium isolates ICC169-335, ICC169-407,
ICC169-474, ICC169-476 and ICC169-496 all came from the same original stock of
ICC169, but show different PFGE profiles and/or plasmid content.
EX-33 genome sequencing and comparative analysis
The whole genome of C. rodentium strain EX-33 was sequenced on
the 454/Roche GS FLX analyzer, with long-read GS FLX Titanium chemistry from a 3
kb insert paired end library prepared according to the manufacturer's
specifications. A de novo assembly was produced from the
generated sequence data using the 454/Roche Newbler assembly program (Software
Release 2.1), which produced 27 scaffolds with an N50 scaffold size of 390,676
bp (largest scaffold size 699,126 bp) and 867 contigs with an N50 contig size of
9,811 bp (largest contig size 41,345 bp). The assembly consisted of 249,640
sequence reads (including 91,558 paired reads) totalling 43,628,532 bp,
constituting a theoretical 8-fold coverage.
EX-33 was also sequenced on the Illumina GA II analyzer. A standard Illumina
library was made with a 200 bp insert size and sequenced to a 54 bp read length
using standard protocols , and a de novo assembly was produced
using the Velvet assembly program. The optimal assembly was produced from a kmer
length of 31. It generated 1,761 contigs with an N50 contig size of 4,177 bp
(largest contig size 33,600 bp) from 8,751,150 sequence reads, constituting a
theoretical 88-fold coverage.
The sequence data from the two sequencing platforms (individual 454 reads and
consensus reads from the shredded Illumina assembly) were combined and assembled
using the 454/Roche Newbler assembly program (Software Release: 2.3) into a
consensus sequence of 382 total contigs (294 large contigs; N50 contig size,
38,722 bp) from 272,234 sequence reads totalling 54,256,007 bp, constituting a
theoretical 10-fold coverage. Contigs were scaffolded using paired reads with an
average pair distance of 2,998 bp into 40 scaffolds (N50 scaffold size, 244,370
bp) totalling 5,318,492 bp.
The EX-33 scaffolded contigs from the combined 454-Illumina assembly were ordered
according to the ICC168 genome sequence (accession numbers FN543502
(chromosome), FN543503 (pCROD1), FN543504 (pCROD2), FN543505 (pCROD3) and
AF311902 (pCRP3)) using ABACAS , and the annotation transferred from the reference
Insertions/deletions in the EX-33 genome were identified by pairwise whole genome
comparison of the ordered scaffolded contigs with the ICC168 genome sequence
using BLASTN and visualised using the Artemis Comparison Tool . Deletions
from the EX-33 genome with respect to the ICC168 genome were confirmed by
contiguated sequence spanning the syntenic regions in EX-33 and sequencing reads
spanning each insertion/deletion region in the mapped coverage plot generated
using SSAHA .
For SNP detection, the EX-33 454-Illumina combined assembly consensus sequence
was shredded, resulting fragments were mapped by SSAHA and SNPs called with
respect to the reference ICC168 genome and validated according to previously
described protocols . In addition, SNPs that were not located in
repetitive sequences were validated manually, and only SNPs found in at least 5
sequencing reads, mapping to both strands, and present in at least 75% of
the reads were passed as high-quality SNPs.
To identify gene flux and genomic rearrangements in the EX-33 genome, the paired
Illumina sequencing reads were mapped to the ICC168 reference sequence using Maq
(http://sourceforge.net/projects/maq/) and mismapping read pairs
were identified using BamView .
DNA embedded in plugs was prepared using the CHEF Genomic DNA Plug Kit (Bio-Rad
Laboratories, Hercules, CA, USA) from bacterial cells in suspension buffer
(Bio-Rad Laboratories), grown to an optical density at an absorbance of 610 nm
(OD610) of 1.3–1.4. Restriction digestion was performed
with 30 U of XbaI (New England BioLabs) at 37°C
overnight. Plugs were soaked in 0.5× TBE for 15 min at 4°C prior to
electrophoresis. DNA fragments were resolved in 1% SeaKem Gold agarose
(FMC Bioproducts, Rockland, ME, USA) in 0.5 x TBE buffer at 10°C, using a
CHEF DR-III system (Bio-Rad Laboratories), running at a linear ramping factor of
2–68 s, pulse angle at 120°. The run length was 25 h at a
constant voltage of 6 V/cm. DNA restriction patterns were assessed visually
following ethidium bromide staining.
Identification of recombination break points in C. rodentium
The whole genomes of C. rodentium strains ICC169-407 and
ICC169-496 were sequenced by paired-end 454 FLX pyrosequencing and assembled
using the 454/Roche Newbler assembly program. For ICC169-407, contigs (1700
total contigs, 1355 large contigs; N50 contig size, 5,633 bp) were assembled
from 290,987 sequence reads with an average read length of 168 bp, constituting
a theoretical 9-fold coverage, contigs were scaffolded using paired reads with
an average pair distance of 3,715 bp into 68 scaffolds (N50 scaffold size,
378,576 bp). For ICC169-496, contigs (4600 total contigs, 3081 large contigs;
N50 contig size, 1,332 bp) were assembled from 243,094 sequence reads with an
average read length of 162 bp, constituting a theoretical 7-fold coverage,
contigs were scaffolded using paired reads with an average pair distance of
2,595 bp into 153 scaffolds (N50 scaffold size, 45,450 bp).
Scaffolded contigs were aligned with the ICC168 genome sequence using ABACAS.
Read pairs with an insert size of at least 2 kb were mapped to the scaffold
contigs using SSAHA and only read pairs that mapped uniquely and with maximum
quality were selected. Recombination break points were found by BLASTN of the
scaffold contigs against the genome of ICC168 and identifying single scaffold
contigs that matched with two disparate regions of the reference genome, and
also had reads spanning the putative point of recombination. The read pairs were
then mapped to the reference genome, using SSAHA, and the break points were
confirmed by a lack of reads spanning the corresponding region in the ICC168
genome sequence. Rearrangement break points were confirmed by PCR using primers
designed to non-repetitive DNA sequences in the genome of ICC168 (Table
Carbon source growth curve
An overnight culture of C. rodentium ICC168 was seeded
1∶50 into 50 ml of minimal media consisting of M9 salts supplemented with
either 1% glucose or 1% galactitol as a carbon source. The
cultures were then incubated at 37°C with agitation at 200 rpm. The
OD600 was measured every 60 min for 7 h using a Helios
spectrometer (Thermo Scientific).
C. rodentium plasmid content and sizes were assessed according
to the method of Kado and Liu , and confirmed in different isolates by PCR of
cultures originating from individual colonies of the same generation using the
primers NKP111-NKP118 (Table S2).
The whole genome transcriptome of ICC169-476 was sequenced using Illumina
sequencing technology as previously described . Removal of genomic DNA
from the RNA sample and subsequent successful cDNA generation was confirmed by
PCR using the four primer pairs NKP125-132 that generate amplicons internal to
C. rodentium housekeeping genes (Table S2).
Expression values were calculated as Reads mapped Per Kilobase per Million reads
(RPKM) and recorded for each predicted CDS in the ICC168 genome (Table
Prophage excision/transposition detection
PCR analysis to detect spontaneous prophage excision and circularisation was
performed using supernatant from an overnight culture of C.
rodentium ICC169-476 or ICC169-407 as a template and primer pairs
designed to sequences at the ends of the integrated prophage genomes, facing
outwards towards the prophage attachment sites for each prophage (CRP28L and
CRP28R, CRP99L and CRP99R, NPout1 and NPout4, CRP38L and CRP38R, CRP49L and
CRP49R). The CRPr20 primers CRP20L and CRP20R were also used as a control.
Primers are listed in Table S2.
To confirm prophage transposition, CRP99 PCR products were end repaired, gel
purified and then cloned into SmaI cut pUC19 vectors. MegaX
DH10B T1R electro-competent cells (Invitrogen) were used for the
transformations, and transformants were selected on Xgal/IPTG
(blue/white screen). Libraries were sequenced using standard forward and reverse
primers. Sequences of at least 300 bp in length (of which approximately 200 bp
mapped to one or other end of prophage CRP99) were mapped to the C.
rodentium ICC168 genome sequence. Circular diagrams showing the
insertion sites were made using DNAplotter .
ΦNP virions were isolated from plaques formed on E. coli
K-12 strain ER2507 after titration with chloroform-treated supernatant from an
overnight culture of C. rodentium ICC169-476. Following plaque
purification and further propagation on E. coli K-12 strain
ER2507, DNA was extracted from high titre ΦNP lysates as previously
For cloning, ΦNP DNA and a pUC19 vector were digested with
BamHI prior to ligation. The ligated vector and insert were
used to transform chemically competent E. coli DH5α cells,
and the transformed cells selected, using a blue/white screen, purified and
sequenced using standard primers.
The integration site of ΦNP in E. coli was determined by
random primed PCR  on the E. coli ΦNP lysogen
ER2507 NPL using the ΦNP specific primers NPL1 and NPR1, and the nested
primers NPL2 and NPR2 respectively (Table S2). The resulting PCR products were
sequenced and mapped to the E. coli K-12 MG1655 genome sequence
(Accession number U00096) .
Transmission electron microscopy (TEM) and host range determination were
performed as described previously .
Female 6–8 weeks old C57BL/6 mice, purchased from Charles River (Margate,
United Kingdom), were used to assess the virulence of different C.
rodentium ICC169 isolates. All mice used in these studies came from
colonies that were specific-pathogen free. Animals were housed in individually
HEPA filtered cages with sterile bedding and free access to sterilised food and
water. C. rodentium inocula were prepared by culturing bacteria
overnight at 37°C in 100 ml of LB containing Nal. Cultures were harvested by
centrifugation and resuspended in a 10% volume of PBS. Groups of five
mice for each strain tested were orally inoculated using a gavage needle with
200 µl of the bacterial suspension. The viable count of the inocula was
determined by retrospective plating on LB agar plates containing Nal. At
selected time points post-infection, faeces were aseptically collected (100 mg
faeces/ml PBS), serially diluted in PBS and plated on LB agar containing Nal.
All plates were incubated overnight at 37°C. When all mice had stopped
shedding bacteria the mice were sacrificed and colons removed. Small pieces of
colonic tissue were fixed in 4% formaldehyde, then paraffin embedded,
sectioned and stained with haematoxylin and eosin, for histological
In addition, mice were orally inoculated with C. rodentium
ICC180 as previously described and at the peak of infection (day 7 post-gavage)
housed with naive mice to allow the natural transmission of ICC180 to occur via
the faecal-oral route . The natural transmission of ICC180 was followed by
aseptic recovery of faecal samples from each animal at various time points after
introduction. Mice infected in this way (termed passage 1, P1) were then housed
with naive animals and the newly infected animals designated passage 2 (P2).
This was continued until ICC180 had undergone ten successive passages from
infected to naive mice, and ICC180-P10 was isolated from the faeces of passage
10 (P10) mice.
The EX-33 genome sequencing reads from both the Illumina and 454 platforms have
been deposited in the Short Read Archive under the accession number ERS005106.
The combined 454-Illumina assembly of the EX-33 contigs can be accessed from the
website of the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/resources/downloads/bacteria/). The
sequence and annotation of the fimbrial operon unique to EX-33 has been
submitted to the EMBL/GenBank/DDBJ databases with the accession number
The 454 sequencing reads for ICC169-407 and ICC169-496 have been deposited in the
Short Read Archive under the accession numbers ERS004752 and ERS004750
respectively, and the ICC169-476 transcriptome Illumina sequencing reads can be
found at ArrayExpress under accession number E-MTAB-502.