The E. coli Extended Genome Fernando Baquero Dept. Microbiology, Ramón y Cajal University Hospital, and Laboratory for Microbial Evolution, CAB (INTA-CSIC) Madrid, Spain.
The E. coli Extended Genome
Fernando Baquero
Dept. Microbiology, Ramón y Cajal University Hospital, and Laboratory for Microbial Evolution, CAB (INTA-CSIC) Madrid, Spain
The Species E. coli
Roles of the concept of “species”
• Units of taxonomic classification: Units in the general reference system that microbiologists use to order the isolates
• Units of generalization: Kinds of microorganisms over which explanatory-predictive generalizations can be made
• Units of evolution: Bacterial entities that participate in evolutionary processes and undergo evolutionary change
(Modified from T.A.C. Reydon, Ph.D. Dissertation, Leiden University, 2005)
The Species E. coli
New way
• Units of taxonomic classification: Units in the general reference system that microbiologists use to order the isolates
• Units of generalization: Kinds of microorganisms over which explanatory-predictive generalizations can be made
• Units of evolution: Bacterial entities that participate in evolutionary processes and undergo evolutionary change
Classic way
Diversity at all hierarchical levels
Strain: Mutation
Some strains are more mutable than others
Population: Clonalization
Some populations tend to produce more clones?
Community: Speciation
Some bacterial groups tend to produce more species?
At any level, the origin of diversity is probably stochastic
Adaptation Complexity
Mutation: Single adaptive event
Clonalization: Multiple adaptive events
Speciation: Very complex adaptive events
Clonalization
Allopatric clonalization Sympatric clonalization
Host Defenses
Clonalization
Allopatric clonalization
ExPEC*
Non ExPEC
Sympatric clonalization
* From James R. “Linnaeus” Johnson
The elimination of intermediates
The impossibility of being both a businessman and a little mermaid
Species-Environment Concerted Evolution
Phylogenetic groups
Core genome: species evolution
Basic reproductive environment: environmental evolution
Co-evolution: Trees within Trees
Host Bacteria or bacterial consortium
The clues of E. coli genetic diversity
• Errors in DNA replication and repair
• Horizontal genetic transfer from other organisms
• Creation of mosaic genes from parts of other genes
• Duplication and divergence of pre-existing genes
• De novo invention of genes from DNA that previously had a non-coding sequence
Modified from Wolfe and Li, Nat. Genet. 33, 2003
Not a single strain represents the whole species
• K12-MG1655 (4,289 ORFs)
• K12-W3110 (4,390 ORFs)
• O157:H7 (Sakai) (5,361 ORFs)
• O157:H7-EDL933 (5,349 ORFs)
• E2348/69
• CFT073 (UPEC) (5,379 ORFs)
• O42 (EAEC), HS, E24377A (ETEC), Nissle (PBEC)
• Shigella flexneri SF-301 and 2457T (4,084)
E. coli genomes
1,000 genes of difference!
http://colibase.bham.ac.uk
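The "1,000 genes of difference" remark can be checked against the ORF counts listed for the sequenced strains. The sketch below is only an illustration: it uses the counts that carry an explicit "ORFs" label on the strain list, and it treats the gap in ORF count as a rough lower bound on gene-content difference (an assumption, since two strains of equal size could still differ in content). The helper name `orf_spread` is hypothetical.

```python
# ORF counts as listed on the strain slide; strains without a labelled
# ORF count are left out of this comparison.
orf_counts = {
    "K12-MG1655": 4289,
    "K12-W3110": 4390,
    "O157:H7 Sakai": 5361,
    "O157:H7 EDL933": 5349,
    "CFT073 (UPEC)": 5379,
}


def orf_spread(counts):
    """Return (smallest strain, largest strain, difference in ORF count)."""
    smallest = min(counts, key=counts.get)
    largest = max(counts, key=counts.get)
    return smallest, largest, counts[largest] - counts[smallest]


smallest, largest, diff = orf_spread(orf_counts)
print(f"{largest} carries {diff} more ORFs than {smallest}")
```

With these figures the largest and smallest listed genomes differ by a little over one thousand ORFs, which matches the order of magnitude quoted on the slide.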
Loops in a common core backbone
A-strain B-strain A-loop (A-island) B-loops (B-islands)
Loops in a common core backbone
E. coli Sakai: 296 loops (S-loops, 1,393 kb); backbone (BB): 3,730 kb
E. coli K12: 325 loops (K-loops, 537 kb); backbone (BB): 3,730 kb
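As a rough worked example, the share of each aligned genome that sits in loops can be derived from the backbone and loop sizes on this slide. The sketch assumes that the 1,393 kb figure belongs to the Sakai loops (S-loops) and the 537 kb figure to the K12 loops (K-loops), and that backbone plus loops approximates the aligned genome length; the function name `loop_fraction` is hypothetical.

```python
def loop_fraction(backbone_kb, loops_kb):
    """Fraction of (backbone + loops) that is made of strain-specific loops."""
    return loops_kb / (backbone_kb + loops_kb)


BACKBONE_KB = 3730  # common core backbone (BB) from the slide

sakai = loop_fraction(BACKBONE_KB, 1393)  # S-loops, assumed to be Sakai
k12 = loop_fraction(BACKBONE_KB, 537)     # K-loops, assumed to be K12

print(f"Sakai loop fraction: {sakai:.1%}")
print(f"K12 loop fraction:   {k12:.1%}")
```

Under these assumptions the loops make up roughly a quarter of the Sakai alignment and about an eighth of the K12 alignment.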
Loop sizes
Chiapello et al., BMC Bioinformatics, 6:171, 2005
Large loops arise from horizontal transfer events
Small loops may arise from replication errors (small deletions or insertions), or correspond to highly polymorphic regions
The core backbone is not the minimal genome
• The “core backbone” is not the “minimal E. coli genome”, because of the high level of gene redundancy.
• A high number of genes are members of gene families (2-30 copies), similar enough to be assigned similar functions (paralogs)
• Such redundancy involves 20-40 % of the E. coli coding sequences (more in the largest genomes)
• The “in-silico metabolic phenotype”, including all basic functions, predicts about 700 genes in the minimal genome (Blattner et al., Science 1997; Edwards and Palsson, PNAS 2000)
Gogarten and Townsend, Nature Rev. Microbiol. (2005)
The blue gene, unexpected in the species “C”, might have arisen: i) by horizontal gene transfer; or ii) by an ancient gene duplication followed by differential gene loss.
The loops
• The backbone evolves by vertical transfer.
• Large loops are probably acquired by horizontal gene transfer, but also evolve by vertical transfer. PAIs, islets, phages, plasmids, transposable, repetitive elements...
• Loops tend to have a different codon usage and higher AT % than the backbone.
• Loops tend to contain operational genes (actions) more frequently than informative genes (complex regulation) (R. Jain, 1999)
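The observation that loops tend to have a higher AT % than the backbone suggests a simple screening idea: scan a sequence in windows and flag those whose AT fraction exceeds the sequence-wide average. The sketch below is a toy illustration of that idea only, not the method used in the cited work; the window size, step, and `excess` threshold are arbitrary assumptions, and the function names are hypothetical.

```python
def at_fraction(seq):
    """Fraction of A/T among the unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "AT" for b in bases) / len(bases)


def at_rich_windows(genome, window=1000, step=500, excess=0.05):
    """Return (baseline, hits); hits are (start, end, AT fraction) windows
    whose AT fraction exceeds the genome-wide baseline by at least `excess`."""
    baseline = at_fraction(genome)
    hits = []
    last_start = max(len(genome) - window + 1, 1)
    for start in range(0, last_start, step):
        frac = at_fraction(genome[start:start + window])
        if frac - baseline >= excess:
            hits.append((start, start + window, frac))
    return baseline, hits


# Toy sequence: a GC-only "backbone" with one AT-only "loop" in the middle.
toy = "GCGC" * 25 + "ATAT" * 10 + "GCGC" * 25
baseline, hits = at_rich_windows(toy, window=40, step=20)
print(f"baseline AT fraction: {baseline:.3f}")
for start, end, frac in hits:
    print(f"AT-rich window {start}-{end}: {frac:.2f}")
```

On real genomes codon usage would normally be examined alongside base composition, as the slide notes; this sketch covers base composition only.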
Random-scale sub-network (loop): operative genes are more easily accepted
[Figure: an ALIEN node joining a network of nodes and links]
Elaboration from Jain et al.
Scale-free network (core): informative genes less easily accepted
[Figure: ALIEN node; axes: nodes vs. number of links (log)]
Elaboration from Jain et al.
Scale-free network (core): informative genes less easily accepted, except alien replacement of an entire sub-network
[Figure: ALIEN subnetwork; axes: nodes vs. number of links (log)]
Predicted functional modules in E. coli
3,256 E. coli genes are connected by 113,894 links
(von Mering et al., PNAS 100:15428, 2003)
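To give a feel for how densely connected this predicted network is, the two figures on the slide can be turned into a mean number of links per gene and a network density. The sketch assumes the links are undirected and that each link joins two distinct genes; neither assumption is stated on the slide, and the function names are hypothetical.

```python
def mean_degree(n_nodes, n_links):
    """Average number of links per node in an undirected network."""
    return 2 * n_links / n_nodes


def density(n_nodes, n_links):
    """Share of all possible node pairs that are actually linked."""
    return 2 * n_links / (n_nodes * (n_nodes - 1))


GENES = 3256
LINKS = 113894

print(f"mean links per gene: {mean_degree(GENES, LINKS):.1f}")
print(f"network density:     {density(GENES, LINKS):.4f}")
```

Under these assumptions each gene has about 70 links on average, while only around 2 % of all possible gene pairs are linked.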
Loops as R&D E. coli laboratories
[Figure: proteins expressed (bars in red) and positions of K-loops (bars in blue)]
• The genes in the loops express proteins in only 10% of the cases
M. Taoka et al., Mol & Cell. Proteomics (2004)
Gene flux
[Figure: gene flux cycle with the steps Acquisition, Excision, Modification, Loss, Duplication, Modification]
• More loss in sequences of recent acquisition*
• Insertions and deletions occur more frequently in loops
• Overall less loss than acquisition?
* (Daubin et al., Genome Biol., 4:R57, 2003; Ochman and Jones, EMBO J., 19:6637, 2000)
Constant Random Gene Influx?
Gene flux
[Figure: gene flux cycle with the steps Acquisition, Excision, Modification, Loss, Duplication, Modification]
As in the case of random mutation, there might be a blind, random uptake and loss of available foreign genetic sequences; environmental selection and random drift determine the fate of these constructions.
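The idea of a blind, random uptake and loss of foreign sequences can be illustrated with a toy gain-and-loss process. The sketch below is a neutral model under stated assumptions: at most one foreign gene is acquired per generation with probability `gain_prob`, every resident foreign gene is lost independently with probability `loss_prob`, and selection is left out entirely. The function name and all parameter values are hypothetical.

```python
import random


def simulate_gene_flux(generations, gain_prob, loss_prob, start=0, seed=0):
    """Track the number of foreign genes carried by one lineage over time."""
    rng = random.Random(seed)
    count = start
    history = [count]
    for _ in range(generations):
        # Random uptake: at most one new foreign gene per generation.
        if rng.random() < gain_prob:
            count += 1
        # Random loss: each resident foreign gene is lost independently.
        lost = sum(rng.random() < loss_prob for _ in range(count))
        count -= lost
        history.append(count)
    return history


history = simulate_gene_flux(generations=2000, gain_prob=0.2, loss_prob=0.01)
print("foreign genes after 2000 generations:", history[-1])
print("peak number of foreign genes:", max(history))
```

In this kind of model the foreign gene content fluctuates around a level set by the ratio of gain to loss; adding selection on individual genes would be the natural next step toward the scenario described on the slide.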
E. coli - where do alien genes come from?
• Enterobacteriaceae (56 %) (Klebsiella, Salmonella, Serratia, Yersinia); Aeromonas, Xylella, Ralstonia, Caulobacter, Agrobacterium
• Plasmids (28 %) - about 250 plasmids identified in E. coli.
• Phages (10%) + many ORFan genes (64 MG1655-specific)
(Modified from Dufraigne et al., NAR 33, 2005, and Daubin & Ochman, Genome Research, 2004)
The E. coli “Gene Exchange Community” should be better identified!
E. coli Recipient Barriers for Horizontal Gene Transfer
• Ecological separation from donor
• DNA sequence divergence
• Low numbers
• Inadequate phage receptors
• Inadequate pilus specificity for mating
• Contact-killing or inhibition
• Surface exclusion
• Restriction*; no anti-restriction mechanisms, gene inactivation
• Absence of replication of foreign gene, incompatibility
• Absence of integration of foreign gene in specific sites
• No recombination with host genome (AT/CG), MMR system
• Decrease in fitness of recipient after DNA acquisition
• No more room for new DNA: Headroom (Maximal Genome?)
* 200 enzymes!
Sequence divergence reduces acquisition of foreign DNA
If the acquisition produces neutral events, the tolerance increases
Modified from Gogarten and Townsend, Nature RM, 2005
Deleterious events are frequent with high divergence, but eventual beneficial events are rare with low divergence rates
Species-Environment Concerted Evolution
Phylogenetic groups
Core genome: species evolution
Basic reproductive environment: environmental evolution
Genome Size in E. coli strains ECOR Phylogenetic Groups
[Figure: genome size (kb axis, ticks from 4 to 5,4) for phylogenetic groups A, B1, B2 and D, with the K12 level marked]
Data: Bergthorsson and Ochman, Mol. Biol. Evol. 15:6-16, 1998
Phylogenetic groups: clinical associations
[Figure: axis 0-100; phylogenetic groups A, B1, B2 and D; series: Clinical, Rectal (FUTI), Cystitis, Faecal HV-Fr, Febrile UTI, Faecal HV-Sp]
Clinical: Johnson et al., EID 11:141, 2005; Cystitis: Johnson et al., AAC 49:26, 2005; FUTI and rectal FUTI: Johnson et al., JCM 43:3895, 2005; Faecal Fr/Cr/Ma, Duriez et al., Microbiology 147:1671, 2001; Faecal HV Spain, Machado et al., AAC 49, 2005
Phylogenetic groups: clinical associations
But: “Epidemic extraintestinal strains”, many SxT-R in UTI in US, Israel, France (Johnson et al., EID 11:141, 2005)
Groups B2 and D are the most frequently found in E. coli bacteremia (Hilali et al., Inf. Imm. 68:3983, 2000; Johnson et al., JID15:2121, 2004, Bingen, yesterday)
[Figure: axis 0-70; phylogenetic groups A, B1, B2 and D; series: Clinical, Rectal (FUTI), Cystitis, Faecal HV-Fr, Febrile UTI, Faecal HV-Sp]
Clinical: Johnson et al., EID 11:141, 2005; Cystitis: Johnson et al., AAC 49:26, 2005; FUTI and rectal FUTI: Johnson et al., JCM 43:3895, 2005; Faecal Fr/Cr/Ma, Duriez et al., Microbiology 147:1671, 2001; Faecal HV Spain, Machado et al., AAC 49, 2005
Distribution of E. coli isolates from hospitalized patients and from healthy volunteers among the four phylogenetic groups
[Figure: axis 0-50; phylogenetic groups A, B1, B2 and D]
Machado, Cantón, Baquero et al., AAC 49 (2005)
• ESBLs (red) predominate among strains of group D
• Pathogenic strains, non-ESBL, predominate among group B2
• Commensal strains, non-ESBL, predominate among group A
Antimicrobial-R in phylogenetic groups
[Figure: axis 0-80; phylogenetic groups A, B1, B2 and D; series: SxT-R, ESBLs, Cipro-R(1), Cipro-R(2)]
SxT-R and Cipro-R(1): Johnson et al, AAC 49:26, 2005; ESBL: Machado et al., AAC 49, 2005; Cipro-R(2): Kuntaman et al., EID 11:1363, 2005 (Indonesia).
The phylogenetic group B2, the most pathogenic one, tends to be the least resistant?
Species-Environment Concerted Evolution
Ecotypes
Core genome: species evolution
Basic reproductive environment: environmental evolution
Models for Multiple Ecotypes
(Gevers et al., Nature MR 3:733, 2005)
Clonalization
Patients with different ESBL clones
Ramón y Cajal Hospital, Madrid (Baquero, Coque & Cantón, Lancet I.D. 2:591, 2002)
[Figure: patients (axis 0-30) by year, 1988 to 1999]
Mutation: Intra-Clonal Diversity
E. coli: Faecal, Urine, Blood, ESBLs
[Figure: axis 0-80; mutation frequency classes: Hypo, Normo, Weak, Strong]
Baquero et al, AAC 2004 and Nov. 2005
Clonal Ensembles: Metastability through Intermittent Fixation
Different clones peak in frequency at different times, according to the best-fit clone in each epoch* of a changing environment
* epochal evolution
[Figure: line of best-fit clones over time, forming a clonal ensemble]
The maintenance of clonal ensembles is favored by the asymmetry of fitness abilities in different clones in different epochs
Shared Environments and Maintenance of Diversity
A regional polyclonal community structure
Alternative stable equilibria and the coexistence of variant organisms
On this topic:
Geographic mosaic theory of coevolution, Forde et al, Nature, 2004
Maintenance of diversity
A regional polyclonal community structure
Local Migration
Local Gene Flow
Diversity: Collapse and Resurrection
Kin effects in open systems SELECTION
Maintenance of diversity
A regional polyclonal community structure
Environmental gradients are composed of a multiplicity of patches that may act as discrete selective points for bacterial variants
Maintenance of diversity
A regional polyclonal community structure
Gradients and concentration-dependent selection
(F. Baquero and C. Negri, Bioessays, 1997)
Maintenance of Diversity by Scissors, Rock, Paper Model
B. Kerr et al., Local dispersal promotes biodiversity in a real life game of rock-paper-scissors. Nature 418:171, 2002
Rock, Paper, Scissors Model
1. If the stones reduce their attack against scissors...
2. Scissors increase their power against paper...
3. And less paper means more stones...
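The cyclic dominance described in these three steps can be explored with a small lattice simulation. The sketch below is a generic rock-paper-scissors toy model, not a reproduction of the Kerr et al. experiment: a random cell is challenged by a neighbour (local dispersal) or by any cell (global mixing), and is replaced when the challenger beats it. Grid size, number of steps, the four-neighbour rule and the wrap-around edges are all assumptions, and the function names are hypothetical.

```python
import random

# Each key beats its value: rock beats scissors, scissors beat paper, etc.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def step(grid, rng, local=True):
    """Challenge one random cell; replace it if the challenger beats it."""
    n = len(grid)
    r, c = rng.randrange(n), rng.randrange(n)
    if local:
        # Local dispersal: challenger is one of the four nearest neighbours.
        dr, dc = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        r2, c2 = (r + dr) % n, (c + dc) % n
    else:
        # Global mixing: challenger can come from anywhere on the grid.
        r2, c2 = rng.randrange(n), rng.randrange(n)
    if BEATS[grid[r2][c2]] == grid[r][c]:
        grid[r][c] = grid[r2][c2]


def simulate(n=20, steps=20000, local=True, seed=1):
    """Run the model and return the final count of each strategy."""
    rng = random.Random(seed)
    kinds = list(BEATS)
    grid = [[rng.choice(kinds) for _ in range(n)] for _ in range(n)]
    for _ in range(steps):
        step(grid, rng, local)
    counts = {kind: 0 for kind in kinds}
    for row in grid:
        for cell in row:
            counts[cell] += 1
    return counts


print("local dispersal:", simulate(local=True))
print("global mixing:  ", simulate(local=False))
```

Comparing the two runs over longer time spans and several seeds is one way to probe the claim in the cited paper that local dispersal favours coexistence; a single short run should not be read as evidence either way.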
Rock, Paper, Scissors Model
B. Kerr et al., Local dispersal promotes biodiversity in a real life game of rock-paper-scissors. Nature 418:171, 2002
In60-like integrons
Kindly provided by Teresa Coque et al., 2005
[Figure: gene maps of In60-like integrons. Legible arrays:
Int1 - aacA4 - aadA2 - qacEΔ1 sul1 - orf513 - catA2 - qacEΔ1 sul1 - orf5
Int1 - aadB - qacEΔ1 sul1 - orf513 - dfrA10 - qacEΔ1 sul1 - orf5
Int1 - aadA2 - qacEΔ1 sul1 - orf513 - ampC ampR - qacEΔ1 sul1 - orf5
Further arrays carry, next to orf513: blaCTX-M-9 (CTX-M-9; with dfrA16, aadA2, orf3-like and IS3000), blaCTX-M-2 (CTX-M-2; with aacA4, blaOXA-2 and orfD), qnr ampR (with orf5, orf6 and IS6100), dfrA18 (with int1, oxa1, aadA1), and blaDHA ampR.]
Extensive “Macfarlane Burnet” Model and Evolution of Bacterial Pathogenicity
• Every evolutionary element (clones, chromosomal sequences, plasmids, transposons, islands, recombinases, insertion sequences...) is independently subject to apparently random spontaneous variation.
• Combinations of the variant elements are constantly constructed, apparently at random.
• Eventually a given combination is selected and enriched by an unexpected advantage (colonization-pathogenicity) or fixed by drift.
Pre-pathogens are probably constantly constructed; many of them are eliminated by immunity and normal microbiota
The opportunity of meeting interesting people: E. coli in the environment
• It has been suggested that one-half of the E. coli population resides in primary habitats (warm-blooded hosts) and one-half in soil or water.
• Tropical waters harbor natural populations of E. coli (Carrillo et al., AEM 50:468, 1985)
• In nutrient-rich soils, particularly with cyclic periods of wet and dry weather, E. coli is a member of the normal microflora (Winfield and Groisman, AEM 69:3687, 2003)
E. coli in the environment
• Land disposal practices of sewage and sewage sludges that result from wastewater treatment.
• More than 3 million gallons of sewage effluent from more than 3,000 land treatment sites and 15 million septic tanks were applied to land every day in 1984 (Keswick, BH. 1984)
• More than 7 million dry tons of sewage sludge are produced annually and 54 % of this is applied to soil (Environmental Protection Agency, http://www.epa.gov./oigearth; 2002; Santamaría & Toranzos, Int. Microbiol. 6:5-9, 2003)
E. coli in the environment
• EPA Class A Biosolids: Less than 10^3 thermotolerant coliforms/g, for lawns, home gardens, as commercial fertilizer.
• EPA Class B Biosolids: Less than 10^6 thermotolerant coliforms/g, for land application, forest lands, reclamation sites. For a period, access by the public and livestock is limited.
(Environmental Protection Agency)
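The two coliform limits on this slide amount to a simple threshold rule. The sketch below encodes only those two limits, read as 10^3 and 10^6 thermotolerant coliforms per gram; it is not a statement of the full EPA regulation, which sets further requirements not shown on the slide. The function and constant names are hypothetical.

```python
CLASS_A_LIMIT = 10**3  # thermotolerant coliforms per gram, as read from the slide
CLASS_B_LIMIT = 10**6  # thermotolerant coliforms per gram, as read from the slide


def biosolid_class(coliforms_per_gram):
    """Return "A", "B", or None using only the two coliform limits."""
    if coliforms_per_gram < 0:
        raise ValueError("coliform count cannot be negative")
    if coliforms_per_gram < CLASS_A_LIMIT:
        return "A"
    if coliforms_per_gram < CLASS_B_LIMIT:
        return "B"
    return None  # above both limits: neither class on this criterion


for load in (250, 40_000, 5_000_000):
    print(load, "coliforms/g ->", biosolid_class(load))
```

Even a Class B product under this rule can carry up to a million thermotolerant coliforms per gram, which is the point the slide makes about E. coli reaching soil.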
Temperature fitness profiles
[Figure: absolute fitness (axis -20 to 5) versus temperature (10 to 50 ºC), one panel each for E. coli and K. pneumoniae]
Modified from: Okada and Gordon, Mol. Ecol. 10:2499, 2001
CTX-M-10 linked to Kluyvera and phage sequences
[Figure: genetic map with EcoRI and BamHI restriction sites. Elements, in order: Tn1000-like transposase (fragment), ORF2, ORF3, ORF4, DNA invertase, CTX-M-10, ORF7, ORF8, transposase IS432, ORF10, ORF11, transposase IS5. Annotated regions: invertible region, Tn5708, IS4321 fragment, K. cryocrescens homologous region (90%), IS5, phage-related region]
Oliver, Coque, Alonso, Valverde, Baquero, Cantón. AAC 2005; 1567-1571
• Present in different clones at Ramón y Cajal Hospital
• Variability in the sequence among different clones
• Probably linked to the same plasmid structure
The Extended Genome
A genetic space composed of the sum of:
• The sequences corresponding to the maximal core genome of all clones (orthologs-paralogs), plus
• The sequences of all loops that have been inserted in such a core in the different natural (successful at one time) clones or lineages: ecotypes, geotypes, pathotypes..., plus
• The sequences of all extra-chromosomal elements stably associated with any clone
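Read as a definition, the extended genome is a union of sequence sets taken over all clones. The sketch below illustrates that reading with Python sets; representing each clone as a dictionary with "core", "loops" and "extrachromosomal" sets is an assumption made for illustration, and all sequence identifiers are made-up placeholders.

```python
def extended_genome(clones):
    """Union of core, loop and extra-chromosomal sequences over all clones."""
    extended = set()
    for clone in clones:
        extended |= clone["core"] | clone["loops"] | clone["extrachromosomal"]
    return extended


def shared_backbone(clones):
    """Sequences present in the core of every clone."""
    cores = [clone["core"] for clone in clones]
    return set.intersection(*cores) if cores else set()


# Hypothetical clones with placeholder sequence identifiers.
clones = [
    {"core": {"c1", "c2", "c3"}, "loops": {"loopA"},
     "extrachromosomal": {"plasmid1"}},
    {"core": {"c1", "c2", "c3"}, "loops": {"loopB1", "loopB2"},
     "extrachromosomal": set()},
]

print("extended genome size:", len(extended_genome(clones)))
print("shared backbone:", sorted(shared_backbone(clones)))
```

The extended genome grows with every new clone that contributes a loop or a stably associated element, whereas the shared backbone can only stay the same or shrink.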
Extended Genome: a Genetic Space
Core Loops Peripheral
Extended Genome: Core Gravity
Foreign sequences of different base composition tend to “ameliorate” to resemble the features of the resident genome*
Core Loops Peripheral
*Ochman and Jones, EMBO J., 19:6637, 2000
Extended Genome: a Genetic Space
Filling the Carrying Capacity of the Environment for the Species
Genetic Space
Complex Genetic Space
The Extended E. coli Genome
• Research to increase our interpretative, predictive and preventive capability about Escherichia coli evolutionary biology.
• Catalog of sequences of all evolutionarily relevant pieces* in E. coli.
• Network of all interactions between pieces.
• Modeling of combinations that might emerge under particular environmental or clinical conditions.
*F. Baquero, From Pieces to Patterns, Nature Reviews 2004
A lot of work, a lot of fun.
Particular thanks to some of my friends in the lab...
• Rafael Cantón
• Teresa Coque
• Juan-Carlos Galán
• José-Luis Martínez (CNB, CSIC)
Gerdes SY et al, JB 2003